# Applied Statistics II

Applied Statistics, Vincent JEANNIN (vinzjeannin@hotmail.com), ESGF 4IFM Q1 2012
Summary of the session (est. 4.5h)
• R Step by Step
• Reminders of last session
• The Value at Risk
• OLS & Exploration
Main screen
Menu: File / New Script
Run your instruction(s)
You can call variables anytime you want
summary(DATA) shows a quick summary of the distribution of all variables:

| | SPX | SPXr | AMEXr | AMEX |
|---|---|---|---|---|
| Min. | 86.43 | -0.0666344 | 97.6 | -0.0883287 |
| 1st Qu. | 95.70 | -0.0069082 | 104.7 | -0.0094580 |
| Median | 100.79 | 0.0010016 | 108.8 | 0.0013007 |
| Mean | 99.67 | 0.0001249 | 109.4 | 0.0005891 |
| 3rd Qu. | 103.75 | 0.0075235 | 114.1 | 0.0102923 |
| Max. | 107.21 | 0.0474068 | 123.5 | 0.0710967 |

summary(DATA\$SPX) shows a quick summary of the distribution of one variable:

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 86.43 | 95.70 | 100.80 | 99.67 | 103.80 | 107.20 |

Careful using min(DATA) and max(DATA): they consider DATA as one variable
> min(DATA)
[1] -0.08832874
> max(DATA)
[1] 123.4793

Mean & SD

| | SPX | SPXr | AMEXr | AMEX |
|---|---|---|---|---|
| sd(DATA) | 4.92763551 | 0.01468776 | 6.03035318 | 0.01915489 |
| mean(DATA) | 9.967046e+01 | 1.249283e-04 | 1.093951e+02 | 5.890780e-04 |
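A side note for anyone replaying these steps today: in recent versions of R (to my knowledge since 3.0.0), mean() and sd() no longer work column by column on a data frame. A minimal sketch of the usual replacements, on a hypothetical data frame standing in for DATA (the column values below are simulated, not the session's data):

```r
# Hypothetical stand-in for DATA: one price-like and one return-like column
set.seed(1)
DATA <- data.frame(
  SPX  = rnorm(250, mean = 100, sd = 5),
  SPXr = rnorm(250, mean = 0, sd = 0.015)
)

col_means <- colMeans(DATA)       # mean of each column
col_sds   <- sapply(DATA, sd)     # standard deviation of each column
col_range <- sapply(DATA, range)  # min and max per column (2 x ncol matrix)

print(col_means)
print(col_sds)
print(col_range)
```

sapply() applies the function to each column in turn, which avoids the "DATA as one variable" trap of min() and max().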
Easy to show histogram
hist(DATA\$SPXr, breaks=25, main="Distribution of SPXr", ylab="Freq", xlab="SPXr", col="blue")
Obvious Excess Kurtosis
Obvious Asymmetry
Skewness and kurtosis functions don't exist directly in R… However, some VNP (Very Nice Programmer) built and shared an add-in: the package moments
Menu: Packages / Install Package(s)
• Choose whatever mirror (server) you want
• Usually France (Toulouse) is very good as it's a University Server with all the packages available
Once installed, you can load a package with either of the following instructions:
require(moments)
library(moments)
New functions can now be used!
> require(moments)
> library(moments)
> skewness(DATA)

| SPX | SPXr | AMEXr | AMEX |
|---|---|---|---|
| -0.6358029 | -0.4178701 | 0.1876994 | -0.2453693 |

> kurtosis(DATA)

| SPX | SPXr | AMEXr | AMEX |
|---|---|---|---|
| 2.411177 | 5.671254 | 2.078366 | 5.770583 |

Btw, you can store any result in a variable
> Kur<-kurtosis(DATA\$SPXr)
> Kur
[1] 5.671254
Lost? Call the help!
help(kurtosis)
The help page reminds you of the package, the syntax and the definition of the arguments
Let's store a few values (sd is in the package stats)
SPMean<-mean(DATA\$SPXr)
SPSD<-sd(DATA\$SPXr)
Build a sequence, the x axis
x<-seq(from=SPMean-4*SPSD,to=SPMean+4*SPSD,length=500)
Build a normal density on these x (dnorm is in the package stats)
y1<-dnorm(x,mean=SPMean,sd=SPSD)
Display the histogram (package graphics)
hist(DATA\$SPXr, breaks=25, main="S&P Returns / Normal Distribution", xlab="Returns", ylab="Occurrences", col="blue")
Display on top of it the normal density (package graphics)
lines(x,y1,type="l",lwd=3,col="red")
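One detail worth keeping in mind when overlaying a density on a histogram: hist() plots counts by default, whereas dnorm() returns densities, so the two only share a scale if the histogram is drawn with freq = FALSE. A minimal sketch on simulated returns (the series `ret` is hypothetical and stands in for DATA\$SPXr):

```r
set.seed(42)
# Hypothetical daily returns standing in for DATA$SPXr
ret <- rnorm(250, mean = 0.0001, sd = 0.015)

m <- mean(ret)
s <- sd(ret)
x <- seq(from = m - 4 * s, to = m + 4 * s, length.out = 500)
y <- dnorm(x, mean = m, sd = s)

# freq = FALSE plots densities, so the bars and the curve share one scale
h <- hist(ret, breaks = 25, freq = FALSE, col = "blue",
          main = "Returns / Normal Distribution",
          xlab = "Returns", ylab = "Density")
lines(x, y, lwd = 3, col = "red")
```

The object returned by hist() keeps the bar heights in h\$density, which is handy to check that the bars integrate to 1.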
Positive Excess Kurtosis & Negative Skew
Let's build a spread
Spd<-DATA\$SPXr-DATA\$AMEX
What is the mean?
Mean is linear
$E(aX + bY) = aE(X) + bE(Y)$
$E(X - Y) = E(X) - E(Y)$
Let's verify
> mean(DATA\$SPXr)-mean(DATA\$AMEX)-mean(Spd)
[1] 0
What is the standard deviation? Is standard deviation linear? NO!
$VAR(aX + bY) = a^2\,VAR(X) + b^2\,VAR(Y) + 2ab\,COV(X, Y)$
For the spread, a = 1 and b = -1:
> (var(DATA\$SPXr)+var(DATA\$AMEX)-2*cov(DATA\$SPXr,DATA\$AMEX))^0.5
[1] 0.01019212
> sd(Spd)
[1] 0.01019212
Let's show the implication in a proper manner
Let's create a portfolio containing half of each stock
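The variance formula can be checked for both the spread and the half/half portfolio. A minimal sketch, assuming two simulated correlated return series (`x` and `y` are hypothetical stand-ins for DATA\$SPXr and DATA\$AMEX):

```r
set.seed(7)
# Hypothetical correlated return series standing in for SPXr and AMEX
x <- rnorm(250, sd = 0.015)
y <- 1.1 * x + rnorm(250, sd = 0.010)

# Spread: a = 1, b = -1
spread <- x - y
sd_formula <- sqrt(var(x) + var(y) - 2 * cov(x, y))

# Portfolio with half of each: a = b = 0.5
half <- 0.5 * x + 0.5 * y
sd_half <- sqrt(0.25 * var(x) + 0.25 * var(y) + 2 * 0.25 * cov(x, y))

print(c(sd(spread), sd_formula))
print(c(sd(half), sd_half))
# Diversification: the portfolio SD never exceeds the average of the two SDs
print(0.5 * sd(x) + 0.5 * sd(y))
```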
The efficient frontier
plot(DATA\$SPXr,DATA\$AMEX)
abline(lm(DATA\$AMEX~DATA\$SPXr), col="blue")
LM stands for Linear Models
lm(DATA\$AMEX~DATA\$SPXr)
Call:
lm(formula = DATA\$AMEX ~ DATA\$SPXr)
Coefficients:

| (Intercept) | DATA\$SPXr |
|---|---|
| 0.0004505 | 1.1096287 |

$AMEX = 1.1096 \times SPXr + 0.045\%$
Will be used later for linear regression and hedging
Do you remember which distribution is the most platykurtic in nature?
The coin toss: Head = Success = 1 / Tail = Failure = 0
100 tosses… Otherwise, memory issue…
> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.5)
> mean(toss)
[1] 0.52
> kurtosis(toss)
[1] 1.006410
> kurtosis(toss)-3
[1] -1.993590
> hist(toss, breaks=10, main="Tossing a coin 100 times", xlab="Result of the trial", ylab="Occurrence")
> sum(toss)
[1] 52
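The kurtosis of about 1 can be reproduced without any package. A minimal sketch, assuming the usual moment definition (4th central moment over squared variance, both divided by n); the helper `kurt` is a hypothetical name, not part of the moments package:

```r
# Kurtosis as the 4th central moment divided by the squared 2nd central moment
# (non-excess convention, so a normal distribution gives 3)
kurt <- function(v) {
  d <- v - mean(v)
  mean(d^4) / mean(d^2)^2
}

# A sample with 52 heads out of 100, as in the run above
toss <- c(rep(1, 52), rep(0, 48))
k_sample <- kurt(toss)

# Theoretical kurtosis of a Bernoulli(p): (1 - 3*p*(1-p)) / (p*(1-p))
p <- 0.5
k_theory <- (1 - 3 * p * (1 - p)) / (p * (1 - p))

print(k_sample)
print(k_theory)
```

With p = 0.5 the theoretical kurtosis is exactly 1, i.e. an excess kurtosis of -2, the lowest value any distribution can reach.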
Let's test the fairness
If the probability between 45% and 55% is significant we'll accept the fairness
What do you think?
N<-100
h<-72
t<-28
r<-seq(0.2,0.8,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density of r given 72 heads out of 100 flips")
Trick coin!
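The density plotted above has the shape of a Beta(h + 1, t + 1) density, so the probability that r lies between 45% and 55% can be read from pbeta(). A minimal sketch; reading the formula as a Beta density (which amounts to a uniform prior on r) is my interpretation, not something the slides state:

```r
N <- 100; h <- 72; t <- 28

# The density used above, compared with the built-in Beta(h + 1, t + 1) density
r <- seq(0.2, 0.8, length.out = 500)
y_formula <- (factorial(N + 1) / (factorial(h) * factorial(t))) * r^h * (1 - r)^t
y_beta <- dbeta(r, shape1 = h + 1, shape2 = t + 1)
print(all.equal(y_formula, y_beta))

# Probability that the true head rate lies between 45% and 55%
p_fair <- pbeta(0.55, h + 1, t + 1) - pbeta(0.45, h + 1, t + 1)
print(p_fair)
```

A probability this small is what justifies the "Trick coin!" verdict.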
Reminders of last session
Standard Normal Distribution snapshot, 4 moments:
Mean 0, SD 1, Skewness 0, Kurtosis 3
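Notation: $N(\mu, \sigma)$
Density: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
CDF: $P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$
Let X~N(1,1.5). Find: $P(X \le 4.75)$
$P(X \le 4.75) = P\left(Y \le \frac{4.75-1}{1.5}\right)$ with Y~N(0,1), so $P(Y \le 2.5) = ?$
Use the table! $P(Y \le -2.5) = 0.0062$, hence $P(Y \le 2.5) = 1 - 0.0062 = 0.9938$
$P(X \le 4.75) = 0.9938$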
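The table lookup can be cross-checked with pnorm(), which gives the normal CDF directly. A minimal sketch (variable names are only illustrative):

```r
# X ~ N(mean = 1, sd = 1.5): P(X <= 4.75)
p_direct <- pnorm(4.75, mean = 1, sd = 1.5)

# Same probability after standardising: Y = (X - 1) / 1.5 ~ N(0, 1)
z <- (4.75 - 1) / 1.5
p_standard <- pnorm(z)

print(z)
print(round(p_direct, 4))
print(round(pnorm(-z), 4))
```

Both routes give the same number, which is the whole point of standardising before using a table.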
QQ Plot
qqnorm(FCOJ\$V1)
qqline(FCOJ\$V1)
Fat Tail
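Geometric Brownian Motion
Based on a Stochastic Differential Equation (BS): $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$
Discrete form: $\Delta S = \mu S \Delta t + \sigma S \varepsilon \sqrt{\Delta t}$ with $\varepsilon$~N(0,1)
CRR: $u = e^{\sigma\sqrt{\Delta t}}$, $d = e^{-\sigma\sqrt{\Delta t}} = \frac{1}{u}$, $p = \frac{e^{r\Delta t} - d}{u - d}$
$BV = \left(OpUp \cdot p + OpDown \cdot (1 - p)\right) \cdot e^{-r\Delta t}$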
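Greeks Approximation: Taylor Expansion
$C_1 = C_0 + \Delta \cdot dS + \frac{1}{2}\,\Gamma \cdot dS^2 + \frac{1}{6}\,\frac{\partial^3 C}{\partial S^3} \cdot dS^3 + \frac{1}{24}\,\frac{\partial^4 C}{\partial S^4} \cdot dS^4$ etc…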
The Value at Risk
Estimate, with a specific confidence level (usually 95% or 99%), the worst possible loss. In other words, the point is to identify a particular point on the left of the distribution
3 Methods
• Historical
• Parametric
• Monte-Carlo
For now, we'll focus on VaR on one linear asset… FCOJ is back!
Historical VaR
It can also be computed on price variations instead of returns, but the result then depends on the price level! So beware of the bias.
• 1% Percentile in terms of price movement is -\$8.11 cents
• 5% Percentile in terms of price movement is -\$4.14 cents
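The historical method boils down to reading percentiles off the observed distribution. A minimal sketch, assuming simulated returns in place of the FCOJ series (`ret` is hypothetical, so the figures will not match the slide):

```r
set.seed(123)
# Hypothetical daily returns; the session uses the FCOJ series instead
ret <- rnorm(1000, mean = 0.001364, sd = 0.021664)

# Historical VaR: read the percentiles straight from the empirical distribution
var_95 <- quantile(ret, probs = 0.05)
var_99 <- quantile(ret, probs = 0.01)

print(var_95)
print(var_99)
```

No distribution is assumed here: whatever fat tails the sample contains end up in the percentiles.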
Parametric VaR
• Easy to implement and calculate
• Assumes a particular shape of the distribution
• Not really sensitive to fat tails
FCOJ Mean Return: 0.1364%
FCOJ SD: 2.1664%
We already know:
$P(X \le -1.645\,\sigma + \mu) = 0.05$
$P(X \le -2.326\,\sigma + \mu) = 0.01$
Then:
$P(X \le -3.43\%) = 0.05$, VaR 95% (-\$5.15 cents)
$P(X \le -4.90\%) = 0.01$, VaR 99% (-\$7.35 cents)
Very often you assume a 0 mean anyway, therefore:
$P(X \le -3.57\%) = 0.05$, VaR 95% (-\$5.36 cents)
$P(X \le -5.04\%) = 0.01$, VaR 99% (-\$8.10 cents)
Lower values than the historical VaR
This is the problem with leptokurtic distributions: fat tails have little impact on the method
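The parametric figures can be recomputed with qnorm(), the normal quantile function. A minimal sketch using the FCOJ mean and SD quoted above (variable names are only illustrative):

```r
mu    <- 0.001364   # FCOJ mean daily return (0.1364%)
sigma <- 0.021664   # FCOJ daily standard deviation (2.1664%)

# Normal quantiles: qnorm(0.05) is about -1.645, qnorm(0.01) about -2.326
var_95 <- qnorm(0.05, mean = mu, sd = sigma)
var_99 <- qnorm(0.01, mean = mu, sd = sigma)

# Same thing under the zero-mean assumption
var_95_zero <- qnorm(0.05, mean = 0, sd = sigma)
var_99_zero <- qnorm(0.01, mean = 0, sd = sigma)

print(round(100 * c(var_95, var_99), 2))            # in percent
print(round(100 * c(var_95_zero, var_99_zero), 2))  # in percent
```

Multiplying these returns by the current price gives the VaR in cents.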
Monte Carlo VaR
• Most efficient method when assets aren't linear
• Tough to implement
• Assumes a particular shape of the distribution
Based on an assumption of a price process (for example GBM)
A great number of random simulations are run on the price process to build a distribution and outline the VaR
Let's simulate 10,000
> quantile(Final, 0.05)
5%
144.7583
> quantile(Final, 0.01)
1%
142.6412
• 95% Daily VaR is -\$5.35 cents
• 99% Daily VaR is -\$7.36 cents
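The simulation code itself is not reproduced in these notes, so here is one way the vector Final could be built. A minimal sketch, assuming a one-day GBM step, a hypothetical starting price of 150 cents, and the FCOJ mean and SD as drift and volatility (all of these are my assumptions, not the author's script):

```r
set.seed(2012)
n_sim <- 10000
S0    <- 150        # hypothetical starting price, in cents
mu    <- 0.001364   # daily drift, taken from the FCOJ mean return
sigma <- 0.021664   # daily volatility, taken from the FCOJ SD
dt    <- 1          # one trading day

# One-step GBM: S1 = S0 * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z)
Z     <- rnorm(n_sim)
Final <- S0 * exp((mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * Z)

q05 <- quantile(Final, 0.05)
q01 <- quantile(Final, 0.01)

print(q05 - S0)  # 95% daily VaR, in cents
print(q01 - S0)  # 99% daily VaR, in cents
```

For a linear asset this lands close to the parametric result; the method earns its cost once the payoff is non-linear.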
Which is the best? Comparison
Going forward on the VaR
All methods give different but coherent values
Easy? Yes but…
• We've involved one asset only
• We've involved a linear asset
What about an option?
What about 2 assets?
Portfolio scale: what to look at to calculate the VaR?
Big question, is the VaR additive? NO!
Keywords for the future: covariance, correlation, diversification
Options: what to look at to calculate the VaR?
4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time
4 answers:
• Delta/Gamma approximation knowing the distribution of the underlying
• Rho approximation knowing the distribution of the underlying rate
• Vega approximation knowing the distribution of implied volatility
• Theta (time decay)
Yes but… Does the underlying price/rate/volatility vary independently?
Might be a bit more complicated than expected…
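OLS Exploration
OLS: Ordinary Least Squares
Linear regression model
Minimize the sum of the squared vertical distances between the observations and the linear approximation
$\hat{y} = f(x) = ax + b$, with residual $\varepsilon = y - \hat{y}$
Two parameters to estimate:
• Intercept α (written $b$ in the derivation)
• Slope β (written $a$ in the derivation)
Minimising residuals
$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} (y_i - a x_i - b)^2$$
When is E minimal? When the partial derivatives with respect to a and b are 0
Quick high school reminder if necessary…
$$(y - ax - b)^2 = y^2 - 2axy - 2by + a^2x^2 + 2abx + b^2$$
$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2 + 2b x_i\right) = 0 \qquad \frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2a x_i + 2b\right) = 0$$
$$\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0 \qquad \sum_{i=1}^{n} \left(-y_i + a x_i + b\right) = 0$$
$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \qquad a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$$
Leads easily to the intercept
$$a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i$$
$$a n \bar{x} + n b = n \bar{y}$$
$$a \bar{x} + b = \bar{y}$$
$$b = \bar{y} - a \bar{x}$$
The regression line goes through $(\bar{x}, \bar{y})$
The distance of this point to the line is 0 indeed
$$b = \bar{y} - a\bar{x} \qquad y = ax + \bar{y} - a\bar{x} \qquad y - \bar{y} = a(x - \bar{x})$$
Back to the two partial derivatives, divided by -2:
$$\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0 \qquad \sum_{i=1}^{n} (y_i - a x_i - b) = 0$$
Substituting b:
$$\sum_{i=1}^{n} x_i (y_i - a x_i - \bar{y} + a\bar{x}) = 0 \qquad \sum_{i=1}^{n} (y_i - a x_i + a\bar{x} - \bar{y}) = 0$$
$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0 \qquad \sum_{i=1}^{n} (y_i - \bar{y}) - a\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
Multiplying the second equation by $\bar{x}$:
$$\sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$$
We have both sums equal to 0, so:
$$\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) - \sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$$
$$\sum_{i=1}^{n} (x_i - \bar{x})\left((y_i - \bar{y}) - a(x_i - \bar{x})\right) = 0$$
Finally…
$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
The numerator is n times the covariance and the denominator n times the variance:
$$a = \frac{\sigma_{xy}}{\sigma_x^2} \qquad b = \bar{y} - a\bar{x}$$
You can use the Excel functions INTERCEPT and SLOPE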
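The closed-form slope and intercept can be compared with what lm() returns. A minimal sketch on simulated data (`x` and `y` are hypothetical; the true slope of 1.1 and intercept of 0.5 are arbitrary choices):

```r
set.seed(99)
# Hypothetical data with a known linear relationship plus noise
x <- rnorm(200)
y <- 0.5 + 1.1 * x + rnorm(200, sd = 0.3)

# Closed-form OLS: slope = covariance / variance, intercept from the means
a <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b <- mean(y) - a * mean(x)

# Reference fit with lm(); coef() returns the intercept first, then the slope
fit <- lm(y ~ x)

print(c(intercept = b, slope = a))
print(coef(fit))
```

The n (or n - 1) factors cancel in the ratio, so cov(x, y) / var(x) gives the same slope.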
Calculate the Variances and Covariance of X{1,2,3,3,1,2} and Y{2,3,1,1,3,2}
You can use the Excel functions VAR.P, COVARIANCE.P and STDEV.P
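The same exercise can be done in R, with one trap: var() and cov() divide by n - 1, whereas the Excel ".P" functions divide by n. A minimal sketch computing the population versions by hand:

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)
n <- length(X)

# Population (divide by n) versions, matching VAR.P / COVARIANCE.P
var_x  <- mean((X - mean(X))^2)
var_y  <- mean((Y - mean(Y))^2)
cov_xy <- mean((X - mean(X)) * (Y - mean(Y)))

# R's var() divides by n - 1, hence the rescaling
var_x_check <- var(X) * (n - 1) / n

print(c(var_x = var_x, var_y = var_y, cov_xy = cov_xy))
```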
Let's assess the quality of the regression
Let's calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient, PPMCC):
$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Value between -1 and 1
• $|r| = 1$: perfect dependence
• $r \approx 0$: no linear dependence
Gives an idea of the dispersion of the scatterplot
You can use the Excel function CORREL
Poor quality: r = 0.62
High quality: r = 0.96
What is good quality?
Slightly discretionary…
$|r| \ge \frac{\sqrt{3}}{2} = 0.8660…$ is largely admitted as the threshold between acceptable and poor
The regression itself introduces a bias
Let's introduce the coefficient of determination, R-Squared
Total Dispersion = Dispersion Regression + Dispersion Residual
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $\hat{y}_i$ is the value given by the regression line
$$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$$
In other words, the part of the total dispersion explained by the regression
You can use the Excel function RSQ
In a simple linear regression with intercept: $R^2 = r^2$
Is a good correlation coefficient and a good coefficient of determination enough to accept the regression?
Not necessarily!
Residuals need to have no effect, in other words to be white noise!
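The decomposition of the dispersion and the identity between R-Squared and the squared correlation can both be checked numerically. A minimal sketch on simulated data (`x` and `y` are hypothetical):

```r
set.seed(5)
x <- rnorm(100)
y <- 2 + 0.8 * x + rnorm(100, sd = 0.5)   # hypothetical data

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

ss_total      <- sum((y - mean(y))^2)       # total dispersion
ss_regression <- sum((y_hat - mean(y))^2)   # dispersion explained by the line
ss_residual   <- sum((y - y_hat)^2)         # dispersion left in the residuals

r_squared <- ss_regression / ss_total

print(c(ss_total, ss_regression + ss_residual))
print(c(r_squared, cor(x, y)^2, summary(fit)$r.squared))
```

Note that the decomposition relies on the intercept being in the model; without it the cross term does not vanish.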
Don't get fooled by numbers!
For every dataset of the quartet (Anscombe's quartet):
$\bar{x} = 9$, $\bar{y} = 7.5$, $y = 3 + 0.5x$, $r = 0.82$, $R^2 = 0.67$
Can you say at this stage which regression is the best?
Certainly not those on the right: you need a LINEAR dependence
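These figures match Anscombe's quartet, which ships with R as the data frame anscombe (columns x1 to x4 and y1 to y4). A minimal sketch recomputing the statistics for the four datasets and plotting them; the helper table `stats` is just an illustrative name:

```r
# anscombe comes with base R (package datasets)
stats <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x = mean(x), mean_y = mean(y),
    intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]),
    r = cor(x, y), r2 = summary(fit)$r.squared)
}))

print(round(stats, 2))

# The numbers agree, the scatterplots do not
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```

Four nearly identical rows in the table, four very different pictures: always plot before trusting a regression.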
Is any linear regression useless?
Think what you could do to the series
Polynomial transformation, log transformation,…
Else, non linear regressions, but it's another story
First application on financial market: SP / AmEx in 2011
$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = 0.8501$
$R^2 = r^2 = 0.7227$
Oops :-o Is Excel wrong?
R-Squared has different calculation methods
Let's accept the following regression then, as the quality seems pretty good
$AmEx = 0.06\% + 1.1046 \times SP$
How to use this?
• Forecasting? Not really… Both are random variables
• Hedging? Yes but basis risk. Yes but careful with the residuals…
In theory, what is the daily result of the hedge?
Let's have a try!
Hedging \$1.0M of AmEx Stocks with \$1.1046M of SP
It would have been too easy… Great differences… Why?
• Sensitivity to the size of the sample
• Heteroscedasticity
• Basis Risk
The purpose was to see if the market has an effect on a particular stock
The dependence is obvious but residuals are too volatile for any stable application
But attention! We are looking for causation, not correlation!
Causation implies correlation
The converse is not true!
DON'T BE FOOLED BY PRETTY NUMBERS
Let's prove this…
Perfect linear dependence
Excellent R-Squared
Residuals are a white noise
What's the problem then?
Do you really think fresh lemon reduces car fatalities?
Conclusion
• R
• VaR
• OLS
• Normal Distribution