2. ESGF 4IFM Q1 2012
Summary of the session (est. 4.5h)
• R Step by Step
• Reminders of last session
• The Value at Risk
vinzjeannin@hotmail.com
• OLS & Exploration
3. R Step by Step
Downloadable for free (open source)
http://www.r-project.org/
4. Main screen
10. summary(DATA) Shows a quick summary of the distribution of all variables
SPX SPXr AMEXr AMEX
Min. : 86.43 Min. :-0.0666344 Min. : 97.6 Min. :-0.0883287
1st Qu.: 95.70 1st Qu.:-0.0069082 1st Qu.:104.7 1st Qu.:-0.0094580
Median :100.79 Median : 0.0010016 Median :108.8 Median : 0.0013007
Mean : 99.67 Mean : 0.0001249 Mean :109.4 Mean : 0.0005891
3rd Qu.:103.75 3rd Qu.: 0.0075235 3rd Qu.:114.1 3rd Qu.: 0.0102923
Max. :107.21 Max. : 0.0474068 Max. :123.5 Max. : 0.0710967
summary(DATA$SPX) Shows a quick summary of the distribution of one variable
Min. 1st Qu. Median Mean 3rd Qu. Max.
86.43 95.70 100.80 99.67 103.80 107.20
Careful using the following instructions: min(DATA) and max(DATA)
This will consider DATA as one variable
> min(DATA)
[1] -0.08832874
> max(DATA)
[1] 123.4793
Mean & SD
> sd(DATA)
SPX SPXr AMEXr AMEX
4.92763551 0.01468776 6.03035318 0.01915489
> mean(DATA)
SPX SPXr AMEXr AMEX
9.967046e+01 1.249283e-04 1.093951e+02 5.890780e-04
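Recent versions of R no longer apply mean() and sd() column by column to a data frame: mean() returns NA with a warning and sd() fails. A minimal sketch of the column-wise equivalent with sapply(), assuming a small made-up data frame DEMO (hypothetical values, not the course data):

```r
# DEMO is a hypothetical stand-in for DATA (made-up numbers)
DEMO <- data.frame(SPX  = c(100, 101, 99, 102),
                   SPXr = c(0.010, -0.020, 0.030, 0.000))

col_means <- sapply(DEMO, mean)  # one mean per column
col_sds   <- sapply(DEMO, sd)    # one sample standard deviation per column
print(col_means)
print(col_sds)

# min() and max() still pool every column into a single set of values
print(min(DEMO))
print(max(DEMO))
```

The same sapply() call works with any function that takes a numeric vector.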
11. Easy to show histogram
hist(DATA$SPXr, breaks=25, main="Distribution of SPXr", ylab="Freq", xlab="SPXr", col="blue")
12. Obvious Excess Kurtosis
Obvious Asymmetry
Functions to measure them don’t exist in base R…
However, some VNP (Very Nice Programmer) built and shared an add-in
Package moments
13. Menu: Packages / Install Package(s)
• Choose whatever mirror (server) you want
• Usually France (Toulouse) is very good as it’s a
University Server with all the packages available
14. Once installed, you can load the package with either of the
following instructions (one is enough):
require(moments)
library(moments)
New functions can now be used!
15. > require(moments)
> library(moments)
> skewness(DATA)
SPX SPXr AMEXr AMEX
-0.6358029 -0.4178701 0.1876994 -0.2453693
> kurtosis(DATA)
SPX SPXr AMEXr AMEX
2.411177 5.671254 2.078366 5.770583
Btw, you can store any result in a variable
> Kur<-kurtosis(DATA$SPXr)
> Kur
[1] 5.671254
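To see what these two functions measure, here is a minimal base-R sketch of the moment-based formulas. It assumes the moments package uses population moments (dividing by n); the sample z is simulated, not the course data, and the helper names are hypothetical.

```r
# Moment-based skewness and kurtosis written out in base R
central_moment <- function(x, k) mean((x - mean(x))^k)
skew_manual <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
kurt_manual <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

set.seed(1)             # simulated sample for illustration only
z <- rnorm(100000)
print(skew_manual(z))   # close to 0 for a normal sample
print(kurt_manual(z))   # close to 3 for a normal sample
```

Note that the kurtosis returned here is not the excess kurtosis: subtract 3 to compare with the normal distribution.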
16. Lost?
Call the help! help(kurtosis)
Reminds you of the package
Syntax
Arguments definition
17. Let’s store a few values (package stats)
SPMean<-mean(DATA$SPXr)
SPSD<-sd(DATA$SPXr)
Build a sequence, the x axis
x<-seq(from=SPMean-4*SPSD,to=SPMean+4*SPSD,length=500)
Build a normal density on these x (package stats)
y1<-dnorm(x,mean=SPMean,sd=SPSD)
Display the histogram (package graphics)
hist(DATA$SPXr, breaks=25,main="S&P Returns / Normal Distribution",xlab="Returns",ylab="Occurrences", col="blue")
Display on top of it the normal density (package graphics)
lines(x,y1,type="l",lwd=3,col="red")
19. Let’s build a spread
Spd<-DATA$SPXr-DATA$AMEX
What is the mean?
Mean is linear:
\( E(aX + bY) = aE(X) + bE(Y) \)
\( E(X - Y) = E(X) - E(Y) \)
Let’s verify
> mean(DATA$SPXr)-mean(DATA$AMEX)-mean(Spd)
[1] 0
20. What is the standard deviation?
Is standard deviation linear?
NO!
\( VAR(aX + bY) = a^2\,VAR(X) + b^2\,VAR(Y) + 2ab\,COV(X, Y) \)
For the spread, a = 1 and b = -1, so \( VAR(X - Y) = VAR(X) + VAR(Y) - 2\,COV(X, Y) \)
> (var(DATA$SPXr)+var(DATA$AMEX)-2*cov(DATA$SPXr,DATA$AMEX))^0.5
[1] 0.01019212
> sd(Spd)
[1] 0.01019212
Let’s show the implication in a proper manner
Let’s create a portfolio containing half of each stock
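A minimal sketch of such a half-and-half portfolio, using two simulated return series as hypothetical stand-ins for DATA$SPXr and DATA$AMEX. It checks the variance formula against a direct calculation and compares the result with the weighted sum of the two standard deviations.

```r
# Hypothetical return series (simulated), not the course data
set.seed(42)
x <- rnorm(250, mean = 0, sd = 0.015)
y <- 0.8 * x + rnorm(250, mean = 0, sd = 0.010)

a <- 0.5; b <- 0.5                    # half of each stock
port <- a * x + b * y
sd_formula <- sqrt(a^2 * var(x) + b^2 * var(y) + 2 * a * b * cov(x, y))

print(sd(port))                       # direct calculation
print(sd_formula)                     # same value from the variance formula
print(a * sd(x) + b * sd(y))          # weighted sum of SDs: only an upper bound
```

The gap between the last two numbers is the diversification effect; it vanishes only when the correlation equals 1.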
25. LM stands for Linear Models
> lm(DATA$AMEX~DATA$SPXr)
Call:
lm(formula = DATA$AMEX ~ DATA$SPXr)
Coefficients:
(Intercept) DATA$SPXr
0.0004505 1.1096287
\( y = 1.1096x + 0.045\% \)
Will be used later for linear regression and hedging
26. Do you remember which distribution is the most platykurtic in nature?
Toss: Head = Success = 1 / Tail = Failure = 0
100 tosses… Otherwise memory issues…
> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.5)
> mean(toss)
[1] 0.52
> kurtosis(toss)
[1] 1.006410
> kurtosis(toss)-3
[1] -1.993590
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 52
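The kurtosis printed above can be checked by hand. For a 0/1 variable with a proportion p of ones, the central moments are m2 = p(1-p) and m4 = p(1-p)(1 - 3p(1-p)), so the kurtosis is (1 - 3p(1-p)) / (p(1-p)). A short sketch (the function name is hypothetical):

```r
# Kurtosis of a 0/1 variable whose proportion of ones is p
bernoulli_kurtosis <- function(p) (1 - 3 * p * (1 - p)) / (p * (1 - p))

print(bernoulli_kurtosis(0.5))    # 1, the lowest kurtosis a distribution can have
print(bernoulli_kurtosis(0.52))   # matches kurtosis(toss) for 52 heads out of 100
```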
Let’s test the fairness
27. Density of a binomial distribution
\( f(r \mid H = h, T = t) = \frac{(N+1)!}{h!\,t!}\, r^h (1-r)^t \)
where r is the unknown probability of heads
Let’s plot this density with
\( h = 52 \)
\( t = 48 \)
\( N = 100 \)
N<-100
h<-52
t<-48
r<-seq(0,1,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density of r given 52 heads out of 100 flips")
28. If the probability between 45% and 55% is significant we’ll accept the fairness
What do you think?
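One way to put a number on it: the density used for the plot is that of a Beta(h + 1, t + 1) distribution, so base R's pbeta() gives the probability mass between 45% and 55% directly. A minimal sketch for the 52/48 coin (the variable names are illustrative):

```r
h <- 52; t <- 48; N <- h + t

# Check that the hand-written density is the Beta(h + 1, t + 1) density
r <- seq(0.01, 0.99, length = 99)
manual <- (factorial(N + 1) / (factorial(h) * factorial(t))) * r^h * (1 - r)^t
print(all.equal(manual, dbeta(r, h + 1, t + 1)))

# Probability that r lies between 45% and 55%
p_fair <- pbeta(0.55, h + 1, t + 1) - pbeta(0.45, h + 1, t + 1)
print(p_fair)
```

The same two lines work for any other counts of heads and tails.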
29. What is the problem with this coin?
Obvious fake! Assuming the probability of head is 0.7
Toss it! Head = Success = 1 / Tail = Failure = 0
100 tosses
> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.7)
> mean(toss)
[1] 0.72
> kurtosis(toss)
[1] 1.960317
> kurtosis(toss)-3
[1] -1.039683
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 72
Let’s test the fairness (assuming you don’t know it’s a trick)
30. If the probability between 45% and 55% is significant we’ll accept the fairness
N<-100
h<-72
t<-28
r<-seq(0.2,0.8,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density of r given 72 heads out of 100 flips")
Trick coin!
31. Reminders of last session
Standard Normal Distribution
Snapshot, 4 moments:
Mean 0
SD 1
Skewness 0
Kurtosis 3
38. The Value at Risk
Estimate, at a specific confidence level (usually 95% or 99%), the worst possible
loss. In other words, the point is to identify a particular point in the left tail of
the distribution
3 Methods
• Historical
• Parametric
• Monte-Carlo
For now, we’ll focus on VaR on one linear asset… FCOJ is back!
39. Historical VaR
• No assumption about the distribution
• Easy to implement and calculate
• Sensitive to the length of the history
• Sensitive to very extreme values
Let’s get back to our FCOJ time series, last price is $150 cents
If we work on returns, we’ve seen the use of the PERCENTILE
Excel function
• 1% Percentile is -5.22%, 99% Historical Daily VaR is -$7.83 cents
• 5% Percentile is -3.34%, 95% Historical Daily VaR is -$5.00 cents
Works as well on weekly, monthly, quarterly series
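In R the equivalent of the PERCENTILE function is quantile(). A minimal sketch, assuming a simulated return series (hypothetical, not the FCOJ history) and the last price of 150 cents quoted on the slide:

```r
# Hypothetical daily returns; in the course these would be the FCOJ returns
set.seed(7)
returns <- rnorm(1000, mean = 0.001, sd = 0.02)
spot <- 150                                       # last price in cents

pct <- quantile(returns, probs = c(0.01, 0.05))   # 1% and 5% percentiles of returns
var_cents <- pct * spot                           # scaled by the last price
print(pct)
print(var_cents)
```

Feeding weekly or monthly returns into the same call gives the VaR at that horizon.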
40. Historical VaR
Can also be computed with price variations instead of returns, but the result
then depends on the price level! So beware of the bias.
• 1% Percentile in terms of price movement is -$8.11 cents
• 5% Percentile in terms of price movement is -$4.14 cents
41. Parametric VaR
• Easy to implement and calculate
• Assumes a particular shape of the distribution
• Not really sensitive to fat tails
FCOJ Mean Return: 0.1364%
FCOJ SD: 2.1664%
We already know:
\( P(X \le -1.645\,\sigma + \mu) = 0.05 \)
\( P(X \le -2.326\,\sigma + \mu) = 0.01 \)
Then:
\( P(X \le -3.43\%) = 0.05 \) VaR 95% (-$5.15 cents)
\( P(X \le -4.90\%) = 0.01 \) VaR 99% (-$7.35 cents)
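The same calculation can be sketched in R with qnorm(), which returns the normal quantiles behind the -1.645 and -2.326 multipliers. The mean, standard deviation and last price are the figures quoted on the slides; the variable names are illustrative.

```r
mu    <- 0.001364    # FCOJ mean daily return (from the slide)
sigma <- 0.021664    # FCOJ daily standard deviation (from the slide)
spot  <- 150         # last price in cents

z <- qnorm(c(0.05, 0.01))            # about -1.645 and -2.326
var_return <- mu + z * sigma         # return thresholds
var_cents  <- var_return * spot      # converted to price terms
print(round(100 * var_return, 2))    # -3.43 -4.90
print(round(var_cents, 2))
```

The figures in cents can differ by about a hundredth from the slide, which multiplies the already rounded percentages.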
42. Parametric VaR
Very often a zero mean is assumed anyway, therefore:
\( P(X \le -3.57\%) = 0.05 \) VaR 95% (-$5.36 cents)
\( P(X \le -5.04\%) = 0.01 \) VaR 99% (-$7.56 cents)
Lower value than the historical VaR at the 99% level
Problem with leptokurtic distributions: the method barely accounts for
fat tails
43. Monte Carlo VaR
• Most efficient method when assets aren’t linear
• Tough to implement
• Assumes a particular shape of the distribution
Based on an assumption of a price process (for example GBM)
A great number of random simulations of the price process are run to build a
distribution and read the VaR off it
44. Monte Carlo VaR
Let’s simulate 10,000 GBM paths of 252 steps each and store the final value
library(sde)
FCOJ<-read.csv(file="C:/Users/Vinz/Desktop/FCOJStats.csv",header=FALSE,sep=",")
Drift<-mean(FCOJ$V1)
Volat<-sd(FCOJ$V1)
nbsim<-252
Spot<-150
Final<-rep(1,10000)
for(i in 1:10000){
Matr<-GBM(x=Spot,r=Drift, sigma=Volat,N=nbsim)
Final[i]<-Matr[nbsim+1]}
quantile(Final, 0.05)
quantile(Final, 0.01)
Don’t be fooled by the 252, we’re still making a daily simulation: what
to change in the code to make it yearly?
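Since only the final price matters for the VaR, the terminal value of a GBM can also be drawn directly, without simulating every step. Below is a minimal base-R sketch under stated assumptions: it does not use the sde package, it takes the mean and standard deviation quoted on the parametric slides instead of reading the CSV file, and horizon and npaths are illustrative names.

```r
set.seed(123)
spot <- 150; drift <- 0.001364; volat <- 0.021664   # slide values for FCOJ
horizon <- 1        # in days, because drift and volat are daily figures
npaths  <- 10000

# Terminal value of a GBM:
# S_T = S_0 * exp((mu - sigma^2 / 2) * T + sigma * sqrt(T) * Z), with Z ~ N(0, 1)
z <- rnorm(npaths)
final <- spot * exp((drift - volat^2 / 2) * horizon + volat * sqrt(horizon) * z)

q <- quantile(final, c(0.05, 0.01))
print(q)
print(q - spot)     # VaR in cents at 95% and 99%
```

In this sketch the horizon is an explicit parameter, so a longer period is obtained by changing horizon while keeping the daily drift and volatility.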
45. Monte Carlo VaR
> quantile(Final, 0.05)
5%
144.93
> quantile(Final, 0.01)
1%
142.7941
• 95% Daily VaR is -$5.07 cents
• 99% Daily VaR is -$7.21 cents
Let’s remove the drift
46. Monte Carlo VaR
> quantile(Final, 0.05)
5%
144.7583
> quantile(Final, 0.01)
1%
142.6412
• 95% Daily VaR is -$5.24 cents
• 99% Daily VaR is -$7.36 cents
47. Which is the best?
Comparison
48. Going forward on the VaR
All methods give different but consistent values
Easy? Yes but…
• We’ve involved one asset only
• We’ve involved a linear asset
What about an option?
What about 2 assets?
49. Going forward on the VaR
Portfolio scale: what to look at to calculate the VaR?
Big question, is the VaR additive?
NO!
Keywords for the future: covariance, correlation, diversification
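A small numerical illustration of why VaR is not additive, using a zero-mean parametric VaR on two positions. Every number below (position sizes, volatilities, correlation) is hypothetical and chosen only for the example.

```r
z   <- qnorm(0.95)       # one-sided 95% multiplier
v1  <- 1e6 * 0.015       # hypothetical position 1: $1M with a 1.5% daily SD
v2  <- 1e6 * 0.019       # hypothetical position 2: $1M with a 1.9% daily SD
rho <- 0.85              # hypothetical correlation between the two

var1 <- z * v1
var2 <- z * v2
var_port <- z * sqrt(v1^2 + v2^2 + 2 * rho * v1 * v2)

print(var1 + var2)       # sum of the stand-alone VaRs
print(var_port)          # portfolio VaR, smaller whenever rho < 1
```

The difference between the two printed values is the diversification benefit, driven entirely by the covariance term.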
50. Going forward on the VaR
Options: what to look at to calculate the VaR?
4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time
4 answers:
• Delta/Gamma approximation knowing the distribution of the underlying
• Rho approximation knowing the distribution of the underlying rate
• Vega approximation knowing the distribution of implied volatility
• Theta (time decay)
Yes but… Do the underlying price, rate and volatility vary independently?
Might be a bit more complicated than expected…
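The four answers can be read as the terms of one Taylor expansion of the option value V. This is the standard Greeks approximation, written here with dS, dσ, dr and dt for small changes in the underlying price, implied volatility, interest rate and time; the notation is an assumption, not taken from the slides.

```latex
dV \approx \Delta \, dS + \tfrac{1}{2}\,\Gamma \, (dS)^2
         + \mathcal{V} \, d\sigma + \rho \, dr + \Theta \, dt
```

Here \(\mathcal{V}\) stands for Vega. Treating each term separately implicitly assumes the risk factors move independently, which is exactly the caveat raised above.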
51. OLS & Exploration
OLS: Ordinary Least Squares
ESGF 5IFM Q1 2012
Linear regression model
Minimize the sum of the squared vertical distances
between the observations and the linear
approximation
\( y = f(x) = ax + b \)
Residual ε
52. Two parameters to estimate:
• Intercept α (written b in the formulas)
• Slope β (written a in the formulas)
Minimising residuals
\( E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 \)
When is E minimal?
When the partial derivatives with respect to a and b are 0
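Written out, with E the sum of squared residuals and a, b the slope and intercept, the two first-order conditions are as follows (a worked step obtained by differentiating E term by term):

```latex
\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{n} x_i \left( y_i - (a x_i + b) \right) = 0
\qquad
\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right) = 0
```

The second condition involves only sums of the observations, which is why it gives the intercept so directly.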
54. \( \frac{\partial E}{\partial b} = 0 \) leads easily to the intercept
\( a \sum_{i=1}^{n} x_i + nb = \sum_{i=1}^{n} y_i \)
\( an\bar{x} + bn = n\bar{y} \)
\( a\bar{x} + b = \bar{y} \)
\( b = \bar{y} - a\bar{x} \)
The regression line goes through \( (\bar{x}, \bar{y}) \)
The distance from this point to the line is indeed 0
57. \( a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \) (covariance over variance)
\( a = \frac{COV_{xy}}{\sigma_x^2} \)
\( b = \bar{y} - a\bar{x} \)
You can use the Excel functions INTERCEPT and SLOPE
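A minimal sketch checking the slope and intercept formulas against lm(), on simulated data (hypothetical series, not the course data). R's cov() and var() both divide by n - 1, so the ratio is the same as with the population versions.

```r
set.seed(2012)
x <- rnorm(200, sd = 0.015)                      # hypothetical market returns
y <- 0.0005 + 1.1 * x + rnorm(200, sd = 0.01)    # hypothetical stock returns

a <- cov(x, y) / var(x)       # slope: covariance over variance
b <- mean(y) - a * mean(x)    # intercept: the line goes through the means
fit <- lm(y ~ x)

print(c(intercept = b, slope = a))
print(coef(fit))              # same two numbers from lm()
```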
58. Calculate the variances and covariance of X = {1,2,3,3,1,2} and Y = {2,3,1,1,3,2}
You can use the Excel functions VAR.P, COVARIANCE.P and STDEV.P
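The exercise can be checked in R. R's own var() and cov() divide by n - 1, so this sketch defines population versions (dividing by n) to match VAR.P and COVARIANCE.P; the helper names var_p and cov_p are hypothetical.

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)

# Population versions (divide by n)
var_p <- function(v) mean((v - mean(v))^2)
cov_p <- function(u, v) mean((u - mean(u)) * (v - mean(v)))

print(var_p(X))      # 0.6666667
print(var_p(Y))      # 0.6666667
print(cov_p(X, Y))   # -0.5
```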
59. Let’s assess the quality of the regression
Let’s calculate the correlation coefficient (aka Pearson Product-Moment
Correlation Coefficient, PPMCC):
\( r = \frac{COV_{xy}}{\sigma_x \sigma_y} \) Value between -1 and 1
\( |r| = 1 \): perfect linear dependence
\( r \approx 0 \): no linear dependence
Gives an idea of the dispersion of the scatterplot
You can use the Excel function CORREL
61. What is good quality?
Slightly discretionary…
If \( |r| \ge \frac{\sqrt{3}}{2} = 0.8660\ldots \)
It’s widely accepted as the threshold between acceptable and poor
62. The regression itself introduces a bias
Let’s introduce the coefficient of determination R-Squared
Total Dispersion = Dispersion Regression + Dispersion Residual
\( \sum_{i} (y_i - \bar{y})^2 = \sum_{i} (\hat{y}_i - \bar{y})^2 + \sum_{i} (y_i - \hat{y}_i)^2 \)
\( R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}} \)
In other words, the part of the total dispersion explained by the regression
You can use the Excel function RSQ
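A minimal sketch verifying the decomposition on simulated data (hypothetical values), and comparing the resulting ratio with the R-squared reported by lm() and with the squared correlation coefficient.

```r
set.seed(99)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)    # hypothetical data
fit <- lm(y ~ x)
y_hat <- fitted(fit)

total      <- sum((y - mean(y))^2)       # total dispersion
regression <- sum((y_hat - mean(y))^2)   # dispersion explained by the line
residual   <- sum((y - y_hat)^2)         # dispersion left in the residuals
print(all.equal(total, regression + residual))

r2 <- regression / total
print(c(r2, summary(fit)$r.squared, cor(x, y)^2))   # three identical values
```

The equality of the three values holds for a simple linear regression with an intercept.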
63. In a simple linear regression with intercept, \( R^2 = r^2 \)
Is a good correlation coefficient and a good coefficient of
determination enough to accept the regression?
Not necessarily!
Residuals must show no structure; in other words, they must be white noise!
65. Don’t get fooled by numbers!
For every dataset of the Quartet
\( \bar{x} = 9 \)
\( \bar{y} = 7.5 \)
\( y = 3 + 0.5x \)
\( r = 0.82 \)
\( R^2 = 0.67 \)
Can you say at this stage which regression is the best?
Certainly not those on the right: you need a LINEAR dependence
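The figures quoted match Anscombe's quartet, which ships with R as the data frame anscombe (columns x1 to x4 and y1 to y4). Assuming that is the dataset on the slide, this sketch recomputes the summary statistics for the four series; stats_for is a hypothetical helper name.

```r
# anscombe is part of R's built-in datasets package
stats_for <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x = mean(x), mean_y = mean(y),
    intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]),
    r = cor(x, y), R2 = summary(fit)$r.squared)
}

print(round(sapply(1:4, stats_for), 2))   # four near-identical columns
# plot(anscombe$x2, anscombe$y2) and so on shows how different the shapes are
```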
66. Is any linear regression useless?
Think about what you could do to the series
Polynomial transformation, log transformation,…
Otherwise, non-linear regressions, but that’s another story
67. First application on financial market
S&P / AmEx in 2011
68. \( r = \frac{COV_{AmEx,\,S\&P}}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501 \)
\( R^2 = r^2 = 0.7227 \)
Oops :-o
Is Excel wrong?
R-Squared has different calculation methods
Let’s accept the following regression then, as the quality seems pretty good
\( \text{AmEx return} = 0.06\% + 1.1046 \times \text{S\&P return} \)
69. How to use this?
• Forecasting? Not really…
Both are random variables
• Hedging? Yes, but with basis risk
Yes, but beware of the residuals…
In theory, what is the daily result of the hedge?
Let’s have a try!
70. Hedging $1.0M of AmEx Stocks with $1.1046M of S&P
It would have been too easy… Great differences… Why?
• Sensitivity to the size of the sample
• Heteroscedasticity
• Basis Risk
71. The purpose was to see if the market has an effect on a particular stock
The dependence is obvious but residuals are too volatile for any stable application
But beware!
We are looking for causation, not correlation!
Causation implies correlation
The converse is not true!
DON’T BE FOOLED BY PRETTY NUMBERS
Let’s prove this…
72. Perfect linear dependence
Excellent R-Squared
Residuals are white noise
What’s the problem then?
73. Do you really think fresh lemon reduces car fatalities?
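A minimal sketch of how such a misleading fit can arise: two series that merely share a trend over time regress very well on each other. All figures below are made up for the illustration and are not the data behind the slide.

```r
set.seed(5)
period <- 0:4
# Two made-up series that both decline over time, for unrelated reasons
lemons     <- 500 - 50 * period + rnorm(5, sd = 10)
fatalities <- 16 - 0.2 * period + rnorm(5, sd = 0.03)

fit <- lm(fatalities ~ lemons)
print(coef(fit))
print(summary(fit)$r.squared)   # high, yet neither series drives the other
```

Here time is the hidden common factor: the regression captures the shared trend, not a causal link.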