This document summarizes a session on financial econometric models and R. Key points covered include downloading and using the open source R software, uploading data and calling variables, summarizing distributions with functions like mean and standard deviation, and visualizing data through histograms. Basic linear regression was demonstrated using the lm function. Coin tossing experiments were also conducted to explore binomial distributions.
ESGF 4IFM Q1 2012
vinzjeannin@hotmail.com
10. summary(DATA) shows a quick summary of the distribution of all variables
      SPX              SPXr               AMEXr            AMEX
 Min.   : 86.43   Min.   :-0.0666344   Min.   : 97.6   Min.   :-0.0883287
 1st Qu.: 95.70   1st Qu.:-0.0069082   1st Qu.:104.7   1st Qu.:-0.0094580
 Median :100.79   Median : 0.0010016   Median :108.8   Median : 0.0013007
 Mean   : 99.67   Mean   : 0.0001249   Mean   :109.4   Mean   : 0.0005891
 3rd Qu.:103.75   3rd Qu.: 0.0075235   3rd Qu.:114.1   3rd Qu.: 0.0102923
 Max.   :107.21   Max.   : 0.0474068   Max.   :123.5   Max.   : 0.0710967
summary(DATA$SPX) shows a quick summary of the distribution of one variable
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  86.43   95.70  100.80   99.67  103.80  107.20
Careful using the following instructions: min(DATA) and max(DATA)
This will consider DATA as one variable
> min(DATA)
[1] -0.08832874
> max(DATA)
[1] 123.4793
Mean & SD
> sd(DATA)
       SPX       SPXr      AMEXr       AMEX
4.92763551 0.01468776 6.03035318 0.01915489
> mean(DATA)
         SPX         SPXr        AMEXr         AMEX
9.967046e+01 1.249283e-04 1.093951e+02 5.890780e-04
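In recent versions of R, mean() applied to a whole data frame returns NA with a warning and sd() fails, so the column-by-column output above comes from an older release. A minimal sketch of the column-wise equivalent, assuming a simulated stand-in for DATA (the real file is not reproduced here; the values are illustrative only):

```r
# Simulated stand-in for DATA: two return series (hypothetical values)
set.seed(1)
DATA <- data.frame(SPXr = rnorm(250, mean = 0, sd = 0.015),
                   AMEX = rnorm(250, mean = 0, sd = 0.019))

# Apply mean() and sd() to each column separately
col_means <- sapply(DATA, mean)
col_sds   <- sapply(DATA, sd)
print(col_means)
print(col_sds)

# colMeans() is a shortcut for the column means
print(colMeans(DATA))
```

sapply() keeps the column names, so the result reads like the output shown on the slide.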
11. Easy to show histogram
hist(DATA$SPXr, breaks=25, main="Distribution of SPXr", ylab="Freq", xlab="SPXr", col="blue")
12. Obvious Excess Kurtosis
Obvious Asymmetry
Skewness and kurtosis functions do not exist directly in R…
However, some VNP (Very Nice Programmer) built and shared an add-in:
Package moments
13. Menu: Packages / Install Package(s)
• Choose whatever mirror (server) you want
• Usually France (Toulouse) is very good as it's a University Server with all the packages available
14. Once installed, you can load the package with either of the following instructions (one is enough):
require(moments)
library(moments)
New functions can now be used!
15. > require(moments)
> library(moments)
> skewness(DATA)
SPX SPXr AMEXr AMEX
-0.6358029 -0.4178701 0.1876994 -0.2453693
> kurtosis(DATA)
SPX SPXr AMEXr AMEX
2.411177 5.671254 2.078366 5.770583
Btw, you can store any result in a variable
> Kur<-kurtosis(DATA$SPXr)
> Kur
[1] 5.671254
16. Lost?
Call the help! help(kurtosis)
The help page reminds you of the package, the syntax and the definition of the arguments
17. Let's store a few values
SPMean<-mean(DATA$SPXr)
SPSD<-sd(DATA$SPXr)    # sd() comes from package stats
Build a sequence, the x axis
x<-seq(from=SPMean-4*SPSD,to=SPMean+4*SPSD,length=500)
Build a normal density on these x
y1<-dnorm(x,mean=SPMean,sd=SPSD)    # dnorm() comes from package stats
Display the histogram
hist(DATA$SPXr, breaks=25,main="S&P Returns / Normal Distribution",xlab="Returns",ylab="Occurrences", col="blue")    # hist() comes from package graphics
Display on top of it the normal density
lines(x,y1,type="l",lwd=3,col="red")    # lines() comes from package graphics
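One detail worth checking when overlaying a density on a histogram: by default hist() draws counts, while dnorm() returns a density, so the two are not on the same vertical scale. A sketch of the same overlay on the density scale, assuming simulated returns in place of DATA$SPXr (hypothetical values):

```r
# Simulated stand-in for DATA$SPXr (hypothetical daily returns)
set.seed(42)
ret <- rnorm(250, mean = 0, sd = 0.015)

x  <- seq(from = mean(ret) - 4 * sd(ret), to = mean(ret) + 4 * sd(ret),
          length.out = 500)
y1 <- dnorm(x, mean = mean(ret), sd = sd(ret))

# freq = FALSE rescales the bars so that their total area is 1,
# which is the scale of the normal density drawn on top
h <- hist(ret, breaks = 25, freq = FALSE, col = "blue",
          main = "Returns / Normal Distribution",
          xlab = "Returns", ylab = "Density")
lines(x, y1, lwd = 3, col = "red")

# Total area of the bars
bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)
```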
19. Let's build a spread
Spd<-DATA$SPXr-DATA$AMEX
What is the mean?
Mean is linear: $E(aX + bY) = aE(X) + bE(Y)$
$E(X - Y) = E(X) - E(Y)$
Let's verify
> mean(DATA$SPXr)-mean(DATA$AMEX)-mean(Spd)
[1] 0
20. What is the standard deviation?
Is standard deviation linear?
NO!
$\mathrm{VAR}(aX + bY) = a^2\,\mathrm{VAR}(X) + b^2\,\mathrm{VAR}(Y) + 2ab\,\mathrm{COV}(X, Y)$
For the spread, a = 1 and b = -1:
> (var(DATA$SPXr)+var(DATA$AMEX)-2*cov(DATA$SPXr,DATA$AMEX))^0.5
[1] 0.01019212
> sd(Spd)
[1] 0.01019212
Let's show the implication in a proper manner
Let's create a portfolio containing half of each stock
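The variance rule can be checked numerically for both the spread and the half-and-half portfolio. This is only a sketch on simulated returns (the series X and Y and their parameters are assumptions, not the course data):

```r
set.seed(7)
n <- 250
X <- rnorm(n, sd = 0.015)               # hypothetical market returns
Y <- 1.1 * X + rnorm(n, sd = 0.01)      # hypothetical stock returns

# Spread: a = 1, b = -1
sd_spread_formula <- sqrt(var(X) + var(Y) - 2 * cov(X, Y))
sd_spread_direct  <- sd(X - Y)

# Half of each: a = b = 0.5
sd_half_formula <- sqrt(0.25 * var(X) + 0.25 * var(Y) + 2 * 0.25 * cov(X, Y))
sd_half_direct  <- sd(0.5 * X + 0.5 * Y)

# A "linear" rule would give 0.5*sd(X) + 0.5*sd(Y);
# the true value can only be equal or smaller
sd_half_linear <- 0.5 * sd(X) + 0.5 * sd(Y)

print(c(formula = sd_half_formula, direct = sd_half_direct,
        linear = sd_half_linear))
```

The gap between the last two numbers is the diversification effect: it disappears only when the two series are perfectly correlated.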
25. LM stands for Linear Models
> lm(DATA$AMEX~DATA$SPXr)
Call:
lm(formula = DATA$AMEX ~ DATA$SPXr)
Coefficients:
(Intercept) DATA$SPXr
0.0004505 1.1096287
$y = 1.1096x + 0.045\%$
Will be used later for linear regression and hedging
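A short sketch of how the fitted coefficients can be stored and reused, assuming simulated data in place of DATA (column names follow the slides, values are hypothetical). It also checks the textbook identities slope = cov/var and intercept = mean(y) - slope * mean(x):

```r
set.seed(3)
DATA <- data.frame(SPXr = rnorm(250, sd = 0.015))
DATA$AMEX <- 0.0005 + 1.1 * DATA$SPXr + rnorm(250, sd = 0.01)

# The data= argument keeps the coefficient names short
Reg <- lm(AMEX ~ SPXr, data = DATA)
b <- coef(Reg)
print(b)

# Same numbers from the closed-form OLS expressions
slope_by_hand     <- cov(DATA$SPXr, DATA$AMEX) / var(DATA$SPXr)
intercept_by_hand <- mean(DATA$AMEX) - slope_by_hand * mean(DATA$SPXr)
```

Storing the fit in an object (here Reg) makes resid(Reg), summary(Reg) and plot(Reg) available later.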
26. Do you remember what the most platykurtic distribution in nature is?
Toss Head = Success = 1 / Tail = Failure = 0
100 tosses… Else memory issue…
> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.5)
> mean(toss)
[1] 0.52
> kurtosis(toss)
[1] 1.006410
> kurtosis(toss)-3
[1] -1.993590
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 52
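The kurtosis printed in this session can be reproduced in base R. The function below is a sketch that assumes the moments package uses the plain moment ratio with divisor n; it gives back the 1.006410 shown above for 52 heads and 48 tails:

```r
# Moment-based (Pearson) kurtosis, divisor n; a normal distribution scores 3
kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

toss <- c(rep(1, 52), rep(0, 48))   # 52 heads, 48 tails, as in the session
k <- kurt(toss)
print(k)        # Pearson kurtosis
print(k - 3)    # excess kurtosis
```

For a 0/1 variable with a share p of ones this ratio reduces to (1 - 3p(1-p)) / (p(1-p)), which is smallest at p = 0.5.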
Let's test the fairness
27. Density of a binomial distribution
$f(r \mid H = h, T = t) = \frac{(N+1)!}{h!\,t!}\, r^h (1 - r)^t$
Let's plot this density with
$h = 52$
$t = 48$
$N = 100$
N<-100
h<-52
t<-48
r<-seq(0,1,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density to have 52 heads out of 100 flips")
28. If the probability between 45% and 55% is significant we'll accept the fairness
What do you think?
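The probability between 45% and 55% is the area under the plotted density over that interval. A sketch of two ways to get it: numerical integration of the factorial formula, and the Beta(h + 1, t + 1) distribution function, which has exactly that density. The variable names p_fair_num and p_fair_beta are introduced here for illustration:

```r
h <- 52; t <- 48; N <- h + t

# Density of r given h heads and t tails (factorial form)
dens <- function(r) {
  (factorial(N + 1) / (factorial(h) * factorial(t))) * r^h * (1 - r)^t
}

# Area between 45% and 55% by numerical integration
p_fair_num <- integrate(dens, lower = 0.45, upper = 0.55)$value

# Same area from the Beta(h + 1, t + 1) distribution function
p_fair_beta <- pbeta(0.55, h + 1, t + 1) - pbeta(0.45, h + 1, t + 1)

print(c(numerical = p_fair_num, beta = p_fair_beta))
```

What counts as "significant" is a judgment call left to the reader; the code only produces the number to judge.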
29. What is the problem with this coin?
Obvious fake! Assuming the probability of head is 0.7
Toss it! Head = Success = 1 / Tail = Failure = 0
100 tosses
> require(moments)
Loading required package: moments
> library(moments)
> toss<-rbinom(100,1,0.7)
> mean(toss)
[1] 0.72
> kurtosis(toss)
[1] 1.960317
> kurtosis(toss)-3
[1] -1.039683
> hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 72
Let's test the fairness (assuming you don't know it's a trick)
30. If the probability between 45% and 55% is significant we'll accept the fairness
N<-100
h<-72
t<-28
r<-seq(0.2,0.8,length=500)
y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
plot(r,y,type="l",col="red",main="Probability density of r given 72 heads out of 100 flips")
Trick coin!
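The factorial form of the density stops working for longer experiments, because factorial() overflows double precision beyond 170!. A sketch of a workaround using dbeta(), which evaluates the same density without factorials; the 1000-flip experiment is a hypothetical example, while the last lines quantify the 72/28 coin above:

```r
# factorial() overflows double precision beyond 170!
big <- suppressWarnings(factorial(171))
print(big)   # Inf

# Hypothetical longer experiment: 720 heads out of 1000 flips
h <- 720; t <- 280
r <- seq(0.6, 0.8, length.out = 500)
y <- dbeta(r, h + 1, t + 1)   # same density, computed without factorials
plot(r, y, type = "l", col = "red",
     main = "Density of r given 720 heads out of 1000 flips")

# Probability that r lies between 45% and 55% for the 72/28 coin
p_trick <- pbeta(0.55, 72 + 1, 28 + 1) - pbeta(0.45, 72 + 1, 28 + 1)
print(p_trick)
```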
39. Too many outliers!
ESGF 5IFM Q1 2012
There should be 2 max to be normal
Fatter tails than the normal distribution
Excess kurtosis obvious
Fatter and longer tails
40. Resid<-resid(Reg)
ks.test(Resid, "pnorm")
Fatter tails
One-sample Kolmogorov-Smirnov test
data: Resid
D = 0.4889, p-value < 2.2e-16
alternative hypothesis: two-sided
Reject H0 (Normality)
Careful: with no further arguments, "pnorm" means the standard normal N(0,1), so this D also reflects the small scale of the residuals
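To see how much of the test statistic comes from scale alone, the test can be run on residuals that are normal by construction. This is a sketch on simulated data (the standard deviation of 0.01 is an assumption chosen to resemble daily returns), not a rerun of the course regression:

```r
set.seed(11)
# Hypothetical residuals: genuinely normal, but with a small standard deviation
Resid <- rnorm(250, mean = 0, sd = 0.01)

# Default "pnorm" is N(0, 1): the scale mismatch alone drives D towards 0.5
ks_default <- ks.test(Resid, "pnorm")

# Passing the sample mean and sd compares against a normal of the right scale
ks_scaled <- ks.test(Resid, "pnorm", mean = mean(Resid), sd = sd(Resid))

print(ks_default$statistic); print(ks_default$p.value)
print(ks_scaled$statistic);  print(ks_scaled$p.value)
```

Estimating the mean and sd from the same sample makes the second p-value only approximate; shapiro.test() is a common alternative for a normality check.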
41. OLS & Autocorrelation
New idea… No intercept
Only one parameter to estimate:
• Slope β (written a below)
Minimising residuals
$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a x_i)^2$
When is E minimal?
When the partial derivative with respect to a is 0
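Setting the derivative $-2\sum x_i (y_i - a x_i)$ to zero gives $a = \sum x_i y_i / \sum x_i^2$. A sketch checking this closed form against lm() without intercept and against a direct numerical minimisation of E, on simulated data (x and y are hypothetical return series):

```r
set.seed(5)
x <- rnorm(250, sd = 0.015)             # hypothetical market returns
y <- 1.1 * x + rnorm(250, sd = 0.01)    # hypothetical stock returns

# Closed form from the first-order condition
a_closed <- sum(x * y) / sum(x^2)

# Same slope from lm(); "- 1" removes the intercept
a_lm <- unname(coef(lm(y ~ x - 1)))

# Numerical check that this value minimises E
E <- function(a) sum((y - a * x)^2)
a_num <- optimize(E, interval = c(-10, 10))$minimum

print(c(closed = a_closed, lm = a_lm, numerical = a_num))
```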
45. ks.test(resid(lm(Val$AMEX~Val$SPX-1)), "pnorm")
One-sample Kolmogorov-Smirnov test
data: resid(lm(Val$AMEX ~ Val$SPX - 1))
D = 0.4887, p-value < 2.2e-16
alternative hypothesis: two-sided
H0 rejected
Not much better
It's the way statistics are… You look for, but sometimes you don't find!
However you can now regress without intercept and that's great!
46. The purpose was to see if the market has an effect on a particular stock
The dependence is obvious but the residuals are too volatile for any stable application
But attention!
We are looking for causation, not correlation!
Causation implies correlation
The converse is not true!
DON'T BE FOOLED BY PRETTY NUMBERS
Let's prove this…
47. Perfect linear dependence
Excellent R-Squared
Residuals are white noise
What's the problem then?
48. Do you really think fresh lemon reduces car fatalities?
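A pretty regression between unrelated quantities is easy to manufacture when both follow time. The sketch below uses two simulated series generated independently of each other; the names lemons and fatalities and all numbers are hypothetical and only echo the slide:

```r
set.seed(2012)
year <- 1:100
# Two series generated independently of each other; each only follows time
lemons     <- 1 * year + rnorm(100, sd = 5)          # rising trend
fatalities <- 200 - 2 * year + rnorm(100, sd = 10)   # falling trend

fit <- lm(fatalities ~ lemons)
r2 <- summary(fit)$r.squared
print(r2)

# With the common driver (time) in the regression,
# the coefficient on lemons can be inspected on its own
fit_time <- lm(fatalities ~ year + lemons)
print(summary(fit_time)$coefficients)
```

The high R-squared of the first fit comes entirely from the shared trend, which is the point of the warning above.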
50. How to use OLS to make predictions?
Not possible with 2 random variables
Need to find a variable whose future values are known
That would leave the randomness to only 1 variable
The time is easily predictable, isn't it?
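A minimal sketch of a regression on time used for prediction, assuming a simulated series with a linear trend (the series, its length and the six-period horizon are all illustrative choices):

```r
set.seed(8)
t_obs <- 1:60
y <- 10 + 0.5 * t_obs + rnorm(60, sd = 2)   # hypothetical trending series
d <- data.frame(t = t_obs, y = y)

fit <- lm(y ~ t, data = d)

# Future values of t are known in advance, so they can be plugged in
new_t <- data.frame(t = 61:66)
pred <- predict(fit, newdata = new_t, interval = "prediction")
print(pred)
```

The columns lwr and upr give a prediction interval around each forecast, which is only meaningful if the residuals behave as the OLS assumptions require.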
55. One-sample Kolmogorov-Smirnov test
data: eps
D = 0.1333, p-value = 0.001578
alternative hypothesis: two-sided
Normality rejected
Regression rejected
What would be the next step?
Mistake in methodology?
What else could we regress?
57. Lag Plots!
Maybe the series is auto-correlated…
Seems to be the case with 1 to 6 lags
par(mfrow=c(2,1))
acf(Spread$Spark,20)
pacf(Spread$Spark,20)
This will show the correlogram (ACF)
$\rho(k) = \mathrm{Corr}(Y_t, Y_{t-k})$
Correlation between pairs of values of {Yt}, separated by an interval of length k.
This will show the partial auto-correlation (PACF)
Correlation between the current value and the value k periods ago, after controlling for observations at intermediate lags (identifying intermediate lag effects)
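The numbers behind the correlogram can be extracted and checked by hand. A sketch on a simulated AR(1) series standing in for Spread$Spark (the coefficient 0.8 and the length 300 are assumptions):

```r
set.seed(21)
# Hypothetical autocorrelated series: an AR(1) process
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 300))

k <- 1
n <- length(y)
m <- mean(y)
# Sample autocorrelation at lag k: lagged cross-products
# divided by the total sum of squares
r_k_manual <- sum((y[1:(n - k)] - m) * (y[(1 + k):n] - m)) / sum((y - m)^2)

r_acf  <- acf(y, lag.max = 20, plot = FALSE)$acf[k + 1]   # element 1 is lag 0
r_pacf <- pacf(y, lag.max = 20, plot = FALSE)$acf         # starts at lag 1

print(c(manual = r_k_manual, acf = r_acf, pacf_lag1 = r_pacf[1]))
```

At lag 1 there are no intermediate lags to control for, so the partial autocorrelation equals the ordinary one.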
58. ACF is decreasing slowly
Propagation of autocorrelation due to step 1
Main characteristic of non-stationary time series
Heteroscedasticity
59. Need some differencing
The series has three components
• Trend
• Seasonality
• Residual
First-order differencing may be useful
plot(diff(Spread$Spark), type="l")
Stationary?
Seasonality?
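A sketch of what first-order differencing does to the correlogram, assuming a simulated random walk with drift as a stand-in for a non-stationary series (not the Spark spread data):

```r
set.seed(4)
# Hypothetical non-stationary series: random walk with drift
y <- cumsum(0.1 + rnorm(300))

dy <- diff(y)   # y[t] - y[t-1], one observation shorter

par(mfrow = c(2, 1))
acf(y, 20)      # level series
acf(dy, 20)     # differenced series

# Lag-1 autocorrelation before and after differencing
r1_level <- acf(y,  plot = FALSE)$acf[2]
r1_diff  <- acf(dy, plot = FALSE)$acf[2]
print(c(level = r1_level, differenced = r1_diff))
```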
61. Differencing brings new horizons…
The whole point is to show you the methodology
Step by step…
Sometimes unsuccessfully (bad results)
62. Let's go back to stage one of the OLS
Differentiation can happen before the OLS
NonLin<-read.csv(file="C:/Users/Vinz/Desktop/ExExp.csv",header=TRUE,sep=",")
plot(NonLin$X,NonLin$VarEp)
What do you suggest?
63. Let's create a new variable
$\mathrm{Lin} = \ln(\mathrm{VarEp})$
Lin<-log(NonLin$VarEp)
plot(NonLin$X,Lin)
Magic!
64. > lm(Lin~NonLin$X)
Call:
lm(formula = Lin ~ NonLin$X)
Coefficients:
(Intercept) NonLin$X
-4.605 1.000
layout(matrix(1:4,2,2))
plot(lm(Lin~NonLin$X))
Do not hesitate to transform
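The fitted coefficients read as ln(y) = -4.605 + x, that is y ≈ 0.01·exp(x), since exp(-4.605) ≈ 0.01. A sketch reproducing the idea on simulated data, assuming an exponential relation of that form with small multiplicative noise (ExExp.csv itself is not available here):

```r
set.seed(9)
X <- seq(1, 10, length.out = 100)
# Hypothetical exponential data, y = 0.01 * exp(X), with multiplicative noise
VarEp <- 0.01 * exp(X) * exp(rnorm(100, sd = 0.05))

# The log turns the exponential relation into a straight line
Lin <- log(VarEp)
fit <- lm(Lin ~ X)
b <- coef(fit)
print(b)

# Back-transform the intercept to recover the scale factor
scale_hat <- exp(b[["(Intercept)"]])
print(scale_hat)
```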