ESGF 5IFM Q1 2012
Financial Econometric Models
  Vincent JEANNIN – ESGF 5IFM
            Q1 2012




                                vinzjeannin@hotmail.com
Summary of the session (est 3h)

• Introduction & Objectives
• Bibliography
• OLS & Exploration




Introduction & Objectives
      • What is a model? $y = f(x) + \varepsilon$, with $\varepsilon$ being a white noise

      • What is the point of writing models?

                    Describe data behaviour
                    Model data behaviour
                    Forecast data behaviour

• Acquire theoretical knowledge of Econometrics & Statistics
• Step by step from OLS to ANOVA on residuals
• Usage of R and Excel
Bibliography




OLS & Exploration
         OLS: Ordinary Least Squares

         Linear regression model
         Minimize the sum of the squared vertical distances
         between the observations and the linear
         approximation

         $$\hat{y} = f(x) = ax + b$$

         Residual ε
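Two parameters to estimate:
   • Intercept b (also written α)
   • Slope a (also written β)

Minimising residuals

$$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 = \sum_{i=1}^{n} \left( y_i - a x_i - b \right)^2$$

          When is E minimal?

                     When the partial derivatives with respect to a and b are 0

                    Quick high school reminder if necessary…

$$(y_i - a x_i - b)^2 = y_i^2 - 2 a x_i y_i - 2 b y_i + a^2 x_i^2 + 2 a b x_i + b^2$$

Partial derivative with respect to a:

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left( -2 x_i y_i + 2 a x_i^2 + 2 b x_i \right) = 0$$

$$\sum_{i=1}^{n} \left( -x_i y_i + a x_i^2 + b x_i \right) = 0$$

$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$

Partial derivative with respect to b:

$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left( -2 y_i + 2 b + 2 a x_i \right) = 0$$

$$\sum_{i=1}^{n} \left( -y_i + b + a x_i \right) = 0$$

$$a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$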
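Leads easily to the intercept

$$a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$

$$a n \bar{x} + n b = n \bar{y}$$

$$a \bar{x} + b = \bar{y}$$

$$b = \bar{y} - a \bar{x}$$

       The regression line goes through $(\bar{x}, \bar{y})$

       The distance from this point to the line is indeed 0

$$b = \bar{y} - a \bar{x} \qquad y = a x + \bar{y} - a \bar{x} \qquad y - \bar{y} = a (x - \bar{x})$$

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left( -2 x_i y_i + 2 a x_i^2 + 2 b x_i \right) = 0 \qquad \frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left( -2 y_i + 2 b + 2 a x_i \right) = 0$$

From the derivative with respect to a:

$$\sum_{i=1}^{n} x_i \left( y_i - a x_i - b \right) = 0$$

$$\sum_{i=1}^{n} x_i \left( y_i - a x_i - \bar{y} + a \bar{x} \right) = 0$$

$$\sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = 0$$

From the derivative with respect to b:

$$\sum_{i=1}^{n} \left( y_i - a x_i - b \right) = 0$$

$$\sum_{i=1}^{n} \left( y_i - a x_i + a \bar{x} - \bar{y} \right) = 0$$

$$\sum_{i=1}^{n} \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = 0$$

$$\sum_{i=1}^{n} \bar{x} \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = 0$$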
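We have

$$\sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = 0 \quad \text{and} \quad \sum_{i=1}^{n} \bar{x} \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = 0$$

$$\sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = \sum_{i=1}^{n} \bar{x} \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right)$$

$$\sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) - \sum_{i=1}^{n} \bar{x} \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = 0$$

$$\sum_{i=1}^{n} (x_i - \bar{x}) \left( (y_i - \bar{y}) - a (x_i - \bar{x}) \right) = 0$$

                                                 Finally…

$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$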
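$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Numerator: Covariance (up to the factor 1/n)
Denominator: Variance (up to the factor 1/n)

$$a = \frac{\sigma_{xy}}{\sigma_x^2}$$

$$b = \bar{y} - a \bar{x}$$

       You can use Excel functions INTERCEPT and SLOPE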
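The closed-form result for the slope and the intercept can be checked numerically. A minimal sketch in R, assuming two small made-up vectors `x` and `y` (illustrative values, not the course data), comparing the covariance over variance formula with R's built-in `lm`:

```r
# Illustrative data (hypothetical values, not the course dataset)
x <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope a = Cov(x, y) / Var(x); intercept b = mean(y) - a * mean(x).
# cov() and var() both divide by n - 1, so the factor cancels in the ratio.
a <- cov(x, y) / var(x)
b <- mean(y) - a * mean(x)

# Same estimates from R's least-squares fit
fit <- lm(y ~ x)

print(c(intercept = b, slope = a))
print(coef(fit))
```

Both printouts should agree up to floating-point noise, which is the numerical counterpart of the derivation.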
Calculate the Variances and Covariance of X{1,2,3,3,1,2} and Y{2,3,1,1,3,2}





      You can use Excel functions VAR.P, COVARIANCE.P and STDEV.P
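The same exercise can be done in R. A short sketch, using the X and Y series of the exercise and the population (divide by n) definitions that Excel's VAR.P and COVARIANCE.P use; the variable names are chosen here for illustration:

```r
x <- c(1, 2, 3, 3, 1, 2)
y <- c(2, 3, 1, 1, 3, 2)
n <- length(x)

# Population versions: average of the squared (or crossed) deviations
var_x  <- mean((x - mean(x))^2)
var_y  <- mean((y - mean(y))^2)
cov_xy <- mean((x - mean(x)) * (y - mean(y)))

# R's var() and cov() divide by n - 1 (sample versions), hence the rescaling
var_x_from_r  <- var(x) * (n - 1) / n
cov_xy_from_r <- cov(x, y) * (n - 1) / n

print(c(var_x = var_x, var_y = var_y, cov_xy = cov_xy))
```

Keep in mind that `var(x)` and `cov(x, y)` in R correspond to Excel's VAR.S and COVARIANCE.S, not to the .P functions.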
Let’s assess the quality of the regression

Let’s calculate the correlation coefficient (aka Pearson Product-Moment
Correlation Coefficient – PPMCC):

$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$                       Value between -1 and 1

$|r| = 1$                                Perfect linear dependence

$r \approx 0$                            No linear dependence

    Gives an idea of the dispersion of the scatterplot

    You can use Excel function CORREL
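As an illustration, the correlation coefficient of the X and Y series from the variance and covariance exercise can be computed both from its definition and with R's `cor`. This is a sketch; the variable names are illustrative:

```r
x <- c(1, 2, 3, 3, 1, 2)
y <- c(2, 3, 1, 1, 3, 2)

# Definition: covariance divided by the product of the standard deviations.
# The n - 1 divisors used by cov() and sd() cancel out in the ratio.
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

The result is negative: on this small sample, Y tends to fall when X rises.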
Poor quality: R = 0.62
High quality: R = 0.96
What is good quality?

      Slightly discretionary…

If
$$|r| \ge \frac{\sqrt{3}}{2} = 0.8660…$$
            This is widely accepted as the threshold between acceptable and poor
The regression itself introduces a bias

                  Let’s introduce the coefficient of determination R-Squared

Total Dispersion = Dispersion Regression + Dispersion Residual

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$$

   In other words, the part of the total dispersion explained by the regression

     You can use Excel function RSQ
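The decomposition of the total dispersion can be verified on any fitted line. A minimal sketch on simulated data (the seed, sample size and coefficients are arbitrary assumptions made for the illustration):

```r
set.seed(1)
x <- rnorm(100)
y <- 0.5 + 2 * x + rnorm(100)   # simulated linear relation plus noise

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

ss_total <- sum((y - mean(y))^2)      # Total Dispersion
ss_reg   <- sum((y_hat - mean(y))^2)  # Dispersion Regression
ss_res   <- sum((y - y_hat)^2)        # Dispersion Residual

r2 <- ss_reg / ss_total
print(c(total = ss_total, regression_plus_residual = ss_reg + ss_res))
print(c(r2_manual = r2, r2_lm = summary(fit)$r.squared, r_squared = cor(x, y)^2))
```

With an intercept in the model, the three values on the last line coincide, and a threshold of √3/2 on the correlation corresponds to 75% of the dispersion being explained.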
In a simple linear regression with intercept, $R^2 = r^2$

Is a good correlation coefficient and a good coefficient of
determination enough to accept the regression?

  Not necessarily!

  Residuals need to carry no remaining effect (no trend, no dependence); in other words, they need to be a white noise!
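What "residuals are a white noise" can look like in practice, as a rough sketch on simulated data (seed, coefficients and the lag of 10 are arbitrary assumptions): the residual mean is zero by construction when the model has an intercept, so the informative checks are about remaining structure, here a Ljung-Box test for autocorrelation.

```r
set.seed(42)
x <- rnorm(250)
y <- 1 + 0.8 * x + rnorm(250)   # simulated data with well-behaved noise

res <- resid(lm(y ~ x))

# Zero up to rounding error for any OLS fit with an intercept
mean_res <- mean(res)

# Ljung-Box test, H0: no autocorrelation in the residuals up to lag 10
lb <- Box.test(res, lag = 10, type = "Ljung-Box")

print(mean_res)
print(lb$p.value)
```

A small p-value would point to serial dependence left in the residuals; it is one check among several (trend, constant variance, distribution).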
Don’t get fooled by numbers!

    For every dataset of the Quartet (Anscombe's quartet)

                         $\bar{x} = 9$
                         $\bar{y} = 7.5$
                         $y = 3 + 0.5x$
                         $r = 0.82$
                         $R^2 = 0.67$

         Can you say at this stage which regression is the best?

Certainly not those on the right: you need a LINEAR dependence
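These figures can be reproduced with the `anscombe` data frame that ships with base R (columns `x1` to `x4` and `y1` to `y4`). A short sketch; the helper name `fit_one` is made up for the illustration:

```r
# Fit one regression per dataset of the quartet and collect the statistics
fit_one <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x    = mean(x),
    mean_y    = mean(y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r         = cor(x, y),
    r_squared = summary(fit)$r.squared)
}

stats <- t(sapply(1:4, fit_one))   # one row per dataset
print(round(stats, 2))
```

The four rows are nearly identical, which is exactly why the numbers alone cannot rank the regressions: only the scatterplots reveal the differences.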
Is any linear regression useless?

               Think what you could do to the series

               Polynomial transformation, log transformation,…

                       Else, non-linear regressions, but that’s another story
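As an example of a polynomial transformation, here is a sketch using the second dataset of R's built-in `anscombe` data frame, whose scatterplot is curved. Adding a squared term keeps the model linear in its parameters, so `lm` still applies:

```r
# Straight line versus line plus squared term on the curved dataset
lin  <- lm(y2 ~ x2, data = anscombe)
quad <- lm(y2 ~ x2 + I(x2^2), data = anscombe)

r2 <- c(linear    = summary(lin)$r.squared,
        quadratic = summary(quad)$r.squared)
print(round(r2, 4))
```

The jump in R-squared shows that the poor linear fit came from the shape of the relation, not from noise. A log transformation would be written in the same way, for instance `lm(log(y) ~ x)`, when the series is strictly positive.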
First application on financial markets

     S&P / AmEx in 2011

$$r = \frac{\sigma_{AmEx,\,S\&P}}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501$$

$$R^2 = r^2 = 0.7227$$

    Oops :-o
    Is Excel wrong?

               R-Squared has different calculation methods

Let’s accept the following regression then, as the quality seems pretty good

$$AmEx = 0.06\% + 1.1046 \times S\&P$$
How to use this?

     • Forecasting?              Not really…
                                 Both are random variables

     • Hedging?                  Yes, but there is basis risk
                                 Yes, but be careful with the residuals…

               In theory, what is the daily result of the hedge?     $\varepsilon$

Let’s have a try!
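A sketch of the theoretical answer, assuming the hedge ratio is the regression slope and writing $N$ for the AmEx notional, $a$ for the slope and $b$ for the intercept (notation chosen here for the illustration): being long $N$ of AmEx and short $aN$ of S&P gives a daily result of

```latex
\text{P\&L}_t = N \left( r^{AmEx}_t - a \, r^{S\&P}_t \right) = N \left( b + \varepsilon_t \right)
```

With an intercept close to 0, the daily result is essentially $N \varepsilon_t$: the hedge is only as good as the residuals are small and well behaved.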
Hedging $1.0M of AmEx Stocks with $1.1046M of S&P




 It would have been too easy… Large differences… Why?

            Sensitivity to the size of the sample
            Heteroscedasticity
Let’s have a similar approach using a proper statistics and econometrics software: R

                           • Free
                           • Open Source
                           • Developments shared by developers

          Let’s begin with statistical exploration to get familiar with the series
          and the software
 > Val<-read.csv(file="C:/Users/Vinz/Desktop/Val.csv",header=TRUE,sep=",")
 > summary(Val)

                  SPX                       AMEX
             Min.   :-0.0666344        Min.   :-0.0883287
             1st Qu.:-0.0069082        1st Qu.:-0.0094580
             Median : 0.0010016        Median : 0.0013007
             Mean   : 0.0001249        Mean   : 0.0005891
             3rd Qu.: 0.0075235        3rd Qu.: 0.0102923
             Max.   : 0.0474068        Max.   : 0.0710967

> hist(Val$AMEX, breaks=20, main="Distribution AMEX Returns")
> sd(Val$AMEX)
[1] 0.01915489

> hist(Val$SPX, breaks=20, main="Distribution SPX Returns")
> sd(Val$SPX)
[1] 0.01468776
These are obviously negatively skewed distributions

                                    Reminders

$$Skew = E\left[ \left( \frac{X - \mu}{\sigma} \right)^3 \right] = \frac{E\left[ (X - \mu)^3 \right]}{\left( E\left[ (X - \mu)^2 \right] \right)^{3/2}}$$

• Negative skew: long left tail, mass on the right, skew to the left
• Positive skew: long right tail, mass on the left, skew to the right

                         > library(moments)
                         > skewness(Val$AMEX)
                         [1] -0.2453693
                         > skewness(Val$SPX)
                         [1] -0.4178701
These are obviously leptokurtic distributions

                                   Reminders

$$Kurt = E\left[ \left( \frac{X - \mu}{\sigma} \right)^4 \right] = \frac{E\left[ (X - \mu)^4 \right]}{\left( E\left[ (X - \mu)^2 \right] \right)^2}$$

> library(moments)
> kurtosis(Val$AMEX)
[1] 5.770583
> kurtosis(Val$SPX)
[1] 5.671254

What is their K (excess kurtosis)?
Subtract 3 to make it relative to the normal distribution…
Quick check: what are the Skewness and Kurtosis of {1,2,-3,0,-2,1,1}?




  Excel function SKEW
  R function skewness (package moments)

  Excel function KURT
  R function kurtosis (package moments)
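The quick check can be worked through directly from the moment definitions given in the reminders. A sketch in base R (no package needed; variable names are illustrative):

```r
x <- c(1, 2, -3, 0, -2, 1, 1)

m  <- mean(x)
m2 <- mean((x - m)^2)   # second central moment (population variance)
m3 <- mean((x - m)^3)   # third central moment
m4 <- mean((x - m)^4)   # fourth central moment

skew <- m3 / m2^(3/2)
kurt <- m4 / m2^2

print(c(skewness = skew, kurtosis = kurt, excess_kurtosis = kurt - 3))
```

These are the moment-based values, which is also what the `skewness` and `kurtosis` functions of the moments package compute. Excel's SKEW and KURT apply small-sample corrections, and KURT reports excess kurtosis, so the spreadsheet figures will differ on such a short series.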
By the way, what is the most platykurtic distribution in nature?

                                                    Toss it!

                       Head = Success = 1 / Tail = Failure = 0

> library(moments)
> toss<-rbinom(10000000,1,0.5)
> mean(toss)
[1] 0.5001777
> kurtosis(toss)
[1] 1.000001
> kurtosis(toss)-3
[1] -1.999999
> hist(toss, breaks=10,main="Tossing a coin 10 million times",xlab="Result of the trial",ylab="Occurrence")
> sum(toss)
[1] 5001777
50.01777% rate of success: fair or not fair? Trick coin?

        Can be tested later with a Bayesian approach

On a perfect 50/50, Kurtosis would be 1, Excess Kurtosis -2: the minimum!
This is a Bernoulli trial

 $B(n, p)$ with $n \ge 1$, $n$ integer, and $0 < p < 1$, $p \in \mathbb{R}$; for a single toss ($n = 1$):

                            Mean            $p$

                            SD              $\sqrt{p(1 - p)}$

                            Skewness        $\frac{1 - 2p}{\sqrt{p(1 - p)}}$

                            Kurtosis        $\frac{1}{p(1 - p)} - 3$

     Easy to demonstrate that the Kurtosis is lowest when p = 0.5
     A bit more complicated to demonstrate it for any distribution
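The claim for the coin can be checked numerically. A minimal sketch that evaluates the kurtosis of a single Bernoulli trial, 1 / (p(1 - p)) - 3, on a grid of probabilities (the grid itself is an arbitrary choice):

```r
p <- seq(0.01, 0.99, by = 0.01)

# Kurtosis (not excess) of one Bernoulli trial with success probability p
kurt <- 1 / (p * (1 - p)) - 3

p_min    <- p[which.min(kurt)]
kurt_min <- min(kurt)

print(c(p_at_minimum = p_min, minimum_kurtosis = kurt_min))
```

The minimum sits at the fair coin, since p(1 - p) is largest at p = 0.5, and the further the coin is from fair, the heavier the tails become in relative terms.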
Back to our series, a good tool is the BoxPlot

Too many outliers!
There should be 2 at most for the series to be normal

Fatter tails than the normal distribution

                  boxplot(Val$AMEX,Val$SPX, main="AMEX & S&P BoxPlots",
                            names=c("AMEX","SPX"),col="blue")
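Where the "2 at most" comes from, as a rough sketch: R's `boxplot` flags points beyond 1.5 times the interquartile range from the box by default, and for a normal distribution the share of points outside those fences can be computed exactly. The sample size of 251 is the one used for these series; the rest is standard normal arithmetic.

```r
n <- 251                       # number of daily returns in each series

q3    <- qnorm(0.75)           # upper quartile of the standard normal
iqr   <- 2 * q3                # interquartile range (symmetric distribution)
fence <- q3 + 1.5 * iqr        # upper whisker limit, about 2.7 standard deviations

p_outlier         <- 2 * pnorm(-fence)   # probability of falling outside either fence
expected_outliers <- n * p_outlier

print(c(fence = fence, p_outlier = p_outlier, expected = expected_outliers))
```

Fewer than two flagged points would be expected from a normal sample of this size, so a boxplot showing many more is a first sign of fat tails.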
Leptokurtic distributions

Negatively skewed distributions

        Are they normal distributions?

        Let’s compare them to normal distributions with the same
        standard deviation and mean and make the QQ Plots
# AmEx: histogram of returns with the normal density of same mean and sd drawn on top
# (hist() plots counts by default; pass freq=FALSE to put it on the density scale of the curve)
x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX))
hist(Val$AMEX, breaks=100,main="AmEx Returns / Normal Distribution",xlab="Return",ylab="Occurrence")
lines(x,y1,type="l",lwd=3,col="red")

# S&P: same comparison
x=seq(-0.2,0.2,length=200)
y1=dnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX))
hist(Val$SPX, breaks=20,main="S&P Returns / Normal Distribution",xlab="Return",ylab="Occurrence")
lines(x,y1,type="l",lwd=3,col="red")
                                                Excess kurtosis obvious

Fatter and longer tails
Let’s have a look at their CDFs through QQ plots
> qqnorm(Val$AMEX)
> qqline(Val$AMEX)

> qqnorm(Val$SPX)
> qqline(Val$SPX)

                                     Fatter tails
 Let’s properly test the normality
Can use many tests…

•   Kolmogorov-Smirnov
•   Jarque Bera
•   Chi Square
•   Shapiro Wilk

Let’s try Kolmogorov-Smirnov

             It compares the distance between the empirical
             CDF and the CDF of the reference distribution
x=seq(-4,4,length=1000)
plot(ecdf(Val$AMEX),do.points=FALSE, col="red", lwd=3,
main="Normal Distribution against AMEX - CDFs", xlab="x",
ylab="P(X<=x)")
lines(x,pnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX)),col="blue",type="l",lwd=3)

x=seq(-4,4,length=1000)
plot(ecdf(Val$SPX),do.points=FALSE, col="red", lwd=3,
main="Normal Distribution against S&P - CDFs", xlab="x",
ylab="P(X<=x)")
lines(x,pnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX)),col="blue",type="l",lwd=3)

> ks.test(Val$SPX, "pnorm")

        One-sample Kolmogorov-Smirnov test

data: Val$SPX
D = 0.4811, p-value < 2.2e-16
alternative hypothesis: two-sided

> ks.test(Val$AMEX, "pnorm")

        One-sample Kolmogorov-Smirnov test

data: Val$AMEX
D = 0.4742, p-value < 2.2e-16
alternative hypothesis: two-sided

             The null hypothesis H0 is that the distribution is normal
             (called this way, with no mean or sd passed to "pnorm", the reference is the standard normal N(0, 1))

                  Do we accept or reject H0 at the 95% confidence level?

                    The hypothesis regarding the distributional
                    form is rejected if the test statistic, D, is greater
                    than the critical value obtained from a table
        Sample size: 251                    $$\frac{1.36}{\sqrt{251}} = 0.086$$

                     Rejected or not?

Rejected! The series do not fit a normal distribution
                                                                  The p-value was already giving the answer
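One caveat on the test calls: `ks.test(x, "pnorm")` with no further arguments compares the sample with the standard normal N(0, 1), which daily returns with a standard deviation of 1 to 2% cannot match whatever their shape. A minimal sketch on simulated data (not the course series; the seed, mean and standard deviation are illustrative assumptions) showing the effect of passing the sample mean and standard deviation:

```r
set.seed(123)
# Simulated 'returns': genuinely normal, but on the scale of daily returns
x <- rnorm(251, mean = 0, sd = 0.015)

# Reference = N(0, 1): D is close to 0.5 even though x is normal
d_std <- unname(ks.test(x, "pnorm")$statistic)

# Reference = normal with the sample's own mean and standard deviation
d_fit <- unname(ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic)

crit <- 1.36 / sqrt(length(x))   # approximate 5% critical value for D

print(c(D_standard = d_std, D_fitted = d_fit, critical = crit))
```

Because the mean and standard deviation are estimated from the same sample, the tabulated critical value is then only approximate; tests designed for that situation, such as Shapiro-Wilk (`shapiro.test` in base R), are a useful cross-check.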
OK, we now know a bit more about the two series we want to regress
                     > lm(Val$AMEX~Val$SPX)

                     Call:
                     lm(formula = Val$AMEX ~ Val$SPX)

                     Coefficients:
                     (Intercept)        Val$SPX
                       0.0004505      1.1096287

plot(Val$SPX,Val$AMEX, main="S&P / AmEx", xlab="S&P", ylab="AmEx", col="red")
abline(lm(Val$AMEX~Val$SPX), col="blue")

$$AmEx = 110.96\% \times S\&P + 0.045\%$$
The next important step is to analyse the residuals

  > Reg<-lm(Val$AMEX~Val$SPX)
  > summary(Reg)

  Call:
  lm(formula = Val$AMEX ~ Val$SPX)

  Residuals:
        Min        1Q    Median        3Q       Max
  -0.030387 -0.006072 -0.000114  0.006624  0.027824

  Coefficients:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept) 0.0004505  0.0006365   0.708     0.48
  Val$SPX     1.1096287  0.0434231  25.554   <2e-16 ***
  ---
  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Residual standard error: 0.01008 on 249 degrees of freedom
  Multiple R-squared: 0.7239,     Adjusted R-squared: 0.7228
  F-statistic:   653 on 1 and 249 DF, p-value: < 2.2e-16
They need to be a white noise; you can get a first assessment from the quartiles
# Set a 2 x 2 grid first so that the four diagnostic plots share one page
layout(matrix(1:4,2,2))
plot(Reg)
The QQ plot compares the CDFs (through their quantiles)

A perfect fit is a line

                           Left tail noticeably different
Residuals should be randomly distributed around the 0 horizontal line

You don’t want to see a trend or a dependence

To accept or reject the regression you need residuals to be a white noise

            Their mean should be 0
Nothing suggesting a white noise

             • Square root of the standardized residuals as a function of the
               fitted values
             • There should be no obvious trend in this plot
Now showing leverage

                        Marginal importance of a point in the regression

Far points suggest an outlier or a poor model
So do we accept the regression?

                 Probably not… But let’s check…
                 Kolmogorov-Smirnov on residuals

$$\frac{1.36}{\sqrt{251}} = 0.086$$              Upper bound on D for H0 to be accepted

            Resid<-resid(Reg)
            ks.test(Resid, "pnorm")

              One-sample Kolmogorov-Smirnov test

             data: Resid
             D = 0.4889, p-value < 2.2e-16
             alternative hypothesis: two-sided

Rejected!            Regressions between two different assets are very often poor

                                     Heteroscedasticity

                                     Basis risk if you hedge anyway
Conclusion

        OLS
        Residuals
        Normality
        Heteroscedasticity

  • 17.
    In a simplelinear regression with intercept 2 = 2 ESGF 5IFM Q1 2012 Is a good correlation coefficient and a good coefficient of determination enough to accept the regression? vinzjeannin@hotmail.com Not necessarily! Residuals need to have no effect, in other word to be a white noise! 17
  • 18.
    vinzjeannin@hotmail.com ESGF 5IFM Q1 2012 18
  • 19.
    Don’t get fooledby numbers! ESGF 5IFM Q1 2012 For every dataset of the Quarter = 9 = 7.5 vinzjeannin@hotmail.com = 3 + 0.5 = 0.82 2 = 0.67 Can you say at this stage which regression is the best? 19 Certainly not those on the right you need a LINEAR dependence
  • 20.
    ESGF 5IFM Q12012 Is any linear regression useless? vinzjeannin@hotmail.com Think what you could do to the series Polynomial transformation, log transformation,… 20 Else, non linear regressions, but it’s another story
  • 21.
    First application onfinancial market S&P / AmEx in 2011 ESGF 5IFM Q1 2012 vinzjeannin@hotmail.com 21
  • 22.
    ,& = = 0.8501 & 2 = 2 = 0.7227 ESGF 5IFM Q1 2012 Oups :-o Is Excel wrong? vinzjeannin@hotmail.com R-Squared has different calculation methods Let’s accept the following regression then as the quality seems pretty good = 0.06% + 1.1046 ∗ & 22
  • 23.
    How to usethis? ESGF 5IFM Q1 2012 • Forecasting? Not really… Both are random variables vinzjeannin@hotmail.com • Hedging? Yes but basis risk Yes but careful to the residuals… In theory, what is the daily result of the hedge? Let’s have a try! 23
  • 24.
    Hedging $1.0M ofAmEx Stocks with $1.1046M of S&P ESGF 5IFM Q1 2012 vinzjeannin@hotmail.com It would have been too easy… Great differences… Why? Sensitivity to the size of the sample 24 Heteroscedasticity
  • 25.
    Let’s have asimilar approach using a proper statistics and econometrics software ESGF 5IFM Q1 2012 • Free • Open Source • Developments shared by developers vinzjeannin@hotmail.com Let’s begin with statistical exploration to get familiar with the series and the software > Val<-read.csv(file="C:/Users/Vinz/Desktop/Val.csv",head=TRUE,sep=",") > summary(Val) SPX AMEX Min. :-0.0666344 Min. :-0.0883287 1st Qu.:-0.0069082 1st Qu.:-0.0094580 Median : 0.0010016 Median : 0.0013007 25 Mean : 0.0001249 Mean : 0.0005891 3rd Qu.: 0.0075235 3rd Qu.: 0.0102923 Max. : 0.0474068 Max. : 0.0710967
  • 26.
    > hist(Val$AMEX, breaks=20,main="Distribution AMEX Returns") > sd(Val$AMEX) [1] 0.01915489 ESGF 5IFM Q1 2012 vinzjeannin@hotmail.com > hist(Val$SPX, breaks=20, main="Distribution SPXX Returns") > sd(Val$SPX) [1] 0.01468776 26
  • 27.
    These are obviousnegatively skewed distributions ESGF 5IFM Q1 2012 Reminders 3 − − 3 = = − 2 3/2 vinzjeannin@hotmail.com • Negative skew: long left tail, mass on the right, skew to the left • Positive skew: long right tail, mass on the left, skew to the right > skewness(Val$AMEX) [1] -0.2453693 > skewness(Val$SPX) 27 [1] -0.4178701
  • 28.
    These are obviousleptokurtic distributions ESGF 5IFM Q1 2012 Reminders 4 − − 4 = = − 2 2 vinzjeannin@hotmail.com > library(moments) > kurtosis(Val$AMEX) What is their K? [1] 5.770583 (excess kurtosis) > kurtosis(Val$SPX) [1] 5.671254 28 Subtract 3 to make it relative to the normal distribution…
  • 29.
    Quick check: whatare the Skewness and Kurtosis of {1,2,-3,0,-2,1,1}? ESGF 4IFM Q1 2012 vinzjeannin@hotmail.com Excel function SKEW R function skewness (package moments) 29
  • 30.
    ESGF 4IFM Q12012 vinzjeannin@hotmail.com Excel function KURT R function kurtosis (package moments) 30
  • 31.
    By the way,what is the most platykurtic distribution in the nature? Toss it! ESGF 4IFM Q1 2012 Head = Success = 1 / Tail = Failure = 0 vinzjeannin@hotmail.com > require(moments) > library(moments) > toss<-rbinom(10000000,1,0.5) > mean(toss) [1] 0.5001777 > kurtosis(toss) [1] 1.000001 > kurtosis(toss)-3 [1] -1.999999 > hist(toss, breaks=10,main="Tossing a coin 10 millions times",xlab="Result of the trial",ylab="Occurence") 31 > sum(toss) [1] 5001777
  • 32.
    50.01777% rate ofsuccess: fair or not fair? Trick coin ? Can be tested later with a Bayesian approach ESGF 4IFM Q1 2012 On a perfect 50/50, Kurtosis would be 1, Excess Kurtosis -2: the minimum! This is a Bernoulli trial (, ) with > 1 and 0 < < 1 ∈ ℝ and integer vinzjeannin@hotmail.com Mean SD (1 − ) Skewness 1 − 2 (1 − ) Kurtosis 1 −3 (1 − ) 32 Easy to demonstrate if p=0.5 the Kurtosis will be the lowest Bit more complicated to demonstrate it for any distribution
  • 33.
    Back to ourseries, a good tool is the BoxPlot ESGF 5IFM Q1 2012 Too Many Outliers! vinzjeannin@hotmail.com There should be 2 max To be normal Fatter tails than the normal distribution 33 boxplot(Val$AMEX,Val$SPX, main="AMEX & S&P BoxPlots", names=c("AMEX","SPX"),col="blue")
  • 34.
    Leptokurtic distributions Negatively skeweddistribution ESGF 5IFM Q1 2012 Are they normal distributions? vinzjeannin@hotmail.com Let’s compare them to normal distributions with same standard deviation and mean and make the QQ Plots 34
  • 35.
    x=seq(-0.2,0.2,length=200) y1=dnorm(x,mean=mean(Val$AMEX),sd=sd( Val$AMEX)) hist(Val$AMEX, breaks=100,main="AmEx Returns / Normal ESGF 5IFM Q1 2012 Distribution",xlab="Return",ylab="Occ urence") lines(x,y1,type="l",lwd=3,col="red") vinzjeannin@hotmail.com x=seq(-0.2,0.2,length=200) y1=dnorm(x,mean=mean(Val$SPX),sd=sd(Val$S PX)) hist(Val$SPX, breaks=20,main="S&P Returns / Normal Distribution",xlab="Return",ylab="Occuren ce") lines(x,y1,type="l",lwd=3,col="red") 35
  • 36.
    ESGF 5IFM Q12012 Excess kurtosis obvious vinzjeannin@hotmail.com Fatter and longer tails 36 Let’s have a look to their CDF through QQPlot
  • 37.
    > qqnorm(Val$AMEX) > qqnorm(Val$SPX) > qqline(Val$AMEX) > qqline(Val$SPX) ESGF 5IFM Q1 2012 vinzjeannin@hotmail.com Fatter tails 37 Let’s properly test the normality
  • 38.
    Can use manytests… • Kolmogorov-Smirnov • Jarque Bera • Chi Square • ESGF 5IFM Q1 2012 Shapiro Wilk Let’s try Kolmogorov-Smirnov It compares the distance between the empirical vinzjeannin@hotmail.com CDF and the CFD of the reference distribution 38
  • 39.
    ESGF 5IFM Q12012 x=seq(-4,4,length=1000) plot(ecdf(Val$AMEX),do.points=FALSE, col="red", lwd=3, main="Normal Distribution against AMEX - CFD's", xlab="x", ylab="P(X<=x)") lines(x,pnorm(x,mean=mean(Val$AMEX),sd=sd(Val$AMEX)),col="blue",t ype="l",lwd=3) vinzjeannin@hotmail.com x=seq(-4,4,length=1000) plot(ecdf(Val$SPX),do.points=FALSE, col="red", lwd=3, main="Normal Distribution against S&P - CFD's", xlab="x", ylab="P(X<=x)") lines(x,pnorm(x,mean=mean(Val$SPX),sd=sd(Val$SPX)),col="blue",typ e="l",lwd=3) 39
  • 40.
    > ks.test(Val$SPX, "pnorm") > ks.test(Val$AMEX, "pnorm") One-sample Kolmogorov- One-sample Kolmogorov-Smirnov Smirnov test test data: Val$SPX data: Val$AMEX D = 0.4811, p-value < 2.2e-16 D = 0.4742, p-value < 2.2e-16 alternative hypothesis: two-sided alternative hypothesis: two-sided ESGF 5IFM Q1 2012 The 0 hypothesis is the distribution is normal vinzjeannin@hotmail.com Do we accept or reject the hypothesis 0 with a 95% confidence interval? The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value obtained from a table 40
  • 41.
    vinzjeannin@hotmail.com 1.36 Sample size: 251 = 0.086 251 Rejected or not? 41 P-Value was giving Rejected! Series aren’t fitting a normal distribution the answer
  • 42.
    Ok, we nowknow a bit more the 2 series we want to regress > lm(Val$AMEX~Val$SPX) Call: lm(formula = Val$AMEX ~ Val$SPX) ESGF 5IFM Q1 2012 Coefficients: (Intercept) Val$SPX 0.0004505 1.1096287 plot(Val$SPX,Val$AMEX, main="S&P / AmEx", xlab="S&P", ylab="AmEx", col="red") vinzjeannin@hotmail.com abline(lm(Val$AMEX~Val$SPX), col="blue") = 110.96% ∗ + 0.045% 42
  • 43.
    The next importantstep is no analyse the residuals > Reg<-lm(Val$AMEX~Val$SPX) > summary(Reg) ESGF 5IFM Q1 2012 Call: lm(formula = Val$AMEX ~ Val$SPX) Residuals: Min 1Q Median 3Q Max -0.030387 -0.006072 -0.000114 0.006624 0.027824 vinzjeannin@hotmail.com Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.0004505 0.0006365 0.708 0.48 Val$SPX 1.1096287 0.0434231 25.554 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.01008 on 249 degrees of freedom Multiple R-squared: 0.7239, Adjusted R-squared: 0.7228 F-statistic: 653 on 1 and 249 DF, p-value: < 2.2e-16 43 They need to be a white noise, you can have a first assessment with quartiles
  • 44.
    plot(Reg) layout(matrix(1:4,2,2)) vinzjeannin@hotmail.com ESGF 5IFM Q1 2012 44
  • 45.
    QQ Plot comparesthe CDF ESGF 5IFM Q1 2012 A perfect fit is a line vinzjeannin@hotmail.com Left tail noticeably different 45
  • 46.
    ESGF 5IFM Q12012 vinzjeannin@hotmail.com Residuals should be randomly distributed around the 0 horizontal line You don’t want to see a trend, a dependence To accept or reject the regression you need residuals to be a white noise 46 Their mean should be 0
  • 47.
    ESGF 5IFM Q12012 Nothing suggesting a white noise vinzjeannin@hotmail.com • Square root of the standardized residuals as a function of the fitted values • There should be no obvious trend in this plot 47
  • 48.
    Showing now leverage Marginal importance of a point in the regression ESGF 5IFM Q1 2012 vinzjeannin@hotmail.com Far points suggest outlier or poor model 48
  • 49.
    So do weaccept the regression? Probably not… But let’s check… Kolmogorov-Smirnov on residuals ESGF 5IFM Q1 2012 1.36 Higher bound value for the = = 0.086 251 H0 to be accepted vinzjeannin@hotmail.com Resid<-resid(Reg) ks.test(Resid, "pnorm") One-sample Kolmogorov-Smirnov test data: Resid D = 0.4889, p-value < 2.2e-16 alternative hypothesis: two-sided Rejected! Regression between 2 different asset are very often poor 49 Heteroscedasticity Basis risk if you hedge anyway
  • 50.
    Conclusion ESGF 5IFM Q1 2012 OLS Residuals vinzjeannin@hotmail.com Normality Heteroscedasticity 50