Applied Statistics II
 


Second course of Applied Statistics, MSc level in Business School


    Applied Statistics II: Presentation Transcript

    • Applied Statistics. Vincent JEANNIN, ESGF 4IFM Q1 2012, vinzjeannin@hotmail.com
    • Summary of the session (est. 4.5h)
      • R Step by Step
      • Reminders of last session
      • The Value at Risk
      • OLS & Exploration
    • R Step by Step: downloadable for free (open source) at http://www.r-project.org/
    • Main screen
    • Menu: File / New Script
    • Step 1, upload your data. An Excel CSV file is easy to import.
      Path: C:\Users\vin\Desktop
      Note: 4 columns with headers
        DATA<-read.csv(file="C:/Users/vin/Desktop/DataFile.csv",header=TRUE)
    • Run your instruction(s)
    • You can call variables anytime you want
    • summary(DATA) shows a quick summary of the distribution of all variables:
                 SPX       SPXr         AMEXr   AMEX
        Min.      86.43   -0.0666344     97.6   -0.0883287
        1st Qu.   95.70   -0.0069082    104.7   -0.0094580
        Median   100.79    0.0010016    108.8    0.0013007
        Mean      99.67    0.0001249    109.4    0.0005891
        3rd Qu.  103.75    0.0075235    114.1    0.0102923
        Max.     107.21    0.0474068    123.5    0.0710967
      Judging by the values, the column labelled AMEXr holds the AmEx price level and the column labelled AMEX holds the AmEx returns.
      summary(DATA$SPX) shows a quick summary of the distribution of one variable:
        Min.  1st Qu.  Median  Mean   3rd Qu.  Max.
        86.43 95.70    100.80  99.67  103.80   107.20
      Careful using min(DATA) and max(DATA): they consider DATA as one variable.
        > min(DATA)
        [1] -0.08832874
        > max(DATA)
        [1] 123.4793
      Mean & SD:
        > sd(DATA)
               SPX       SPXr      AMEXr       AMEX
        4.92763551 0.01468776 6.03035318 0.01915489
        > mean(DATA)
                 SPX         SPXr        AMEXr         AMEX
        9.967046e+01 1.249283e-04 1.093951e+02 5.890780e-04
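The mean(DATA) and sd(DATA) calls above relied on behaviour of the R version used for the slides; recent R releases no longer give per-column results when handed a whole data frame. A minimal sketch of the column-by-column equivalent follows. The data frame here is a hypothetical stand-in built from random numbers, since the original DataFile.csv is not available.

```r
# Hypothetical stand-in for DATA: the original DataFile.csv is not available here
set.seed(42)
DATA <- data.frame(
  SPXr = rnorm(250, mean = 0, sd = 0.015),
  AMEX = rnorm(250, mean = 0, sd = 0.019)
)

# Apply mean() and sd() to each column; both return named numeric vectors
col_means <- sapply(DATA, mean)
col_sds   <- sapply(DATA, sd)

print(col_means)
print(col_sds)
```

sapply() keeps the column names, so the output reads the same way as the slide's per-column listing.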
    • Easy to show a histogram:
        hist(DATA$SPXr, breaks=25, main="Distribution of SPXr", ylab="Freq", xlab="SPXr", col="blue")
    • Obvious excess kurtosis. Obvious asymmetry.
      Skewness and kurtosis functions don’t exist directly in R. However, some VNP (Very Nice Programmer) built and shared an add-in: the package moments.
    • Menu: Packages / Install Package(s)
      • Choose whatever mirror (server) you want
      • Usually France (Toulouse) is very good, as it’s a university server with all the packages available
    • Once installed, you can load a package with either of the following instructions (one is enough):
        require(moments)
        library(moments)
      New functions can now be used!
    • Skewness and kurtosis with the package moments:
        > library(moments)
        > skewness(DATA)
               SPX       SPXr      AMEXr       AMEX
        -0.6358029 -0.4178701  0.1876994 -0.2453693
        > kurtosis(DATA)
             SPX     SPXr    AMEXr     AMEX
        2.411177 5.671254 2.078366 5.770583
      By the way, you can store any result in a variable:
        > Kur<-kurtosis(DATA$SPXr)
        > Kur
        [1] 5.671254
    • Lost? Call the help!
        help(kurtosis)
      It reminds you of the package, the syntax and the definition of the arguments.
    • Let’s store a few values (package stats):
        SPMean<-mean(DATA$SPXr)
        SPSD<-sd(DATA$SPXr)
      Build a sequence, the x axis:
        x<-seq(from=SPMean-4*SPSD,to=SPMean+4*SPSD,length=500)
      Build a normal density on these x (package stats):
        y1<-dnorm(x,mean=SPMean,sd=SPSD)
      Display the histogram (package graphics):
        hist(DATA$SPXr, breaks=25,main="S&P Returns / Normal Distribution",xlab="Returns",ylab="Occurrences", col="blue")
      Display on top of it the normal density (package graphics):
        lines(x,y1,type="l",lwd=3,col="red")
      R is case sensitive, so the density has to be stored and plotted under the same name (y1). The histogram shows counts while dnorm() returns a density; adding freq=FALSE to hist() puts both on the same scale.
    • Positive Excess Kurtosis & Negative Skew
    • Let’s build a spread:
        Spd<-DATA$SPXr-DATA$AMEX
      What is the mean? The mean is linear:
        $E(aX + bY) = aE(X) + bE(Y)$
        $E(aX - bY) = aE(X) - bE(Y)$
      Let’s verify:
        > mean(DATA$SPXr)-mean(DATA$AMEX)-mean(Spd)
        [1] 0
    • What is the standard deviation? Is standard deviation linear? NO!
        $V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,\mathrm{Cov}(X, Y)$
      For the spread, a = 1 and b = -1:
        > (var(DATA$SPXr)+var(DATA$AMEX)-2*cov(DATA$SPXr,DATA$AMEX))^0.5
        [1] 0.01019212
        > sd(Spd)
        [1] 0.01019212
      Let’s show the implication in a proper manner: let’s create a portfolio containing half of each stock.
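The variance rule above can be checked numerically for any weights. The sketch below uses simulated returns as a hypothetical stand-in for the SPXr and AMEX columns, and also compares the result with the naive weighted sum of standard deviations.

```r
set.seed(1)
# Hypothetical correlated daily returns (not the original data)
x <- rnorm(250, mean = 0, sd = 0.015)
y <- 1.1 * x + rnorm(250, mean = 0, sd = 0.010)

a <- 0.5
b <- 0.5

# Standard deviation from the variance formula
sd_formula <- sqrt(a^2 * var(x) + b^2 * var(y) + 2 * a * b * cov(x, y))
# Standard deviation computed directly on the combined series
sd_direct <- sd(a * x + b * y)
# What a (wrong) linear rule would give
sd_naive <- a * sd(x) + b * sd(y)

print(c(formula = sd_formula, direct = sd_direct, naive = sd_naive))
```

The first two numbers agree, and the naive figure is never smaller: that gap is the diversification effect the portfolio slides go on to plot.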
    • Portfolio with half of each stock:
        Portf<-0.5*DATA$SPXr+0.5*DATA$AMEX
        plot(sd(DATA$SPXr),mean(DATA$SPXr),col="blue",ylim=c(0,0.0008),xlim=c(0.012,0.022),ylab="Return",xlab="Vol")
        points(sd(DATA$AMEX),mean(DATA$AMEX),col="red")
        points(sd(Portf),mean(Portf),col="green")
    • The efficient frontier
    • Adding the other weightings to the plot:
        points(sd(0.1*DATA$SPXr+0.9*DATA$AMEX),mean(0.1*DATA$SPXr+0.9*DATA$AMEX),col="green")
        points(sd(0.2*DATA$SPXr+0.8*DATA$AMEX),mean(0.2*DATA$SPXr+0.8*DATA$AMEX),col="green")
        points(sd(0.3*DATA$SPXr+0.7*DATA$AMEX),mean(0.3*DATA$SPXr+0.7*DATA$AMEX),col="green")
        points(sd(0.4*DATA$SPXr+0.6*DATA$AMEX),mean(0.4*DATA$SPXr+0.6*DATA$AMEX),col="green")
        points(sd(0.6*DATA$SPXr+0.4*DATA$AMEX),mean(0.6*DATA$SPXr+0.4*DATA$AMEX),col="green")
        points(sd(0.7*DATA$SPXr+0.3*DATA$AMEX),mean(0.7*DATA$SPXr+0.3*DATA$AMEX),col="green")
        points(sd(0.8*DATA$SPXr+0.2*DATA$AMEX),mean(0.8*DATA$SPXr+0.2*DATA$AMEX),col="green")
        points(sd(0.9*DATA$SPXr+0.1*DATA$AMEX),mean(0.9*DATA$SPXr+0.1*DATA$AMEX),col="green")
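The repeated points() calls can also be written as a loop over the weights. This is only a sketch of the same idea: the two return series are simulated, hypothetical stand-ins for DATA$SPXr and DATA$AMEX, and the names spx, amex and frontier are illustrative.

```r
set.seed(2)
# Hypothetical return series standing in for DATA$SPXr and DATA$AMEX
spx  <- rnorm(250, mean = 0.0001, sd = 0.015)
amex <- 1.1 * spx + rnorm(250, mean = 0.0005, sd = 0.010)

# Weight of the first asset, from 0% to 100% in steps of 10%
weights <- seq(0, 1, by = 0.1)

# One row per weighting: weight, volatility and mean return of the mix
frontier <- t(sapply(weights, function(w) {
  portfolio <- w * spx + (1 - w) * amex
  c(weight = w, vol = sd(portfolio), ret = mean(portfolio))
}))

plot(frontier[, "vol"], frontier[, "ret"], col = "green",
     xlab = "Vol", ylab = "Return")
```

Keeping the results in a matrix makes it easy to inspect the volatility of each mix as well as to plot it.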
    • Scatterplot with the regression line (x and y in the same order as in the regression formula):
        plot(DATA$SPXr,DATA$AMEX)
        abline(lm(DATA$AMEX~DATA$SPXr), col="blue")
    • LM stands for Linear Models
        > lm(DATA$AMEX~DATA$SPXr)
        Call:
        lm(formula = DATA$AMEX ~ DATA$SPXr)
        Coefficients:
        (Intercept)    DATA$SPXr
          0.0004505    1.1096287
      $AMEX = 1.1096 \times SPXr + 0.045\%$
      Will be used later for linear regression and hedging.
    • Do you remember what the most platykurtic distribution in nature is?
      Toss: Head = Success = 1 / Tail = Failure = 0
      100 tosses… Else memory issue…
        > library(moments)
        > toss<-rbinom(100,1,0.5)
        > mean(toss)
        [1] 0.52
        > kurtosis(toss)
        [1] 1.006410
        > kurtosis(toss)-3
        [1] -1.993590
        > hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
        > sum(toss)
        [1] 52
      Let’s test the fairness.
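The kurtosis printed above can be reproduced from the definition in base R, without the moments package. The sketch below assumes population (1/n) central moments, which is the convention that reproduces the slide's value; the helper names central_moment, kurt and skew are illustrative, not part of any package.

```r
# Population (1/n) central moment of order k
central_moment <- function(x, k) mean((x - mean(x))^k)

# Kurtosis and skewness as ratios of central moments
kurt <- function(x) central_moment(x, 4) / central_moment(x, 2)^2
skew <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Same outcome as the slide: 52 heads and 48 tails
toss <- c(rep(1, 52), rep(0, 48))

print(kurt(toss))      # kurtosis
print(kurt(toss) - 3)  # excess kurtosis
```

For a 0/1 variable with a share p of heads, the ratio simplifies to 1/(p(1-p)) - 3, which is smallest at p = 0.5; that is why a fair coin is the most platykurtic case.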
    • Density of a binomial distribution (read as a density for the probability of head r, given h heads and t tails; this is a Beta(h+1, t+1) density):
        $f(r \mid H = h, T = t) = \frac{(N+1)!}{h!\,t!}\, r^h (1 - r)^t$
      Let’s plot this density with h = 52, t = 48, N = 100:
        N<-100
        h<-52
        t<-48
        r<-seq(0,1,length=500)
        y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
        plot(r,y,type="l",col="red",main="Probability density of r given 52 heads out of 100 flips")
    • If the probability between 45% and 55% is significant, we’ll accept the fairness. What do you think?
    • What is the problem with this coin? Obvious fake! Assuming the probability of head is 0.7, toss it!
      Head = Success = 1 / Tail = Failure = 0, 100 tosses
        > library(moments)
        > toss<-rbinom(100,1,0.7)
        > mean(toss)
        [1] 0.72
        > kurtosis(toss)
        [1] 1.960317
        > kurtosis(toss)-3
        [1] -1.039683
        > hist(toss, breaks=10,main="Tossing a coin 100 times",xlab="Result of the trial",ylab="Occurrence")
        > sum(toss)
        [1] 72
      Let’s test the fairness (assuming you don’t know it’s a trick).
    • If the probability between 45% and 55% is significant, we’ll accept the fairness:
        N<-100
        h<-72
        t<-28
        r<-seq(0.2,0.8,length=500)
        y<-(factorial(N+1)/(factorial(h)*factorial(t)))*r^h*(1-r)^t
        plot(r,y,type="l",col="red",main="Probability density of r given 72 heads out of 100 flips")
      Trick coin!
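The "probability between 45% and 55%" can be computed rather than read off the plot. The density used on the slides is a Beta(h+1, t+1) density, so its area over an interval comes from pbeta(). This is a sketch under that reading (equivalent to assuming a uniform prior on r); the helper name posterior_prob is illustrative.

```r
# Area of the density of r between lower and upper, given h heads and tails tails
posterior_prob <- function(h, tails, lower = 0.45, upper = 0.55) {
  pbeta(upper, h + 1, tails + 1) - pbeta(lower, h + 1, tails + 1)
}

# Check that the slide's formula matches the Beta density at one point
N <- 100; h <- 52; tails <- 48; r <- 0.5
slide_density <- (factorial(N + 1) / (factorial(h) * factorial(tails))) * r^h * (1 - r)^tails
same_density <- isTRUE(all.equal(slide_density, dbeta(r, h + 1, tails + 1)))

p_fair  <- posterior_prob(52, 48)  # first coin: 52 heads out of 100
p_trick <- posterior_prob(72, 28)  # second coin: 72 heads out of 100

print(same_density)
print(c(fair = p_fair, trick = p_trick))
```

For the first coin most of the mass falls inside the 45% to 55% band, while for the second coin almost none does, which is the numerical version of "Trick coin!".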
    • Reminders of last session
      Standard normal distribution snapshot, 4 moments: Mean 0, SD 1, Skewness 0, Kurtosis 3
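    • Key probabilities for $X \sim N(\mu, \sigma)$:
        $P(X \le \mu) = 0.5$
        $P(\mu - \sigma \le X \le \mu + \sigma) = 0.682$
        $P(X \le -\sigma + \mu) = 0.159$
        $P(X \le -1.645\,\sigma + \mu) = 0.05$
        $P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$
        $P(X \le -2\sigma + \mu) = 0.023$
        $P(X \le -2.326\,\sigma + \mu) = 0.01$
        $P(\mu - 3\sigma \le X \le \mu + 3\sigma) = 0.997$
        $P(X \le -3\sigma + \mu) = 0.001$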
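These reference values can be recomputed with pnorm() and qnorm() from the stats package. A short sketch, using the standard normal (the probabilities are the same for any mean and standard deviation once expressed in multiples of sigma):

```r
mu <- 0
sigma <- 1

# Probability of falling within k standard deviations of the mean
within_k_sigma <- function(k) {
  pnorm(mu + k * sigma, mu, sigma) - pnorm(mu - k * sigma, mu, sigma)
}

p_one_sigma   <- within_k_sigma(1)
p_two_sigma   <- within_k_sigma(2)
p_three_sigma <- within_k_sigma(3)

# Quantiles used later for the 95% and 99% VaR
q05 <- qnorm(0.05, mu, sigma)
q01 <- qnorm(0.01, mu, sigma)

print(round(c(p_one_sigma, p_two_sigma, p_three_sigma, q05, q01), 4))
```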
    • Notation: $N(\mu, \sigma)$
      Density:
        $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
      CDF:
        $P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$
    • Let X~N(1,1.5). Find $P(X \le 4.75)$.
        $P(X \le 4.75) = P\left(Y \le \frac{4.75 - 1}{1.5}\right)$ with Y~N(0,1)
        $P(Y \le 2.5) = ?$ Use the table!
        $P(Y \le -2.5) = 0.0062$
        $P(Y \le 2.5) = 0.9938$
        $P(X \le 4.75) = 0.9938$
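The table lookup can be cross-checked in R. The sketch below computes the probability both directly and through the standardised variable, reading N(1,1.5) as mean 1 and standard deviation 1.5, as the slide's own standardisation does.

```r
# Direct computation on X ~ N(mean = 1, sd = 1.5)
p_direct <- pnorm(4.75, mean = 1, sd = 1.5)

# Same probability through the standardised variable Y ~ N(0, 1)
z <- (4.75 - 1) / 1.5
p_standard <- pnorm(z)

print(c(z = z, direct = p_direct, standardised = p_standard))
```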
    • QQ Plot
        > qqnorm(FCOJ$V1)
        > qqline(FCOJ$V1)
      Fat tail
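    • Geometric Brownian Motion
      Based on a stochastic differential equation (B&S):
        $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$
      Discrete form:
        $\Delta S = \mu S\,\Delta t + \sigma S\,\varepsilon \sqrt{\Delta t}$ with $\varepsilon \sim N(0,1)$
      CRR binomial tree:
        $u = e^{\sigma\sqrt{\Delta t}}$, $d = e^{-\sigma\sqrt{\Delta t}} = \frac{1}{u}$, $p = \frac{e^{r\Delta t} - d}{u - d}$
        $BV = \left(OpUp \cdot p + OpDown \cdot (1 - p)\right) e^{-r\Delta t}$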
    • Greeks Approximation: Taylor Development
        $C(S + dS) = C(S) + \Delta\,dS + \frac{1}{2}\,\Gamma\,dS^2 + \frac{1}{6}\,\frac{\partial^3 C}{\partial S^3}\,dS^3 + \frac{1}{24}\,\frac{\partial^4 C}{\partial S^4}\,dS^4 + \dots$
    • The Value at Risk
      Estimate, with a specific confidence level (usually 95% or 99%), the worst possible loss. In other words, the point is to identify a particular point on the left of the distribution.
      3 methods:
      • Historical
      • Parametric
      • Monte Carlo
      For now, we’ll focus on VaR on one linear asset… FCOJ is back!
    • Historical VaR
      • No assumption about the distribution
      • Easy to implement and calculate
      • Sensitive to the length of the history
      • Sensitive to very extreme values
      Let’s get back to our FCOJ time series; the last price is 150 cents.
      If we work on returns, we’ve seen the use of the PERCENTILE Excel function:
      • 1% percentile is -5.22%, 99% historical daily VaR is -7.83 cents
      • 5% percentile is -3.34%, 95% historical daily VaR is -5.00 cents
      Works as well on weekly, monthly, quarterly series.
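In R, the PERCENTILE step corresponds to quantile(). The sketch below illustrates the calculation only: the return series is simulated (fat-tailed Student t draws) as a hypothetical stand-in for the FCOJ returns, so the numbers will not match the slide. Only the spot price of 150 cents is taken from the slide.

```r
set.seed(123)
# Hypothetical fat-tailed daily returns standing in for the FCOJ series
returns <- 0.015 * rt(1000, df = 4)

spot <- 150  # last price in cents, as on the slide

# Empirical 1% and 5% percentiles of the daily returns
pct <- quantile(returns, probs = c(0.01, 0.05))

# Historical daily VaR in cents: percentile return applied to the last price
hist_var <- spot * pct

print(pct)
print(hist_var)
```

No distribution is assumed here: the result depends entirely on which observations happen to be in the sample, which is why the method is sensitive to the history length and to extreme values.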
    • Historical VaR
      It can be worked as well with price variations instead of returns, but it’s going to be price sensitive! So be careful about the bias.
      • 1% percentile in terms of price movement is -8.11 cents
      • 5% percentile in terms of price movement is -4.14 cents
    • Parametric VaR
      • Easy to implement and calculate
      • Assumes a particular shape of the distribution
      • Not really sensitive to fat tails
      FCOJ mean return: 0.1364%
      FCOJ SD: 2.1664%
      We already know:
        $P(X \le -1.645\,\sigma + \mu) = 0.05$
        $P(X \le -2.326\,\sigma + \mu) = 0.01$
      Then:
        $P(X \le -3.43\%) = 0.05$, VaR 95% (-5.15 cents)
        $P(X \le -4.90\%) = 0.01$, VaR 99% (-7.35 cents)
    • Parametric VaR
      Very often you assume a 0 mean anyway, therefore:
        $P(X \le -3.57\%) = 0.05$, VaR 95% (-5.36 cents)
        $P(X \le -5.04\%) = 0.01$, VaR 99% (-7.56 cents, i.e. 5.04% of 150 cents)
      At 99% this is a smaller loss than the historical VaR. This is the problem with leptokurtic distributions: the method does not capture the impact of fat tails.
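The parametric figures can be recomputed with qnorm(), using the mean, standard deviation and last price quoted on the slides. A minimal sketch, with and without the mean (variable names are illustrative):

```r
mu    <- 0.001364  # FCOJ mean daily return (0.1364%)
sigma <- 0.021664  # FCOJ daily standard deviation (2.1664%)
spot  <- 150       # last price in cents

# 5% and 1% quantiles of the assumed normal return distribution, in cents
var_with_mean <- spot * qnorm(c(0.05, 0.01), mean = mu, sd = sigma)
var_zero_mean <- spot * qnorm(c(0.05, 0.01), mean = 0,  sd = sigma)

names(var_with_mean) <- c("95%", "99%")
names(var_zero_mean) <- c("95%", "99%")

print(round(var_with_mean, 2))
print(round(var_zero_mean, 2))
```

Small differences from the slide values come from rounding the quantiles to 1.645 and 2.326 and the returns to two decimals.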
    • Monte Carlo VaR
      • Most efficient method when assets aren’t linear
      • Tough to implement
      • Assumes a particular shape of the distribution
      Based on an assumption about the price process (for example GBM): a great number of random simulations of the price process are run to build a distribution and outline the VaR.
    • Monte Carlo VaR
      Let’s simulate 10,000 GBM paths with 252 steps and store the final result:
        library(sde)
        FCOJ<-read.csv(file="C:/Users/Vinz/Desktop/FCOJStats.csv",header=FALSE,sep=",")
        Drift<-mean(FCOJ$V1)
        Volat<-sd(FCOJ$V1)
        nbsim<-252
        Spot<-150
        Final<-rep(1,10000)
        # 10,000 simulations, matching the length of Final
        for(i in 1:10000){
          Matr<-GBM(x=Spot,r=Drift, sigma=Volat,N=nbsim)
          Final[i]<-Matr[nbsim+1]
        }
        quantile(Final, 0.05)
        quantile(Final, 0.01)
      Don’t be fooled by the 252, we’re still making a daily simulation: what should be changed in the code to make it yearly?
    • Monte Carlo VaR
        > quantile(Final, 0.05)
            5%
        144.93
        > quantile(Final, 0.01)
              1%
        142.7941
      • 95% daily VaR is -5.07 cents
      • 99% daily VaR is -7.21 cents
      Let’s take off the drift.
    • Monte Carlo VaR (without the drift)
        > quantile(Final, 0.05)
              5%
        144.7583
        > quantile(Final, 0.01)
              1%
        142.6412
      • 95% daily VaR is -5.24 cents (144.7583 - 150)
      • 99% daily VaR is -7.36 cents
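The same experiment can be sketched in base R without the sde package. Instead of simulating each path step by step, this version draws the terminal value directly from the exact solution of the GBM equation over a one-day horizon. That is a swapped-in shortcut, not the slide's method. The drift and volatility are the FCOJ figures quoted on the parametric slides; the seed and variable names are arbitrary.

```r
set.seed(2012)

drift   <- 0.001364  # daily mean return quoted on the parametric slide
volat   <- 0.021664  # daily standard deviation quoted on the parametric slide
spot    <- 150       # last price in cents
horizon <- 1         # one day, in the same time unit as drift and volat
n_sims  <- 10000

# Terminal value of a GBM: S_T = S_0 * exp((mu - sigma^2/2) T + sigma sqrt(T) Z)
z <- rnorm(n_sims)
final <- spot * exp((drift - volat^2 / 2) * horizon + volat * sqrt(horizon) * z)

# VaR in cents: simulated 5% and 1% quantiles minus the current price
mc_var <- quantile(final, c(0.05, 0.01)) - spot
print(mc_var)
```

Setting drift to 0 gives the "without the drift" variant, and a longer horizon (for example 252 with the same daily parameters) moves the simulation from daily to yearly.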
    • Which is the best? Comparison
    • Going forward on the VaR
      All methods give different but consistent values.
      Easy? Yes but…
      • We’ve involved one asset only
      • We’ve involved a linear asset
      What about an option? What about 2 assets?
    • Going forward on the VaR
      Portfolio scale: what to look at to calculate the VaR?
      Big question: is the VaR additive? NO!
      Keywords for the future: covariance, correlation, diversification
    • Going forward on the VaR
      Options: what to look at to calculate the VaR?
      4 risk factors:
      • Underlying price
      • Interest rate
      • Volatility
      • Time
      4 answers:
      • Delta/Gamma approximation knowing the distribution of the underlying
      • Rho approximation knowing the distribution of the underlying rate
      • Vega approximation knowing the distribution of implied volatility
      • Theta (time decay)
      Yes, but… do the underlying price, rate and volatility vary independently? It might be a bit more complicated than expected…
    • OLS & Exploration
      OLS: Ordinary Least Squares
      Linear regression model: minimize the sum of the squared vertical distances between the observations and the linear approximation.
        $\hat{y} = f(x) = ax + b$, with residual $\varepsilon = y - \hat{y}$
    • Two parameters to estimate:
      • Intercept α (written b in the derivation)
      • Slope β (written a in the derivation)
      Minimising residuals:
        $E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$
      When is E minimal? When the partial derivatives with respect to a and b are 0.
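    • Expanding the sum of squared residuals:
        $E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 = \sum_{i=1}^{n} (y_i - a x_i - b)^2$
      Quick high school reminder if necessary…
        $(y - ax - b)^2 = y^2 - 2axy - 2by + a^2x^2 + 2abx + b^2$
      Partial derivatives:
        $\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2 + 2 b x_i\right) = 0$
        $\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2 a x_i + 2 b\right) = 0$
      Dividing by 2:
        $\sum_{i=1}^{n} \left(-x_i y_i + a x_i^2 + b x_i\right) = 0$
        $\sum_{i=1}^{n} \left(-y_i + a x_i + b\right) = 0$
      Hence:
        $a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$
        $a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$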
    • Leads easily to the intercept:
        $a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$
        $a n \bar{x} + n b = n \bar{y}$
        $a \bar{x} + b = \bar{y}$
        $b = \bar{y} - a \bar{x}$
      The regression line goes through $(\bar{x}, \bar{y})$: the distance of this point to the line is indeed 0.
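    • Substituting the intercept:
        $b = \bar{y} - a \bar{x}$
        $y = ax + \bar{y} - a \bar{x}$
        $y - \bar{y} = a (x - \bar{x})$
      The two conditions
        $\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2 + 2 b x_i\right) = 0$
        $\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left(-2 y_i + 2 a x_i + 2 b\right) = 0$
      become
        $\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0$ and $\sum_{i=1}^{n} (y_i - a x_i - b) = 0$
        $\sum_{i=1}^{n} x_i (y_i - a x_i - \bar{y} + a \bar{x}) = 0$ and $\sum_{i=1}^{n} (y_i - a x_i - \bar{y} + a \bar{x}) = 0$
        $\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$ and $\sum_{i=1}^{n} (y_i - \bar{y}) - a \sum_{i=1}^{n} (x_i - \bar{x}) = 0$
      Multiplying the second condition by $\bar{x}$:
        $\sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$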
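    • We have
        $\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$ and $\sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$
      so
        $\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = \sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right)$
        $\sum_{i=1}^{n} x_i \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) - \sum_{i=1}^{n} \bar{x} \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$
        $\sum_{i=1}^{n} (x_i - \bar{x}) \left((y_i - \bar{y}) - a (x_i - \bar{x})\right) = 0$
      Finally…
        $a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$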
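    • Slope and intercept in terms of covariance and variance:
        $a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ (covariance over variance)
        $a = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2}$
        $b = \bar{y} - a \bar{x}$
      You can use the Excel functions INTERCEPT and SLOPE.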
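The closed-form slope and intercept can be checked against lm(). The sketch below uses a small made-up data set (the x and y values are hypothetical, chosen only for illustration).

```r
# Hypothetical observations, for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.1)

# Slope: sum of cross-deviations over sum of squared deviations of x
a <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the line goes through the point of means
b <- mean(y) - a * mean(x)

# Same regression with R's linear model function
fit <- lm(y ~ x)

print(c(intercept = b, slope = a))
print(coef(fit))
```

The equivalents of Excel's SLOPE and INTERCEPT are the two entries of coef(fit).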
    • Calculate the variances and covariance of X{1,2,3,3,1,2} and Y{2,3,1,1,3,2}.
      You can use the Excel functions VAR.P, COVARIANCE.P and STDEV.P.
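The exercise can be worked in R as well, with one caveat: R's var(), cov() and sd() divide by n-1 (sample statistics), whereas the Excel .P functions divide by n. A sketch with small helper functions for the population versions (the names var_p and cov_p are illustrative):

```r
X <- c(1, 2, 3, 3, 1, 2)
Y <- c(2, 3, 1, 1, 3, 2)
n <- length(X)

# Population (divide by n) variance and covariance
var_p <- function(v) mean((v - mean(v))^2)
cov_p <- function(u, v) mean((u - mean(u)) * (v - mean(v)))

print(c(var_X = var_p(X), var_Y = var_p(Y), cov_XY = cov_p(X, Y)))
print(c(sd_X = sqrt(var_p(X)), sd_Y = sqrt(var_p(Y))))

# Link with R's built-in sample variance
print(var(X) * (n - 1) / n)
```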
    • Let’s assess the quality of the regression
      Let’s calculate the correlation coefficient (aka Pearson Product-Moment Correlation Coefficient, PPMCC):
        $r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
      Value between -1 and 1:
      • $|r| = 1$: perfect dependence
      • $r \approx 0$: no dependence
      It gives an idea of the dispersion of the scatterplot.
      You can use the Excel function CORREL.
    • r = 0.62: poor quality. r = 0.96: high quality.
    • What is good quality? Slightly discretionary…
        If $|r| \ge \frac{\sqrt{3}}{2} = 0.866…$
      This is widely accepted as the threshold between acceptable and poor.
    • The regression itself introduces a bias.
      Let’s introduce the coefficient of determination, R-Squared:
        Total Dispersion = Dispersion Regression + Dispersion Residual
        $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
        $R^2 = \frac{\text{Dispersion Regression}}{\text{Total Dispersion}}$
      In other words, the part of the total dispersion explained by the regression.
      You can use the Excel function RSQ.
    • In a simple linear regression with intercept:
        $R^2 = r^2$
      Are a good correlation coefficient and a good coefficient of determination enough to accept the regression?
      Not necessarily! Residuals need to have no structure, in other words to be white noise!
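The dispersion decomposition and the identity between R-Squared and the squared correlation can be verified numerically. A sketch on simulated data (the series are hypothetical; any simple regression with an intercept would do):

```r
set.seed(7)
# Hypothetical data with a linear relation plus noise
x <- rnorm(100)
y <- 0.5 + 1.1 * x + rnorm(100, sd = 0.7)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

ss_total <- sum((y - mean(y))^2)      # total dispersion
ss_reg   <- sum((y_hat - mean(y))^2)  # dispersion explained by the regression
ss_res   <- sum((y - y_hat)^2)        # residual dispersion

r_squared <- ss_reg / ss_total

print(c(total = ss_total, regression = ss_reg, residual = ss_res))
print(c(r_squared = r_squared, r_squared_lm = summary(fit)$r.squared, r2 = cor(x, y)^2))
```

plot(fitted(fit), resid(fit)) is a quick visual check that the residuals show no remaining structure.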
    • Don’t get fooled by numbers!
      For every dataset of the quartet:
        $\bar{x} = 9$, $\bar{y} = 7.5$
        $y = 3 + 0.5x$
        $r = 0.82$, $R^2 = 0.67$
      Can you say at this stage which regression is the best?
      Certainly not those on the right: you need a LINEAR dependence.
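The summary figures on this slide match Anscombe's quartet, which ships with R as the built-in data set anscombe (columns x1 to x4 and y1 to y4). Assuming that is the data behind the slide, the sketch below recomputes the statistics for the four pairs.

```r
data(anscombe)

# One row of summary statistics per (x_i, y_i) pair of the quartet
quartet <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- coef(lm(y ~ x))
  c(mean_x = mean(x), mean_y = mean(y),
    intercept = unname(fit[1]), slope = unname(fit[2]),
    r = cor(x, y), r_squared = cor(x, y)^2)
}))

print(round(quartet, 2))
```

The four rows are nearly identical, yet plotting each pair, for example plot(anscombe$x2, anscombe$y2), shows very different shapes. The numbers alone cannot tell which regression is acceptable.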
    • Is any linear regression useless?
      Think about what you could do to the series: polynomial transformation, log transformation,…
      Otherwise, non-linear regressions, but that’s another story.
    • First application on financial markets: S&P / AmEx in 2011
    • Correlation and R-Squared for AmEx against S&P:
        $r_{AmEx, S\&P} = \frac{\mathrm{Cov}(AmEx, S\&P)}{\sigma_{AmEx}\,\sigma_{S\&P}} = 0.8501$
        $R^2 = r^2 = 0.7227$
      Oops :-o Is Excel wrong? R-Squared has different calculation methods.
      Let’s accept the following regression then, as the quality seems pretty good:
        $AmEx = 0.06\% + 1.1046 \times S\&P$
    • How to use this?
      • Forecasting? Not really… Both are random variables
      • Hedging? Yes, but there is basis risk. Yes, but be careful about the residuals…
      In theory, what is the daily result of the hedge? Let’s have a try!
    • Hedging $1.0M of AmEx stocks with $1.1046M of S&P
      It would have been too easy… Great differences… Why?
      • Sensitivity to the size of the sample
      • Heteroscedasticity
      • Basis risk
    • The purpose was to see if the market has an effect on a particular stock.
      The dependence is obvious, but the residuals are too volatile for any stable application.
      But attention! We are looking for causation, not correlation!
      Causation implies correlation. The converse is not true!
      DON’T BE FOOLED BY PRETTY NUMBERS
      Let’s prove this…
    • Perfect linear dependence. Excellent R-Squared. Residuals are white noise. What’s the problem then?
    • Do you really think fresh lemon reduces car fatalities?
    • Conclusion: R, VaR, OLS, Normal Distribution