FINAL REPORT



Application of the Hilbert Huang
 Transform to the prediction of
      financial time series


     Cyrille BEN LEMRID, Hadrien MAUPARD

        Natixis supervisor: Adil REGHAI

       Academic supervisor: Erick HERBIN

              École Centrale Paris

                 March 18, 2012
Contents



1 Description of the Hilbert Huang Transform, model overview                                    7
  1.1   The Empirical Mode Decomposition . . . . . . . . . . . . . . . . . . . . . . .          7
  1.2   Closed form formulas for IMFs . . . . . . . . . . . . . . . . . . . . . . . . . .       9

2 State of the Art                                                                             12
  2.1   Application fields of the EMD . . . . . . . . . . . . . . . . . . . . . . . . . .       12
  2.2   Existence and uniqueness of the Decomposition . . . . . . . . . . . . . . . .          12
        2.2.1   Stoppage criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    13
        2.2.2   Cubic spline interpolation . . . . . . . . . . . . . . . . . . . . . . . .     13
        2.2.3   Additional boundary data points . . . . . . . . . . . . . . . . . . . .        14

3 Application to the prediction of financial time series                                        16
  3.1   The Empirical Mode Decomposition in finance: stylized facts . . . . . . . . .           16
        3.1.1   Empirical Modes and Market Structure . . . . . . . . . . . . . . . . .         16
                3.1.1.1   Asset price and IMFs     . . . . . . . . . . . . . . . . . . . . .   16
                3.1.1.2   High frequency modes . . . . . . . . . . . . . . . . . . . . .       17
                3.1.1.3   Low frequency modes . . . . . . . . . . . . . . . . . . . . .        20
        3.1.2   Back to the Box & Jenkins framework . . . . . . . . . . . . . . . . .          20
        3.1.3   Prediction hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . .      23
                3.1.3.1   Deterministic periodicity of low frequency IMFs . . . . . . .        24
                3.1.3.2   Stochastic periodicity of low frequency IMFs . . . . . . . . .       24
  3.2   Insights of potential market predictors . . . . . . . . . . . . . . . . . . . . .      25
        3.2.1   Deterministic periodicity: Low frequency Mean Reverting Strategy .             25


        3.2.2   Conditional expectation: Low Frequency Multi Asset Shifting Pattern
                Recognition Strategy and Mono Asset IMF Pattern Recognition Strategy 25
                3.2.2.1   Low Frequency Multi Asset Shifting Pattern Recognition
                          Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   25
                3.2.2.2   Low Frequency Mono Asset IMF Pattern Recognition Strategy 26

4 Strategies analysis                                                                          28
  4.1   Portfolio management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       28
        4.1.1   Trading strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     28
        4.1.2   Investment Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . .     28
        4.1.3   Starting time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    29
        4.1.4   Trading time span . . . . . . . . . . . . . . . . . . . . . . . . . . . .      29
        4.1.5   Annualizing the PnL and reducing its variance        . . . . . . . . . . . .   29
  4.2   Underlying and target market . . . . . . . . . . . . . . . . . . . . . . . . . .       30

5 Results                                                                                      31
  5.1   Empirical choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    31
  5.2   Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   32
        5.2.1   Volatility: VIX Index . . . . . . . . . . . . . . . . . . . . . . . . . . .    32
        5.2.2   Volatility: VStoxx Index . . . . . . . . . . . . . . . . . . . . . . . . .     33
        5.2.3   Volatility: other indices: aggregate performance . . . . . . . . . . . .       33
        5.2.4   French stocks: CAC 40: Aggregate performance . . . . . . . . . . . .           34
        5.2.5   Equities Indices and trading pairs: Aggregate performance . . . . . .          34
        5.2.6   Commodities: West Texas Intermediate (WTI)           . . . . . . . . . . . .   35

A Time series Prerequisites                                                                    43
  A.1 Stationary and linear processes . . . . . . . . . . . . . . . . . . . . . . . . .        43
        A.1.1 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     43
        A.1.2 Linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    44
        A.1.3 Wold’s decomposition: . . . . . . . . . . . . . . . . . . . . . . . . . .        44
  A.2 The particular case of financial time series: parametric and non parametric
      extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     45
        A.2.1 Non-stationary and non linear financial time series         . . . . . . . . . .   45
        A.2.2 Parametric processes for financial time series . . . . . . . . . . . . . .        45


        A.2.3 Non parametric processes for financial time series . . . . . . . . . . .   46
  A.3 General time series: the Box & Jenkins approach for prediction . . . . . . .     46

B Evaluation criteria of backtests                                                     47




Introduction



This report presents the Hilbert Huang research work of Cyrille Ben Lemrid and Hadrien
Maupard.
The Hilbert Huang Transform relies on two steps: a first non-parametric Empirical Mode
Decomposition which derives the signal into Intrinsic Mode Functions (semi-periodic func-
tions) of various frequencies and then a Hilbert Decomposition, projecting the IMFs onto a
time-frequency 3 dimensional graph. The details of the algorithm are thoroughly explained
in the first chapter of this report.
Due to the lack of a theoretical formulation of the Hilbert step, and in order to keep our
algorithm flexible and simple, we will only use the Huang Transform, i.e. the Empirical
Mode Decomposition (EMD).
In finance, it is well known that the usual tools for the prediction of time series are
powerless. Stationary and linear models, such as ARIMA processes, are unable to predict
financial time series, which display non-stationarity and long memory. Hence, parametric
and non-parametric extensions exist; the EMD belongs to the non-parametric predictors of
non-linear, non-stationary time series.
In chapter 3, based on empirical observations, interesting stylized facts are derived: IMFs
are uncorrelated to each other. Low frequency IMFs are periodic and explain most of the
variance of the original time series; despite their smooth and regular form, they capture
most of the information in the time series. High frequency IMFs are closer to random
processes and show some stationarity. These facts connect the EMD with the Box &
Jenkins statistical framework: a time series can be seen as the sum of a semi-periodic or
seasonal process (the low frequency IMFs) and a random, semi-stationary process (the high
frequency IMFs).
In the same chapter, two categories of predictors are introduced, relying on two hypotheses
on the seasonal process: either it is deterministic, and can be prolonged, or it remains
stochastic, and conditional expectation is the best predictor. The hypothesis of a
deterministic seasonal process gives one strategy: the Low Frequency Mean Reverting
Strategy. The hypothesis of stochastic periodicity of the seasonal process gives two
strategies: the Low Frequency Multi Asset Shifting Pattern Recognition Strategy and the
Low Frequency Mono Asset IMF Pattern Recognition Strategy.

In chapter 4, the backtest method of the strategies is formulated, and underlyings for the
backtests are chosen: Implied Volatilities, Stocks, Indices and trading pairs, Commodities.
Finally, the results of these backtests are commented.
In the Annex, prerequisites about time series and asset management literature are given.




Chapter    1
Description of the Hilbert Huang Transform,
model overview

The Hilbert-Huang transform (HHT) is an empirically based data analysis method, which
is performed in two steps: first some descriptive patterns are extracted by performing an
adaptive decomposition called Empirical Mode Decomposition (Huang Transform), and then
we can capture the local behavior of these patterns by using tools coming from Hilbert
Spectral Analysis (Hilbert Transform).


1.1     The Empirical Mode Decomposition
The Empirical Mode Decomposition is based on the assumption that any data consists of
different simple intrinsic modes of oscillations. Each of these oscillatory modes is represented
by an intrinsic mode function (IMF) which satisfies two conditions:
– In the whole data set, the number of zero crossings and the number of extrema must be
  equal or differ at most by one.
– There exist two envelopes, one passing through the local maxima and the other through
  the local minima, such that at any point the mean value of the two envelopes is zero.

Definition 1.1.1 An R-valued process x(t) is called an IMF (Intrinsic Mode Function) if
it is a continuous process that satisfies the following conditions:
  1. The number of extrema and the number of zero-crossings must either be equal or
     differ at most by one: |#Γmax + #Γmin − #Γ0| ≤ 1,

               with     Γ0 = { t ∈ I | x(t) = 0}
                      Γmax = { t ∈ I | ∃u > 0, ∀s ∈ ]t − u, t + u[ \ {t}, x(t) > x(s)}
                      Γmin = { t ∈ I | ∃u > 0, ∀s ∈ ]t − u, t + u[ \ {t}, x(t) < x(s)}

  2. The mean value m(t) = (x_sup(t) + x_inf(t))/2 of the envelope x_sup(t) defined by the
     local maxima and the envelope x_inf(t) defined by the local minima is zero:

                   ∃x_sup ∈ Env(Γmax), ∃x_inf ∈ Env(Γmin), ∀t ∈ I, m(t) = 0
                   with    Env(Γmax) = {f ∈ C(I) | ∀t ∈ Γmax, f(t) = x(t)}
                           Env(Γmin) = {f ∈ C(I) | ∀t ∈ Γmin, f(t) = x(t)}


An IMF represents a simple oscillatory mode as a counterpart to the simple harmonic
function, but it is much more general: instead of constant amplitude and frequency, as
in a simple harmonic component, the IMF can have a variable amplitude and frequency as
functions of time.
The first condition is clearly necessary for oscillatory data; the second condition requires
that the upper and lower envelopes of an IMF be symmetric with respect to the x-axis.
The idea of the EMD method is to separate the data into a slowly varying local mean part
and a fast varying symmetric oscillation part. The oscillation part becomes the IMF and
the local mean the residue; the residue then serves as input data for further decomposition,
and the process repeats until no more oscillation can be separated from the residue. At
each step of the decomposition, since the upper and lower envelopes of the IMF are initially
unknown, a repetitive sifting process approximates the envelopes with cubic spline functions
passing through the extrema of the IMF. The data serves as the initial input for the sifting
process; the refined IMF is the difference between the previous version and the mean of the
envelopes, and the process repeats until the predefined stopping condition is satisfied. The
residue is then the difference between the data and the refined IMF.
One big advantage of this procedure is that it can deal with data from nonstationary and
nonlinear processes. The method is direct and adaptive, with an a posteriori basis defined
by the decomposition itself and derived from the data.
The intrinsic mode components can be extracted by the following steps:
  1. Take an arbitrary input signal x(t) and initialize the residual: r0(t) = x(t), i = 1
  2. Extract the i-th IMF:
  3. Initialize the "proto-IMF" h0 with h0(t) = r_{i−1}(t), k = 1
  4. Extract the local maxima and minima of the "proto-IMF" h_{k−1}(t)
  5. Interpolate the local maxima and the local minima by cubic splines to form upper
     and lower envelopes of h_{k−1}(t)
  6. Calculate the mean m_{k−1}(t) of the upper and lower envelopes of h_{k−1}(t)
  7. Define h_k(t) = h_{k−1}(t) − m_{k−1}(t)
  8. If the IMF criteria are satisfied, set IMF_i(t) = h_k(t); else go to (4) with k = k + 1
  9. Define r_i(t) = r_{i−1}(t) − IMF_i(t)
 10. If r_i(t) still has at least two extrema, go to (2) with i = i + 1; else the decomposition
     is complete and r_i(t) is the "residue" of x(t).

Figure 1.1: Sifting process of the empirical mode decomposition: (a) an arbitrary input; (b)
identified maxima (diamonds) and minima (circles) superimposed on the input; (c) upper
envelope and lower envelope (thin solid lines) and their mean (dashed line); (d) prototype
intrinsic mode function (IMF) (the difference between the bold solid line and the dashed line
in panel (c)) that is to be refined; (e) upper envelope and lower envelope (thin solid lines)
and their mean (dashed line) of a refined IMF; and (f) remainder after an IMF is subtracted
from the input.

Once a signal has been fully decomposed, the signal x(t) can be written as

                              x(t) = Σ_{i=1}^{N} IMF_i(t) + r(t)



1.2     Closed form formulas for IMFs
Rather than a Fourier- or wavelet-based transform, the Hilbert transform is used, in order
to compute instantaneous frequencies and amplitudes and to describe the signal more
locally. The Hilbert transform Y_t, shown below, can be written for any function of class
L^p; PV denotes Cauchy's principal value integral.

     Y_t = H[IMF_t] = (1/π) PV ∫_{−∞}^{+∞} IMF_s/(t − s) ds
                    = (1/π) lim_{ε→0} [ ∫_{−∞}^{t−ε} IMF_s/(t − s) ds + ∫_{t+ε}^{+∞} IMF_s/(t − s) ds ]




Algorithm 1 Empirical Mode Decomposition
Require: Signal, threshold ∈ R+ ;
 1: curSignal ← Signal, i = 1;
 2: while (numberOfExtrema(curSignal) > 2) do
 3:   curImf ← curSignal
 4:   while (isNotAnImf(curImf , threshold) = true) do
 5:     Γmax ← emdGetMaxs(curImf );
 6:     Γmin ← emdGetMins(curImf );
 7:     Γmax ← emdMaxExtrapolate(curImf , Γmax );
 8:     Γmin ← emdMinExtrapolate(curImf , Γmin );
 9:     xsup ← emdInterpolate(curImf , Γmax );
10:     xinf ← emdInterpolate(curImf , Γmin );
11:     bias ← (xsup + xinf )/2;
12:     curImf ← curImf − bias;
13:   end while
14:   IMFi ← curImf ;
15:   curSignal ← curSignal − IMFi , i = i + 1;
16: end while
17: N ← i − 1;
18: residual ← curSignal;
19: return (IMFi )i=1..N , residual
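The sifting steps above can be sketched in a short, self-contained implementation. This is an illustrative sketch only, not the code used for this study: the strict-extrema detection, the anchoring of the splines at the series endpoints, the SD stoppage threshold of 0.2, and the caps on the number of IMFs and sifting iterations are all simplifying assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _extrema(x):
    """Indices of strict local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxs = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    mins = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxs, mins

def emd(signal, sd_threshold=0.2, max_imf=12, max_sift=50):
    """Minimal EMD sketch following Algorithm 1 (illustrative parameters)."""
    x = np.asarray(signal, dtype=float)
    t = np.arange(len(x))
    imfs = []
    residue = x.copy()
    for _ in range(max_imf):
        maxs, mins = _extrema(residue)
        if len(maxs) < 2 or len(mins) < 2:  # no more oscillation to extract
            break
        h = residue.copy()
        for _ in range(max_sift):
            maxs, mins = _extrema(h)
            if len(maxs) < 2 or len(mins) < 2:
                break
            # crude boundary handling: anchor the splines at the endpoints
            upper = CubicSpline(np.r_[0, maxs, len(h) - 1],
                                np.r_[h[0], h[maxs], h[-1]])(t)
            lower = CubicSpline(np.r_[0, mins, len(h) - 1],
                                np.r_[h[0], h[mins], h[-1]])(t)
            mean = (upper + lower) / 2.0        # bias of step 6
            h_new = h - mean                    # step 7
            sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)
            h = h_new
            if sd < sd_threshold:               # SD stoppage criterion
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue
```

By construction the decomposition is exact: summing the returned IMFs and the residue reproduces the input signal up to floating-point error.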


An analytic function can be formed with the Hilbert transform pair:

                     Z_t = IMF_t + i Y_t = A_t e^{iθ_t}
           where     A_t = √(IMF_t² + Y_t²)
                     θ_t = arctan( Y_t / IMF_t )

A_t and θ_t are the instantaneous amplitude and phase functions, respectively.
The instantaneous frequency f_t can then be written as the time derivative of the phase:

                                 f_t = (1/2π) dθ_t/dt

Hence, an IMF can be expressed analytically:

                 IMF_t = A_t cos( 2π ∫_0^t f_s ds + ψ )                            (1.1)


[11] and [14] showed that not all functions give "good" Hilbert transforms, i.e. transforms
which produce physically meaningful instantaneous frequencies. The signals which can be
analyzed using the Hilbert transform must therefore be restricted so that their computed
instantaneous frequency functions have physical meaning.

The empirical mode decomposition is thus essentially an algorithm which decomposes nearly
any signal into a finite set of functions that have "good" Hilbert transforms, producing
physically meaningful instantaneous frequencies. Once the IMFs have been obtained from
the EMD, one can calculate their instantaneous phases by applying the Hilbert transform
to each IMF component.
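For illustration, the analytic signal Z_t and the instantaneous amplitude, phase, and frequency defined above can be computed numerically with `scipy.signal.hilbert` (which returns the analytic signal). The 1 kHz sampling rate and the 7 Hz test tone below are illustrative choices, not values from this study; edge samples are distorted by the finite-length transform, so the instantaneous quantities are best read away from the boundaries.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                               # sampling rate in Hz (illustrative)
t = np.arange(0, 1.0, 1.0 / fs)
f0 = 7.0                                  # frequency of the test tone
imf = np.cos(2 * np.pi * f0 * t)

z = hilbert(imf)                          # analytic signal Z_t = IMF_t + i Y_t
amplitude = np.abs(z)                     # A_t
phase = np.unwrap(np.angle(z))            # theta_t, unwrapped to be monotone
inst_freq = np.diff(phase) * fs / (2 * np.pi)   # f_t = (1/2pi) dtheta/dt
```

Away from the boundaries, `inst_freq` stays close to the true 7 Hz and `amplitude` close to 1, as expected for a pure tone.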




Chapter    2
State of the Art

2.1     Application fields of the EMD
The Empirical Mode Decomposition can be a powerful tool to separate non-linear and
non-stationary time series into a trend (the residue function) and oscillations (the IMFs)
on different time scales; it can describe the frequency components locally and adaptively
for nearly any oscillating signal. This makes the tool extremely versatile. The
decomposition finds applications in many fields traditionally dominated by Fourier or
wavelet methods. For instance, the HHT has been used to study a wide variety of data,
including rainfall, earthquakes, sunspot number variation, heart-rate variability, financial
time series, and ocean waves. But some mathematical issues related to this decomposition
have been mostly left untreated: the convergence of the method, optimization problems
(best IMF selection and uniqueness of the decomposition), and spline problems (best spline
functions for the HHT). In the following sections, these open issues will be developed
thoroughly, and the potential solutions currently found in the literature will be gathered.


2.2     Existence and uniqueness of the Decomposition
The convergence of the proto-IMF sequence (h_k)_{k≥0} to an IMF is equivalent to the
convergence of the bias sequence (m_k)_{k≥0} to zero:

                                m_k −→ 0 in L² as k → ∞,
                                where m_{k−1}(t) = h_{k−1}(t) − h_k(t)



2.2.1     Stoppage criteria
The inner loop should be ended when the result of the sifting process meets the definition
of an IMF. In practice this condition is too strong so we need to specify a relaxed condition
which can be met in a finite number of iterations. The approximate local envelope symmetry
condition in the sifting process is called the stoppage (of sifting) criterion. In the past, several
different types of stoppage criteria were adopted: the most widely used type, which originated
from Huang et al. [14], is given by a Cauchy type of convergence test, the normalized Squared
Difference between two successive sifting operations defined as

                 SD_k = ( Σ_{t=0}^{T} |h_{k−1}(t) − h_k(t)|² ) / ( Σ_{t=0}^{T} h²_{k−1}(t) )

must be smaller than a predetermined value. This definition is slightly different from the
one given by Huang et al. [14] with the summation signs operating for the numerator and
denominator separately in order to prevent the SDk from becoming too dependent on local
small amplitude values of the sifting time series.
If we assume that the local mean between the upper and lower envelopes converges to zero
in the sense of the Euclidean norm, we can apply the following Cauchy criterion:

                       log( ‖m_{k−1}‖²_{L²} / ‖h_{k−1}‖²_{L²} ) ≤ threshold

In our implementation the threshold has been calibrated to −15.
These Cauchy-type stoppage criteria seem mathematically rigorous. However, they are
difficult to implement, for two reasons. First, the question of how small is small enough has
no general answer. Second, the criterion does not depend on the definition of an IMF: the
squared difference might be small, yet there is no guarantee that the function has the same
numbers of zero crossings and extrema.
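Both stoppage criteria are straightforward to compute. The sketch below assumes discretely sampled proto-IMFs stored as NumPy arrays; the default threshold of −15 for the log criterion is the calibration quoted above.

```python
import numpy as np

def sd_criterion(h_prev, h_curr):
    """Normalized squared difference SD_k between two successive siftings."""
    return np.sum(np.abs(h_prev - h_curr) ** 2) / np.sum(h_prev ** 2)

def log_criterion(h_prev, h_curr):
    """log(||m_{k-1}||^2 / ||h_{k-1}||^2), with m_{k-1} = h_{k-1} - h_k."""
    m = h_prev - h_curr
    return np.log(np.sum(m ** 2) / np.sum(h_prev ** 2))

def sifting_should_stop(h_prev, h_curr, threshold=-15.0):
    """Cauchy-type stoppage test: stop when the log criterion falls below threshold."""
    return log_criterion(h_prev, h_curr) <= threshold
```

A sifting step that barely changes the proto-IMF yields a very small SD and a strongly negative log criterion, triggering the stop; a large change does not.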


2.2.2     Cubic spline interpolation
Since the EMD is an empirical algorithm and involves a prescribed stoppage criterion to
carry out the sifting moves, we have to know how sensitive the decomposition of an input is
to the sifting process, so that the reliability of a particular decomposition can be assessed.
A confidence limit for the EMD is therefore a desirable quantity. To compute the upper
and lower envelopes we use a piecewise-polynomial approximation.
In general, the goal of spline interpolation is to create a function which approximates a
given data set as well as possible. For a smooth and efficient approximation, one has to
choose polynomials of sufficiently high order; a popular choice is the piecewise polynomial
of degree three, i.e. the cubic spline.

The basic idea behind using a cubic spline is to fit a piecewise function of the form:

                       S(x) = { S_1(x),     x ∈ [x_1, x_2[
                              { S_2(x),     x ∈ [x_2, x_3[
                              { ...
                              { S_{n−1}(x), x ∈ [x_{n−1}, x_n[

where S_i(x) is a third-degree polynomial with coefficients a_i, b_i, c_i and d_i, defined for
i = 0, 1, ..., n − 1 by:

     S_i(x) = ( a_i + b_i(x − x_i) + c_i(x − x_i)² + d_i(x − x_i)³ ) 1_{x∈[x_i, x_{i+1}[}

More formally, given a function f(x) defined on an interval [a, b] and a set of nodes
a = x_0 < x_1 < ... < x_n = b, a cubic spline interpolant S(x) for f(x) is a function that
satisfies the following conditions:
  1. S(x) = Σ_{i=0}^{n−1} S_i(x) 1_{x∈[x_i, x_{i+1}[}, where S_i(x) is a cubic polynomial on the
     subinterval [x_i, x_{i+1}[ for each i = 0, 1, ..., n − 1, with S(x_i) = f(x_i) at every node.
  2. S_{i+1}(x_{i+1}) = S_i(x_{i+1}) for each i = 0, 1, ..., n − 2 (continuity).
  3. S′_{i+1}(x_{i+1}) = S′_i(x_{i+1}) for each i = 0, 1, ..., n − 2 (continuous first derivative).
  4. S″_{i+1}(x_{i+1}) = S″_i(x_{i+1}) for each i = 0, 1, ..., n − 2 (continuous second derivative).
  5. One of the following sets of boundary conditions is also satisfied:
     S″(x_0) = S″(x_n) = 0 (free or natural boundary), or
     S′(x_0) = f′(x_0) and S′(x_n) = f′(x_n) (clamped boundary).
But there are four problems with this interpolation method:
– the (cubic) spline connecting the extrema is not the true envelope,
– the resulting IMF does not strictly guarantee symmetric envelopes,
– some unwanted overshoot may be caused by the spline interpolation,
– the spline cannot be connected at both ends of the data series.
Higher-order splines do not, in theory, resolve these problems.
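A minimal illustration with `scipy.interpolate.CubicSpline`, fitting a natural cubic spline through the local maxima of an amplitude-modulated oscillation (the signal and node choice are illustrative). Plotting `envelope` against `y` makes the first and fourth problems above visible: the spline dips below the signal near the maxima and has to be extrapolated beyond the end extrema.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 4 * np.pi, 800)
y = np.sin(5 * x) * (1.0 + 0.3 * np.sin(x))      # amplitude-modulated oscillation

d = np.diff(y)
max_idx = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1   # discrete local maxima

# natural cubic spline through the maxima, playing the role of the "upper envelope"
upper = CubicSpline(x[max_idx], y[max_idx], bc_type="natural")
envelope = upper(x)   # extrapolated beyond the first/last maximum
```

The spline interpolates the knots exactly, but between knots it is only an approximation of the true envelope.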


2.2.3     Additional boundary data points
It has been already illustrated that the cubic spline has somehow to be kept close to the
function especially near both ends of the data series. Therefore, the creation of additional
boundary data points, which are supposed to be applicable to the current data set, appears
to be the key element in technically improving the EMD. All artificially added boundary data
points are generated from within the original set of discrete knots to represent a characteristic
natural behaviour. One routine is to add new maxima and minima to the front and rear
of the data series. As a basic requirement, these data points are located off the original
time span the signal was recorded in. Therefore, no information is cancelled out and the
natural data series remains unaffected. However one disadvantage of this method is that we
anticipate the future trend of the data.
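One such routine can be sketched as simple reflection (mirror) padding: the extrema nearest each end are reflected to the outside of the observation window before the splines are fitted. The function below is an illustrative assumption about how this padding works, not the exact routine used for this report.

```python
import numpy as np

def mirror_extend(t_ext, v_ext, t0, t1, n=2):
    """Mirror the n extrema nearest each boundary outside [t0, t1].

    t_ext, v_ext: times and values of the detected extrema, sorted by time.
    Returns extended (times, values) arrays with the reflected points added,
    so that the interpolating spline is anchored beyond the recorded span.
    """
    left_t = 2.0 * t0 - t_ext[:n][::-1]      # reflect about the left end
    right_t = 2.0 * t1 - t_ext[-n:][::-1]    # reflect about the right end
    t_all = np.concatenate([left_t, t_ext, right_t])
    v_all = np.concatenate([v_ext[:n][::-1], v_ext, v_ext[-n:][::-1]])
    return t_all, v_all
```

The reflected points carry the values of the original extrema, so no information is invented beyond the symmetry assumption itself.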

Figure 2.1: Additional boundary data points




Figure 2.2: Additional boundary data points




Chapter    3
Application to the prediction of financial time
series

3.1       The Empirical Mode Decomposition in finance:
          stylized facts
In the markets, one can assume that different modes of oscillation in stock prices are
provoked by different kinds of actors. With this approach, the lowest frequency IMFs could
be considered as a statistical proxy to infer the behavior of big investors (banks, insurance
companies, hedge funds, mutual funds, pension funds, ...) and to predict the evolution of a
stock in the long run.


3.1.1     Empirical Modes and Market Structure
3.1.1.1   Asset price and IMFs

With the EMD, a time series can be represented as follows:

              ∃N_IMF ∈ N* | ∀t ∈ Z,  X_t = Σ_{i=1}^{N_IMF} IMF_t^i + r_t


It can be interesting to observe the correlation matrix of the random vector

                        ( X_t, IMF_t^1, ..., IMF_t^N, r_t )_{t∈Z}

This matrix can be empirically computed on an example, as shown in Figure 3.1, page 17.

This correlation matrix shows three stylized facts:

                X     IMF1     IMF2      IMF3      IMF4     IMF5     IMF6     IMF7     Trend
       X      1.000   0.038    0.039     0.172     0.184    0.169    0.488    0.795    0.879
     IMF1     0.038   1.000    0.034     -0.009    -0.026   0.002    0.022    -0.021   -0.019
     IMF2     0.039   0.034    1.000     -0.040    -0.005   0.043    -0.015   -0.022   -0.025
     IMF3     0.172   -0.009   -0.040    1.000     0.109    -0.114   0.102    0.049    0.059
     IMF4     0.184   -0.026   -0.005    0.109     1.000    -0.035   0.018    0.035    0.040
     IMF5     0.169   0.002    0.043     -0.114    -0.035   1.000    -0.096   -0.014   -0.038
     IMF6     0.488   0.022    -0.015    0.102     0.018    -0.096   1.000    -0.043   0.148
     IMF7     0.795   -0.021   -0.022    0.049     0.035    -0.014   -0.043   1.000    0.977
     Trend    0.879   -0.019   -0.025    0.059     0.040    -0.038   0.148    0.977    1.000


Figure 3.1: Empirical correlation matrix of AXA share price and its IMFs, from 01/01/2003
to 14/11/2011

– The EMD displays some empirical orthogonality features. Hence, the following theoretical
  assumption can be made:

                 ∀(i, j) ∈ [[1; N_IMF]]², i ≠ j ⇒ ⟨IMF_i, IMF_j⟩ = 0

  with the following scalar product:

                 ⟨·, ·⟩ : L²(R) × L²(R) → R,   (X, Y) ↦ Cov(X, Y)

– Low frequency IMFs display strong correlations to the original price series.
– High frequency IMFs are uncorrelated to the original price series.
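These stylized facts are easy to check numerically for any decomposition: stack the series, its IMFs, and the residue, and compute the empirical correlation matrix, as in Figure 3.1. The sketch below assumes the components are available as equal-length NumPy arrays (from any EMD routine).

```python
import numpy as np

def imf_correlation_matrix(series, imfs, residue):
    """Empirical correlation matrix of (X_t, IMF^1_t, ..., IMF^N_t, r_t).

    Row/column 0 is the original series; the last one is the residue (trend)."""
    rows = np.vstack([series] + list(imfs) + [residue])
    return np.corrcoef(rows)
```

For (near-)orthogonal components, the off-diagonal entries between IMFs stay close to zero, as in the empirical matrix above.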


3.1.1.2   High frequency modes

Empirical modes have a strong connection to the market structure. On top of being
uncorrelated to the general time series, high frequency IMFs present strong correlations to
daily price movements, as shown by the empirical correlation matrix of daily yields against
the IMF processes and the trend process (see Figure 3.3, page 18).
High frequency IMFs accurately follow daily movements, as shown in Figure 3.4, page 19.
Moreover, local jumps appear in the amplitude of the high frequency IMFs when daily
changes become sharper. This comes along with local jumps in volatility. Hence, the
amplitude of high frequency IMFs is probably positively correlated to the short term
implied volatility of the at-the-money options on the stock. As Figures 3.5 and 3.6 show,
the amplitude jumps along with volatility as daily yields exceed 10%.
Finally, despite their significant short term periodicity, high frequency IMFs display some
signs of stationarity, as shown in Figure 3.2, page 18.

Figure 3.2: Sample Autocorrelation Function of the highest frequency IMF
              diff(Axa)/Axa   IMF1     IMF2     IMF3     IMF4     IMF5     IMF6     IMF7     Trend
diff(Axa)/Axa     1.000       0.552    0.087    0.029    0.016    -0.007   -0.008   -0.004   -0.003
    IMF1          0.552       1.000    0.034    -0.009   -0.026   0.002    0.022    -0.021   -0.019
    IMF2          0.087       0.034    1.000    -0.040   -0.005   0.043    -0.015   -0.022   -0.025
    IMF3          0.029       -0.009   -0.040   1.000    0.109    -0.114   0.102    0.049    0.059
    IMF4          0.016       -0.026   -0.005   0.109    1.000    -0.035   0.018    0.035    0.040
    IMF5          -0.007      0.002    0.043    -0.114   -0.035   1.000    -0.096   -0.014   -0.038
    IMF6          -0.008      0.022    -0.015   0.102    0.018    -0.096   1.000    -0.043   0.148
    IMF7          -0.004      -0.021   -0.022   0.049    0.035    -0.014   -0.043   1.000    0.977
    Trend         -0.003      -0.019   -0.025   0.059    0.040    -0.038   0.148    0.977    1.000


Figure 3.3: Empirical correlation matrix of the daily returns of AXA and its IMFs, from
01/01/2003 to 14/11/2011

A Ljung-Box test is a more quantitative way to assess stationarity: it tests the null
hypothesis that a process has no autocorrelation up to a given lag, as expected of white
noise.

Definition 3.1.1 Let m ∈ N and (H0): ρ_1 = ρ_2 = ... = ρ_m = 0. The Ljung-Box test
statistic is given by:

                     Q(m) = N(N + 2) Σ_{h=1}^{m} ρ̂_h² / (N − h)

and, under H0, Q(m) ∼ χ²_m asymptotically.


However, the test rejects the absence of autocorrelation for the highest frequency IMF in
our previous example of AXA. This suggests that these high frequency modes display

Figure 3.4: Daily returns of Societe Generale and its highest frequency IMF




Figure 3.5: Daily returns of Societe Generale and its highest frequency IMF during a high
volatility period




Figure 3.6: Daily returns of Societe Generale and its highest frequency IMF normalized

some stationary properties, but keep a very short periodicity, making these processes not
i.i.d.
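The statistic of Definition 3.1.1 can be computed directly from the sample autocorrelations ρ̂_h, as sketched below; in practice a library routine would be used and Q(m) compared against a χ²_m quantile. The lag m = 10 and the test series in the usage check are illustrative, not the data of this study.

```python
import numpy as np

def ljung_box_q(x, m):
    """Ljung-Box Q(m) from Definition 3.1.1; under H0 (no autocorrelation
    up to lag m), Q(m) is asymptotically chi-squared with m degrees of freedom."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    # sample autocorrelations rho_hat_h for h = 1..m
    rho = np.array([np.sum(xc[h:] * xc[:-h]) / denom for h in range(1, m + 1)])
    return n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))
```

A strongly periodic series yields a very large Q(m), rejecting H0, while white noise yields a small value, consistent with the behaviour of the high frequency IMFs described above.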


3.1.1.3   Low frequency modes

Low frequency modes describe the long term dynamics of the stock. They reflect the positions
of long term actors within the market (see Figure 3.7, page 21).
They can also be interpreted in terms of economic cycles, when applied to very long time
frames (see Figure 3.8, page 21).


3.1.2     Back to the Box & Jenkins framework
The EMD algorithm derives the following decomposition of a given financial time series:

    ∃ N_IMF ∈ N* | ∀t ∈ Z,  X_t = Σ_{i=1}^{N_IMF} IMF_t^i + r_t

Moreover, low frequency IMFs explain the general evolution of the stock price, and have
strong periodic patterns, whereas high frequency IMFs are linked to the daily movements.
Figure 3.9, page 22, shows all the IMFs of an example financial time series: in red the low
frequency IMFs, in blue the high frequency IMFs, in green the stock price.

Figure 3.7: Societe Generale stock price and the sum of its 2 lowest frequency IMFs




              Figure 3.8: VIX and the sum of its 3 lowest frequency IMFs

Hence, we can separate the previous sum into the two components of the Box & Jenkins
approach.




Figure 3.9: Empirical Mode Decomposition of Axa, starting 01/01/2003 until 14/11/2011


    ∃ N_sep ∈ [[1; N_IMF]] | ∀t ∈ Z,
    X_t = Σ_{i=1}^{N_sep} IMF_t^i  +  Σ_{i=N_sep+1}^{N_IMF} IMF_t^i  +  r_t
          (Random part)              (Seasonal part)                   (Trend)

In the rest of this report, we will sometimes include the trend process in the seasonal part.
In the previous example, the decomposition is the following:

Remark 3.1.2 In Figure 3.10, page 23, the correlation between the "Random Part" and the
"Seasonal Part" is:

    Corr(X_t^random, X_t^seasonal) = −0.01

In order to properly differentiate low frequency IMFs from high frequency IMFs, one needs a
rule. Multiple choices are possible:
– A stationarity criterion for high frequency IMFs: statistical tests, such as the Ljung-Box
  test, the runs test, or the KPSS test.
– A periodicity criterion for low frequency IMFs: low frequency IMFs must display fewer than
  p pseudo-periods within the time interval. Beyond that threshold p, they are considered
  as moving too quickly, and as not carrying information relative to the general evolution
  of the series.
Given that the goal of this decomposition is to extract a seasonal pattern reflecting the
broad evolution of the stock price, a selection criterion for low frequency IMFs seems
more appropriate. It is also more intuitive than statistical tests. Therefore, the criterion
chosen is the following:

Figure 3.10: Box & Jenkins decomposition of the AXA stock price based on the EMD, from
01/01/2003 to 14/11/2011


    ∀i ∈ [[1; N_IMF]],  (IMF_t^i)_{1≤t≤T} ∈ X^seasonal  ⇔  #Γ_{0,i}^{[1;T]} ≤ 3

where

    #Γ_{0,i}^{[1;T]} = #{ s ∈ [[1; T]] | IMF_s^i = 0 }

Therefore, we can now explicitly write the Box & Jenkins decomposition based on the EMD:

    ∃ N_IMF ∈ N* | ∀t ∈ [[1; T]],
    X_t = Σ_{i=1}^{N_IMF} IMF_t^i · 1{#Γ_{0,i}^{[1;T]} > 3}  +  Σ_{i=1}^{N_IMF} IMF_t^i · 1{#Γ_{0,i}^{[1;T]} ≤ 3}  +  r_t
          (Random part)                                         (Seasonal part)                                      (Trend)
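As a sketch, the zero-count criterion can be implemented by counting the sign changes of each IMF as a proxy for #Γ_{0,i}; the function name, array layout, and default threshold are our own choices.

```python
import numpy as np

def split_random_seasonal(imfs, max_zero_crossings=3):
    """Sum IMFs into a 'random' part (more than `max_zero_crossings`
    sign changes over the sample) and a 'seasonal' part (at most that
    many), following the #Gamma_{0,i} <= 3 criterion."""
    random_part = np.zeros(imfs.shape[1])
    seasonal_part = np.zeros(imfs.shape[1])
    for imf in imfs:
        crossings = np.count_nonzero(np.diff(np.sign(imf)) != 0)
        if crossings <= max_zero_crossings:
            seasonal_part += imf
        else:
            random_part += imf
    return random_part, seasonal_part
```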


Remark 3.1.3 In Figure 3.10, page 23, the "Seasonal Part" explains 91% of the variance of
the original time series:

    R² := Var(X_t^seasonal + r_t) / Var(X_t) = 0.91


3.1.3    Prediction hypotheses
We have now decomposed our signal into two parts: the seasonal component and the random
high frequency component. These two components are moreover uncorrelated, and the
variance of the seasonal process accounts for most of the variance of the original time series.

We have the following decomposition:

    ∀T ∈ N, ∃ N_IMF ∈ N* | ∀t ∈ [[0; T]],
    X_t = Σ_{i=1}^{N_sep} IMF_t^i · 1{#Γ_{0,i}^{[1;T]} > 3}  +  Σ_{i=N_sep+1}^{N_IMF} IMF_t^i · 1{#Γ_{0,i}^{[1;T]} ≤ 3}  +  r_t
          (Random part)                                         (Seasonal part)                                           (Trend)


We can now proceed to separate estimations for each process. As we noticed in the earlier
example, the Random Part process is approximately centered on zero. Therefore, we make a
simple prediction for this process:

    E[ (X_s^random)_{T+1≤s≤2T} | (X_t^random)_{1≤t≤T} ] = (0)_{T+1≤s≤2T}


We now have to formulate a prediction of the seasonal process. Following the framework of
Box & Jenkins, two hypotheses are possible, in order to formulate predictions.


3.1.3.1   Deterministic periodicity of low frequency IMFs

The first possible assumption is that the seasonal component is deterministic. Hence, we
assume that this periodic component will keep the following properties in the future:
– Periodicity: ∀i ∈ [[i_0; N_IMF]], ∀t ∈ Z, Σ_{j=1}^{T_i} IMF_{t+j}^i = 0, or
  ∀i ∈ [[i_0; N_IMF]], ∀t ∈ Z, IMF_{t+T_i}^i = IMF_t^i
– IMF structure: ∀i ∈ [[i_0; N_IMF]], ∀t ∈ Z, #Γ_{max,i}^{[t;t+T_i]} + #Γ_{min,i}^{[t;t+T_i]} − #Γ_{0,i}^{[t;t+T_i]} ≤ 1
where
    Γ_{0,i}^{[t;t+T_i]} = { s ∈ [t; t+T_i] | IMF_s^i = 0 }

    Γ_{min,i}^{[t;t+T_i]} = { s ∈ [t; t+T_i] | ∃u > 0, s = arg min_{v∈[s−u,s+u]} IMF_v^i }

    Γ_{max,i}^{[t;t+T_i]} = { s ∈ [t; t+T_i] | ∃u > 0, s = arg max_{v∈[s−u,s+u]} IMF_v^i }


Hence, with these properties, each low frequency IMF can be easily prolonged, hence the
estimated future seasonal process.
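Under the strict periodicity property IMF^i_{t+T_i} = IMF^i_t, prolonging an IMF amounts to replaying its last observed period; a minimal sketch (the function name and array layout are ours):

```python
import numpy as np

def extend_periodic(imf, period, horizon):
    """Extend an IMF `horizon` steps ahead by replaying its last
    `period` observations, i.e. assuming IMF_{t+period} = IMF_t."""
    idx = len(imf) - period + (np.arange(horizon) % period)
    return imf[idx]
```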


3.1.3.2   Stochastic periodicity of low frequency IMFs

The second possible assumption is less strong than the first one. Instead of assuming that
the seasonal component is deterministic, it is now assumed to present some periodicity
while remaining stochastic.

    E[ (X_s^seasonal + r_s)_{T+1≤s≤2T} | (X_t^seasonal + r_t)_{1≤t≤T} ] = (X_s^seasonal + r_s)_{T+1≤s≤2T}

if we assume that future estimations should rely on sequences from the past of the same
duration.


3.2       Insights of potential market predictors

3.2.1        Deterministic periodicity: Low frequency Mean Reverting
             Strategy
This strategy relies on the hypothesis of deterministic periodicity of the seasonal process.
To formulate a prediction at a certain time t, within the horizon T, this strategy relies on
the following algorithm:

    ∀s ∈ [[t−T; t]],  X_s = X_s^random + X_s^seasonal

    ∀s ∈ [[t−T; t]],  X̃_s^seasonal = log( X_s^seasonal / X̄^seasonal ),
    where  X̄^seasonal = (1/(T+1)) Σ_{s=t−T}^{t} X_s^seasonal

    if  X̃_t^seasonal − 2 X̃_{t−1}^seasonal + X̃_{t−2}^seasonal > threshold,
    then predict  X_{t+T}^seasonal / X_t^seasonal > 1  and set  α_MeanRevertingStrat(t) = 1

    else if  X̃_t^seasonal − 2 X̃_{t−1}^seasonal + X̃_{t−2}^seasonal < −threshold,
    then predict  X_{t+T} / X_t < 1  and set  α_MeanRevertingStrat(t) = −1
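The rule above can be sketched as follows. The neutral branch when the convexity stays inside the threshold band, as well as the function name, are our own additions:

```python
import numpy as np

def mean_reverting_signal(seasonal, threshold):
    """Convexity signal on the log-demeaned seasonal part over [t-T, t];
    the second difference at the last point decides the position.
    Assumes a strictly positive seasonal process."""
    y = np.log(seasonal / seasonal.mean())
    curv = y[-1] - 2.0 * y[-2] + y[-3]
    if curv > threshold:
        return 1    # predict the seasonal part reverts upward
    if curv < -threshold:
        return -1   # predict the seasonal part reverts downward
    return 0        # no position taken
```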

3.2.2        Conditional expectation: Low Frequency Multi Asset Shift-
             ing Pattern Recognition Strategy and Mono Asset IMF Pat-
             tern Recognition Strategy
3.2.2.1      Low Frequency Multi Asset Shifting Pattern Recognition Strategy

This strategy relies on the hypothesis of stochastic periodicity of the seasonal process. It
considers a pool of N assets, among which is the asset to be predicted, i_0. To formulate a
prediction at a certain time t_0, within the horizon T, this strategy relies on the following
algorithm.
First, each price process is decomposed:


    ∀i ∈ [[1; N_assets]], ∀s ∈ [[0; t_0]],  X_s^i = X_s^{random,i} + X_s^{seasonal,i}

And the asset of interest too, since it belongs to the pool of assets:

    ∀s ∈ [[0; t_0]],  X_s^{i_0} = X_s^{random,i_0} + X_s^{seasonal,i_0}

Then, the three best fitting patterns are chosen:

    {(i_1, t_1), (i_2, t_2), (i_3, t_3)}
      = arg min_{(i,u) ∈ [[1; N_assets]] × [[1; t_0−T]]}
        || ( log(X_s^{seasonal,i} / X̄_{[u;u+T]}^{seasonal,i}) )_{u≤s≤u+T}
         − ( log(X_s^{seasonal,i_0} / X̄_{[t_0−T;t_0]}^{seasonal,i_0}) )_{t_0−T≤s≤t_0} ||²

    where  X̄_{[t_0−T;t_0]}^{seasonal,i_0} = (1/(T+1)) Σ_{s=t_0−T}^{t_0} X_s^{seasonal,i_0},
    and    X̄_{[u;u+T]}^{seasonal,i} = (1/(T+1)) Σ_{s=u}^{u+T} X_s^{seasonal,i}

    Let  Z = ( 1{X_{t_1+T}^{i_1} > X_{t_1}^{i_1}} + 1{X_{t_2+T}^{i_2} > X_{t_2}^{i_2}} + 1{X_{t_3+T}^{i_3} > X_{t_3}^{i_3}} ) / 3

Z is the decision variable. Predictions are made depending on the vote of the three best
fitting scenarios.


    if  Z > 1/2,  then predict  X_{t_0+T} / X_{t_0} > 1  and  α_ShiftingPatternStrat(t) = 1
    else if  Z < 1/2,  then predict  X_{t_0+T} / X_{t_0} < 1  and  α_ShiftingPatternStrat(t) = −1

3.2.2.2       Low Frequency Mono Asset IMF Pattern Recognition Strategy

This strategy relies on the hypothesis of stochastic periodicity of the seasonal process. It is
very similar to the previous strategy, but differs on two essential points:
– It does not require any time series other than the historical prices of the asset to be
  predicted.
– It is adapted, in the sense that it only uses the asset's own past information.




To formulate a prediction at a certain time t0, within the horizon T, this strategy relies on
the following algorithm:
    ∀s ∈ [[t_0−T; t_0]],  X_s^{i_0} = X_s^{random,i_0} + X_s^{seasonal,i_0}

    {t_1, t_2, t_3} = arg min_{u ∈ [[1; t_0−T]]}
        || ( log(X_s^{seasonal,i_0} / X̄_{[u;u+T]}^{seasonal,i_0}) )_{u≤s≤u+T}
         − ( log(X_s^{seasonal,i_0} / X̄_{[t_0−T;t_0]}^{seasonal,i_0}) )_{t_0−T≤s≤t_0} ||²

    where  X̄_{[t_0−T;t_0]}^{seasonal,i_0} = (1/(T+1)) Σ_{s=t_0−T}^{t_0} X_s^{seasonal,i_0},
    and    X̄_{[u;u+T]}^{seasonal,i_0} = (1/(T+1)) Σ_{s=u}^{u+T} X_s^{seasonal,i_0}

Hence, the decision variable is:

    Z = ( 1{X_{t_1+T}^{i_0} > X_{t_1}^{i_0}} + 1{X_{t_2+T}^{i_0} > X_{t_2}^{i_0}} + 1{X_{t_3+T}^{i_0} > X_{t_3}^{i_0}} ) / 3

And the predictions computed by the strategy:

    if  Z > 1/2,  then predict  X_{t_0+T} / X_{t_0} > 1  and  α_AutoPatternStrat(t) = 1
    else if  Z < 1/2,  then predict  X_{t_0+T} / X_{t_0} < 1  and  α_AutoPatternStrat(t) = −1
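The mono asset algorithm can be sketched as below. We assume each matched time t_j denotes the end of its window [u; u+T]; the function name and loop bounds are illustrative:

```python
import numpy as np

def auto_pattern_signal(seasonal, price, t0, T, k=3):
    """Vote of the k past seasonal windows closest (in squared distance
    of log-demeaned shapes) to the latest window; each match votes on
    whether the price rose over the following T steps."""
    ref_win = seasonal[t0 - T:t0 + 1]
    ref = np.log(ref_win / ref_win.mean())
    scores = []
    for u in range(t0 - 2 * T):              # outcome must be known by t0
        win = seasonal[u:u + T + 1]
        dist = np.sum((np.log(win / win.mean()) - ref) ** 2)
        scores.append((dist, u + T))         # t_j = end of window [u; u+T]
    scores.sort()
    z = np.mean([price[t + T] > price[t] for _, t in scores[:k]])
    return 1 if z > 0.5 else -1
```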




Chapter 4
Strategies analysis

4.1      Portfolio management
Definition 4.1.1 Let P_t, t ∈ [1; T], denote the stochastic process of the spot price of an asset.


4.1.1     Trading strategy
Definition 4.1.2 A trading strategy is represented as follows: at each time period, it
provides an anticipation of the market, −1 being bearish (i.e. the price will decline) and +1
bullish (i.e. the price will rise).

    α : [0; T] → {−1; 1}


4.1.2     Investment Horizon
An investment duration T_invest, in terms of business days, drives the predictions of a given
strategy. It can range from 10 to 252 business days (from two weeks to a year). Therefore,
portfolio management can be driven by mid term or long term earnings prospects.

    T_invest ∈ [|50; 252|]

These prospects drive the PnL of the strategy, as positions will be covered after Tinvest
business days.

Definition 4.1.3

    ∀α ∈ {−1; 1}^{[|1;T|]}, ∀t ∈ [1; T],
    P&L_α^{T_invest}(t) = ( α(i) · (P_{(i+T_invest)∧t} − P_i) / P_i )_{1≤i≤t−1}




4.1.3      Starting time
The beginning of the time series is not subject to predictions. It is kept as prerequisite
information in order to compute the first predictions. Indeed, concerning the pattern fitting
strategies, one needs to have a few historical patterns available for fitting. Therefore, the
predictions start at time:
                                     t_start = 10 · T_invest
Hence, the new PnL vector:


    ∀α ∈ {−1; 1}^{[|1;T|]}, ∀t ∈ [t_start; T],
    P&L_α^{T_invest}(t) = ( α(i) · (P_{(i+T_invest)∧t} − P_i) / P_i )_{t_start≤i≤t−1}



4.1.4      Trading time span
A trading time span δt defines the duration between two portfolio rebalances. It is defined
in terms of business days. Every δt days, one trade is closed and another position is taken.
By default, and assumed in the back tests, it is equal to a fifth of the investment duration.
Therefore, for example for Tinvest = 252, there will be 5 rebalances per year, one every 50
business days. Hence, in this example, δt = 50.
    δt = T_invest / 5

Therefore, the PnL now becomes:

    ∀α ∈ {−1; 1}^{[|1;T|]}, ∀t,
    P&L_α^{T_invest,δt}(t) = ( α(i·δt + t_start) · (P_{(T_invest + i·δt + t_start)∧t} − P_{i·δt + t_start}) / P_{i·δt + t_start} )_{0≤i≤(T−t_start)/δt}




4.1.5      Annualizing the PnL and reducing its variance
The PnL computed so far still carries a time dependence. It needs to be annualized, with
the following operation:

    ∀α ∈ {−1; 1}^{[|1;T|]}, ∀t,
    P&L_α^{T_invest,δt}(t) = ( α(i·δt + t_start) · (P_{(T_invest + i·δt + t_start)∧t} − P_{i·δt + t_start}) / P_{i·δt + t_start} · 252/T_invest )_{0≤i≤(T−t_start)/δt}
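The annualized PnL vector can be sketched as follows; the function name and the stopping condition (no position opens at or after t) are our own simplifications:

```python
import numpy as np

def pnl_vector(prices, alpha, t, T_invest, dt, t_start):
    """Annualized PnL contributions of a strategy alpha in {-1, +1}:
    a position opens every `dt` business days from `t_start` and closes
    after T_invest days, or at t, whichever comes first."""
    out = []
    i = 0
    while i * dt + t_start < t:
        o = i * dt + t_start
        c = min(o + T_invest, t)     # (T_invest + i*dt + t_start) ∧ t
        ret = (prices[c] - prices[o]) / prices[o]
        out.append(alpha[o] * ret * 252.0 / T_invest)
        i += 1
    return np.array(out)
```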



Moreover, it is valuable to reduce the volatility of the returns of a given strategy. If this is
not at the expense of its mean rate of return, it contributes to slightly improving the
Sharpe ratio. Hence, usual stop losses and cash ins are implemented in the back tests.

From this section, we now have a framework for deriving a PnL vector from a given strategy
α ∈ {−1; 1}^{[|1;T|]} and a given price process P_t, t ∈ [1; T]. It now remains to evaluate the
strategies with objective criteria.


4.2      Underlying and target market
Three potential trading strategies were identified. They have been tested on four different
types of underlyings, for the following reasons:
- Stocks: CAC40. The goal is to find recurrent seasonality patterns within a single stock,
or between different stocks. This seasonality could be caused by important market shifts on
the initiative of big players, such as pension funds, insurance companies, asset managers, or
banks proceeding to portfolio rebalancing.
- Implied Volatilities: VIX Index, VStoxx Index, VCAC, VDAX, etc. These indices are
computed by a closed formula relying on the implied volatilities of numerous options, hence
reflecting the overall structure of the volatility smile.
- Index Pairs: based on the following most liquid worldwide indexes: CAC, DAX, SX5E,
SPX, NKY, UKX, IBOV, SMI, HSI. Index pairs provide trajectories which generally follow
mean reverting processes, and have the advantage of being extremely liquid.
- Commodities: WTI. Commodities are known for displaying seasonality features. Therefore,
they may constitute interesting underlyings.




Chapter 5
Results

5.1        Empirical choices
Three strategies have been mentioned in this paper. However, in practical matters, tests
have only targeted one strategy, for the following reasons:
– The IMF Mean Convexity Reverting Strategy has been tested on a few examples. However,
  the calibration of the threshold has proven difficult. Even the seasonal process remains
  somewhat unstable, in particular in its last values, where the second order derivative
  is computed.
    ∀s ∈ [[t−T; t]],  X_s = X_s^random + X_s^seasonal

    ∀s ∈ [[t−T; t]],  X̃_s^seasonal = log( X_s^seasonal / X̄^seasonal ),
    where  X̄^seasonal = (1/(T+1)) Σ_{s=t−T}^{t} X_s^seasonal

    if  X̃_t^seasonal − 2 X̃_{t−1}^seasonal + X̃_{t−2}^seasonal > threshold,
    then predict  X_{t+T}^seasonal / X_t^seasonal > 1  and set  α_MeanRevertingStrat(t) = 1

    else if  X̃_t^seasonal − 2 X̃_{t−1}^seasonal + X̃_{t−2}^seasonal < −threshold,
    then predict  X_{t+T} / X_t < 1  and set  α_MeanRevertingStrat(t) = −1
– The IMF Multi Asset Shifting Pattern Recognition Strategy is very similar to the Mono
  Asset IMF Pattern Recognition Strategy. However, it is harder to implement because it
  is a multi asset strategy: results depend on how much data is used. Moreover, the
  multi asset strategy is unadapted, contrary to the mono asset strategy.
– Due to the limited computing capacity available during the project, tests have mainly
  focused on the last strategy developed above, i.e. the Mono Asset IMF Pattern Recognition
  Strategy. The idea is to derive reliable results by testing it on a wide range of
  underlyings (the list of which was provided earlier). That way, the law of large numbers
  helps provide reliable results.


5.2     Tables

5.2.1     Volatility: VIX Index
First are shown the most promising results, on the VIX Index. Three different investment
horizons have been tested: 50 days, 150 days, and 250 days. The result for 50 days is the
most reliable, as it relies on the largest number of trades. Bullish signals also seem to
perform better, which is expected considering the general behavior of implied volatility.
As shown in table 5.1, page 32, the last back test, with 35% cash in, gives a 57% hit ratio
and a 0.40 annualized Sharpe ratio.




Figure 5.1: Back test on the VIX Index, with a 50 days investment horizon and 35% cash in

Moreover, the results are also encouraging for a longer investment horizon of 150 days.
However, they rely on fewer trades, simply due to the longer horizon. Again, bullish signals
are more powerful. As shown in table 5.2, page 33, the last back test, with 100% cash in,
gives a 54% hit ratio and a 0.54 annualized Sharpe ratio.


Figure 5.2: Back test on the VIX Index, with a 150 days investment horizon and 100% cash
in

Finally, the results for 250 days are now presented. Again, they rely on fewer trades, simply
due to the longer horizon, and bullish signals are again more powerful. As shown in table 5.3,
page 34, the last back test, with 150% cash in, gives a 66% hit ratio and a 0.76 annualized
Sharpe ratio. However, it relies on only 15 trades, from 2005 to 2011, a period in which
bullish signals have obviously proven quite effective. Therefore, more tests on longer data
sets need to be pursued.


5.2.2     Volatility: VStoxx Index
To confirm the long term results on the VIX Index, similar tests have been pursued on the
VStoxx Index, see table 5.4, page 35. For the 126 days horizon, results are also encouraging.
Without any cash in, and with all signals (bullish and bearish), a 56% hit ratio is achieved
on 70 trades, with a 0.20 Sharpe ratio.


Figure 5.3: Back test on the VIX Index, with a 250 days investment horizon and 150% cash
in

5.2.3     Volatility: other indices: aggregate performance
Table 5.5, page 36, shows the aggregated results for all the indices tested in the pool. The
number of trades represented is around 1000 for the 150 days table, and 100 for the 250 days
table, dating from 2006 to 2011. Therefore, the results are very reliable.
It seems that the aggregate prediction power on volatility is uncertain. However, results
remain encouraging for the VIX, i.e. the most liquid index (via futures or ETFs) among the
volatility indexes.


5.2.4    French stocks: CAC 40: Aggregate performance
Aggregated results for the French stock market are quite disappointing, see table 5.6,
page 37.


5.2.5    Equities Indices and trading pairs: Aggregate performance
Tests have also been pursued on the main worldwide equity indices and their pairs.
Aggregated results, on approximately 2000 trades and 20 years of historical prices, show that
this strategy does not have any prediction power on this asset class; see table 5.7, page 37.




Figure 5.4: Back test on the VStoxx Index, with a 126 days investment horizon and without
cash in

5.2.6    Commodities: West Texas Intermediate (WTI)
Results for the West Texas Intermediate (WTI), are shown in table 5.8, page 38.




Figure 5.5: Back test on a pool of volatility indexes




Figure 5.6: Back test on French stocks from the CAC 40




  Figure 5.7: Back test on a pool of Equities Indexes

Figure 5.8: Back test on the West Texas Intermediate (WTI) oil price




Conclusion and outlook



The HHT offers a potentially viable method for nonlinear and nonstationary data analysis.
But in all the cases studied, the HHT does not give sharper results than most traditional
time series analysis methods. In order to make the method more robust and rigorous in
application, an analytic mathematical foundation is needed.
In our view, the most likely solutions to the problems associated with the HHT can only be
formulated in terms of optimization: the selection of the spline, the extension at the
endpoints, etc. This may be an area of future research.
While this study tries to lay some theoretical ground for the HHT, further theoretical
work is greatly needed in this direction.
On the empirical side, more research also needs to be pursued. While not particularly
effective on stock prices, the EMD seems better suited to curves resembling implied
volatilities, and more able to derive meaningful dynamics from them. Strong results have
been reached concerning the main volatility indexes, such as the VIX or VStoxx. Therefore,
further empirical tests on this asset class could be rewarding.
Moreover, a great variety of assets has not been tested for prediction: other types of
commodities (only WTI was tested), precious metals, fixed income assets such as sovereign
or corporate bonds, etc.
Finally, significant tests have only been pursued for one strategy among the three that were
formulated. The code provided with this report can generate results for the two other
strategies, and can be the basis of wider back tests on an industrial scale.
In terms of applications, this study has limited itself to the first part of the HHT algorithm,
i.e. the Empirical Mode Decomposition. Further work could properly formalize the Hilbert
spectrum, make new hypotheses, and derive potential predictors using the same methodology
as in this study.


Acknowledgements
This study has been pursued in collaboration with the Equity Quantitative Team at Natixis.
Since our first arrival on the premises of Natixis, we have been thoroughly assisted. Successful
professionals were kind enough to answer our questions and to give their opinion on our work
during the entire year. Without their advice, this study would not have achieved its current
findings. Our project consisted of working at Natixis every Wednesday, from October 2011 to
March 2012. Workdays were a great opportunity to work within the finance environment,
and to learn about the role of quantitative associates within the banking industry.
First, we would like to thank our supervisor Mr Adil Reghai, Head of Quantitative Analytics
at Natixis. Adil showed much interest in our project, shared our views, and gave us valuable
feedback during the whole year. He helped us design our predictors, and constantly gave
us new ideas for back tests. We would also like to thank Mr Adel Ben Haj Yedder, who
greatly contributed to our project, proofread our reports, and gave us feedback. We also
had the opportunity to discuss with Adel his daily job, the role of the team, and the
banking industry in general. His views will be valuable in helping us refine our professional
plans and goals. We are also thankful to Stephanie Mielnik and Thomas Combarel for their
contributions, and to the team in general.
Moreover, this study was pursued in collaboration with Dr Alex Langnau, Global Head
of Quantitative Analytics at Allianz. Alex is a consultant for Natixis and an academic at
the University of Munich, and introduced Adil to the Hilbert Huang Transform. During our
project, Alex also gave us valuable feedback, in particular about the portfolio management of
our trading strategies. We would also like to thank our teachers at Ecole Centrale Paris, from
the Applied Mathematics Department. Mr Erick Herbin, professor of stochastic processes,
supervised our project. He encouraged us to formalize the Hilbert Huang algorithm. Despite
being a difficult task, it has proven to be essential. We are also thankful to Mr Gilles
Faÿ, professor of statistics and time series, for his lectures, which provided important
theoretical grounds for our study.
Finally, we wish to thank our colleagues from the Applied Mathematics Program who
pursued other projects in collaboration with Natixis. We have been working with them
since October, and we enjoyed having breaks with them. To name them: Marguerite de
Mailard, Lucas Mahieux, Nicolas Pai and Victor Gerard.




Bibliography


[1] Barnhart, B. L., The Hilbert-Huang Transform: theory, applications, development,
    dissertation, University of Iowa, (2011)
[2] Brockwell, P.J., and Davis, R.A., Introduction to Time Series and Forecasting, second
    edition, Springer-Verlag, New York, (2002)
[3] Cohen, L., Generalized phase space distribution functions, J. Math. Phys., 7, 781, (1966)
[4] Datig, M., Schlurmann, T., Performance and limitations of the Hilbert-Huang trans-
    formation (HHT) with an application to irregular water waves, Ocean Engineering, 31,
    1783-1834, (2004)
[5] De Boor, C., A Practical Guide to Splines, Revised Edition, Springer-Verlag, (2001)
[6] Dos Passos, W., Numerical Methods, Algorithms, and Tools in C#, CRC Press, (2010)
[7] Faÿ, G., Séries Chronologiques, lecture notes, École Centrale Paris, (2012)
[8] Flandrin, P., Goncalves, P., Rilling, G., EMD Equivalent Filter Banks, from Interpre-
    tation to Applications, in: Hilbert-Huang Transform and Its Applications (N.E. Huang
    and S.S.P. Shen, eds.), pp. 57-74, (2005)
[9] Golitschek, M., On the convergence of interpolating periodic spline functions of high
    degree, Numerische Mathematik, 19, 146-154, (1972)
[10] Guhathakurta, K., Mukherjee, I., Chowdhury, A.R., Empirical mode decomposition
    analysis of two different financial time series and their comparison, Chaos, Solitons
    and Fractals, 37, 1214-1227, (2008)
[11] Holder, H.E., Bolch, A.M., and Avissar, R., Using the Empirical Mode Decomposition
    (EMD) method to process turbulence data collected on board aircraft, submitted to
    J. Atmos. Ocean. Tech., (2009)
[12] Hong, L., Decomposition and Forecast for Financial Time Series with High-frequency
    Based on Empirical Mode Decomposition, Energy Procedia, 5, 1333-1340, (2011)
[13] Huang, N.E., Shen, S.S.P., Hilbert-Huang Transform and Its Applications, Volume 5 of
    Interdisciplinary Mathematical Sciences, (2005)
[14] Huang, N.E., Shen, Z., Long, S., Wu, M., Shih, H., Zheng, Q., Yen, N., Tung, C., and
    Liu, H., The empirical mode decomposition and the Hilbert spectrum for nonlinear and
    non-stationary time series analysis, Proc. R. Soc. Lond. A, 454(1971), 903-995, (1998)
[15] Huang, N. E., Wu, Z., A review on Hilbert-Huang transform: Method and its applications
    to geophysical studies, Rev. Geophys., 46, RG2006, (2008)
[16] Liu, B., Riemenschneider, S., Xu, Y., Gearbox fault diagnosis using empirical mode
    decomposition and Hilbert spectrum, Mechanical Systems and Signal Processing, 20,
    718-734, (2006)
[17] Pan, H., Intelligent Finance - General Principles, International Workshop on Intelligent
    Finance, Chengdu, China, (2007)
[18] Reghai, A., Goyon, S., Messaoud, M., Anane, M., Market Predictor : Prédiction
    quantitative des tendances des marchés, Étude Stratégie Quant Recherche Actions,
    Natixis Securities, Paris, (2010)
[19] Reghai, A., Goyon, S., Combarel, T., Ben Haj Yedder, A., Mielnik, S., Sharpe
    Select : optimisation de l'investissement Cross Asset, Étude Stratégie Quant Recherche
    Quantitative, Natixis Securities, Paris, (2011)




Appendix A
Time Series Prerequisites

A.1       Stationary and linear processes

A.1.1     Stationarity
Definition A.1.1 A time series is a stochastic process in discrete time, e.g. Xt ∈ R, t ∈ Z.
Thus, a time series is composed of realizations of a single statistical variable over a certain
time interval (for example a month, a quarter, a year, or a nanosecond).

We can expect to develop interesting predictions if the process displays certain structural
properties:
– either some "rigidity", allowing us to extrapolate deterministic parts;
– or some form of statistical invariance, called stationarity, allowing us to learn from the
   past in order to predict the future.

Definition A.1.2 Xt ∈ R, t ∈ Z is said to be strictly stationary iff its finite-dimensional
distributions are invariant under any time translation, i.e.:

             ∀τ ∈ Z, ∀n ∈ N*, ∀(t1, ..., tn) ∈ Z^n, (Xt1, ..., Xtn) ∼ (Xt1−τ, ..., Xtn−τ)

Definition A.1.3 Xt ∈ R, t ∈ Z is said to be stationary at the second order iff:
– (Xt)t∈Z ⊂ L²(R), i.e. ∀t ∈ Z, E[Xt²] < ∞
– ∀t ∈ Z, E[Xt] = E[X0] := µX
– ∀s, t ∈ Z, γX(t, s) := Cov(Xt, Xs) = Cov(X0, Xs−t) =: γX(s − t)

Definition A.1.4 The autocorrelation function of a stochastic process Xt ∈ R, t ∈ Z is the
function

                          ρ(s, t) = Cov(Xt, Xs) / (Var(Xs) · Var(Xt))^(1/2)

A.1.2     Linearity
Within the family of stationary processes, an important family of processes is known as the
linear processes. They are derived from white noise processes.

Definition A.1.5 A stochastic process Xt ∈ R, t ∈ Z is a weak white noise iff it is stationary
at the second order, and:

                                       µX = 0
                                 ∀h ∈ Z, γX(h) = σ²·δ0(h)

Definition A.1.6 A stochastic process Xt ∈ R, t ∈ Z is a strong white noise iff it is i.i.d.
and µX = 0.

Hence, second order linear processes can now be defined.

Definition A.1.7 Xt ∈ R, t ∈ Z is a weak (resp. strong) second order linear process iff
there exist (Zt)t∈Z and (ψj)j∈Z such that:

                        (Zt) is a weak (resp. strong) white noise (σ²)

                                  Σ_{j∈Z} |ψj| < ∞

                                 ∀t ∈ Z, Xt = Σ_{j∈Z} ψj Zt−j

Second order linear processes are well known and thoroughly studied. Restricting attention
to this kind of linear process may seem excessive. However, Wold's decomposition provides a
strong result justifying this choice:


A.1.3     Wold’s decomposition:
Every second order stationary process Xt ∈ R, t ∈ Z can be written as the sum of a second
order linear process and a deterministic component.
                              ∀t ∈ Z, Xt =           ψj Zt−j + η(t)
                                               j∈Z

where :
                         (Zt ) weak (resp. strong) White Noise (σ 2 )


                                              |ψj | < ∞
                                        j∈Z


                                           η ∈ RZ

Hence, basic linear processes, such as ARMA models, provide a strong basis for explaining
stationary processes. However, the stationarity assumption is quite restrictive.
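As a concrete illustration of Definition A.1.7 (ours, with an arbitrary coefficient θ = 0.6),
the MA(1) process Xt = Zt + θZt−1 is a second order linear process with ψ0 = 1, ψ1 = θ and
ψj = 0 otherwise; its lag-one autocorrelation θ/(1 + θ²) can be recovered by simulation:

```python
import numpy as np

theta = 0.6
rng = np.random.default_rng(1)
z = rng.standard_normal(200_000)      # strong white noise (sigma^2 = 1)
x = z[1:] + theta * z[:-1]            # X_t = Z_t + theta * Z_{t-1}

# sample lag-one autocorrelation of the simulated path
xc = x - x.mean()
rho1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)
print(rho1, theta / (1 + theta ** 2))  # the two values agree closely
```

The agreement between the empirical and theoretical values improves as the sample length
grows, at the usual 1/√n rate.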

A.2       The particular case of financial time series: parametric and non-parametric
          extensions

A.2.1      Non-stationary and non-linear financial time series
Financial time series are known to display a few characteristics unknown to stationary or
linear processes:
– Fat tails: their fat-tailed distributions are incompatible with Gaussian density functions.
   They are more accurately fitted by power laws, i.e. processes of infinite variance. Such
   processes are not used in practice, because a measure of volatility (i.e. variance) is
   paramount in finance (for example, to price options, or to compute the Sharpe ratios of
   indices, stocks or strategies; see our definitions in chapter 4).
– Non-linearity: they display non-constant variance. Volatility clusters are common in
   financial time series, and are incompatible with linear stationary processes such as ARMA
   (which have constant variance).
– Non-stationarity: they have a long-term memory.
– Time inversion: linear stationary processes are invariant under time reversal. However, a
   financial time series is clearly coherent in only one time direction, and is not consistent
   if time is reversed.


A.2.2      Parametric processes for financial time series
To tackle the issue of non-linearity, two popular parametric models are the ARCH(p) and
GARCH(p,q) processes.

Definition A.2.1 Xt ∈ R, t ∈ Z is defined as an ARCH(p) process by:

                                            Xt = σt Zt

                              σt² = ψ0 + Σ_{j=1}^{p} ψj X²_{t−j}

where:
                                      ∀j ∈ [[1; p]], ψj > 0
                                           Zt iid(0, 1)

Definition A.2.2 Xt ∈ R, t ∈ Z is defined as a GARCH(p,q) process by:

                                            Xt = σt Zt

                    σt² = ψ0 + Σ_{j=1}^{p} ψj X²_{t−j} + Σ_{j=1}^{q} ϕj σ²_{t−j}

where:
                                      ∀j ∈ [[1; p]], ψj > 0

                                      ∀j ∈ [[1; q]], ϕj > 0

                                           Zt iid(0, 1)
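A minimal simulation sketch (ours, with arbitrary parameters ψ0 = 0.05, ψ1 = 0.10,
ϕ1 = 0.85) of a GARCH(1,1) path, with σt² = ψ0 + ψ1 X²_{t−1} + ϕ1 σ²_{t−1}; the positive
excess kurtosis of the simulated returns reflects the fat tails and volatility clustering
discussed in A.2.1:

```python
import numpy as np

def simulate_garch11(n, psi0=0.05, psi1=0.10, phi1=0.85, seed=2):
    """Simulate X_t = sigma_t Z_t with sigma_t^2 = psi0 + psi1*X_{t-1}^2 + phi1*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    x = np.zeros(n)
    var = psi0 / (1.0 - psi1 - phi1)   # start at the stationary variance
    for t in range(n):
        x[t] = np.sqrt(var) * z[t]
        var = psi0 + psi1 * x[t] ** 2 + phi1 * var
    return x

x = simulate_garch11(100_000)
excess_kurtosis = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2 - 3.0
print(excess_kurtosis)   # positive: fatter tails than a Gaussian
```

Note that ψ1 + ϕ1 < 1 is needed for a finite stationary variance; the innovations Zt are
Gaussian here, yet the marginal distribution of Xt is still fat-tailed.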


A.2.3     Non parametric processes for financial time series
Numerous non-parametric methods have been and are still being developed to fit financial
data. The goal here is not to list them all; the Empirical Mode Decomposition belongs to
this family of methods.


A.3      General time series: the Box & Jenkins approach
         for prediction
Within the framework of Box and Jenkins (1970), a time series can be modeled as the
realization of three simultaneous phenomena:
– the first, ηt, is a regular and smooth time evolution, called the trend;
– the second, St, is a periodic process of period T;
– the third, Wt, is the random component, typically a stationary process.

                                 ∀t ∈ Z, Xt = Wt + St + ηt

Definition A.3.1 St is a periodic process with period T iff:

                                    ∀t ∈ Z, St+T = St

                                    Σ_{t=1}^{T} St = 0


From this framework, we will connect the EMD algorithm to the time series literature.
Some assumptions will be made to match this approach, and they will drive the prediction
algorithms formulated later in this chapter.
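The Box & Jenkins decomposition above can be sketched with a classical procedure (ours,
not the report's method): a moving average estimates the trend ηt, per-phase means estimate
the periodic component St, and the remainder plays the role of Wt:

```python
import numpy as np

def classical_decompose(x, period):
    """Split x into trend (moving average), seasonal (periodic means) and remainder."""
    x = np.asarray(x, dtype=float)
    # moving-average estimate of the trend eta_t (edges are only approximate)
    trend = np.convolve(x, np.ones(period) / period, mode="same")
    detrended = x - trend
    # seasonal component S_t: mean of each phase, normalised to sum to zero over a period
    one_period = np.array([detrended[p::period].mean() for p in range(period)])
    one_period -= one_period.mean()
    seasonal = np.tile(one_period, len(x) // period + 1)[: len(x)]
    remainder = x - trend - seasonal
    return trend, seasonal, remainder

t = np.arange(240)
x = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.1 * np.random.default_rng(3).standard_normal(240)
trend, seasonal, resid = classical_decompose(x, period=12)
print(np.allclose(trend + seasonal + resid, x))   # the three components sum back to x
```

By construction the three components add back up to the original series, mirroring the
identity Xt = Wt + St + ηt, and the estimated St sums to zero over one period as required by
Definition A.3.1.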




Appendix B
Evaluation criteria of backtests

In order to evaluate the efficiency of the highlighted strategies, a few rules must be
followed. As mentioned above, adaptability is the main one: every back-test ought to be
implemented as if the future were unknown. This implies two restrictions:
– a time-frame restriction: every prediction must be computed without using any feature of
   its future values;
– a class restriction: predictions must be evaluated on an aggregate basis, for every kind
   of underlying. For example, the best or worst performers should not be singled out,
   because doing so is another form of unadaptability.
However, it remains pertinent to evaluate the performance of the strategies with respect to
the asset class to which they are applied. Hence, as mentioned further on, some strategies
might be more efficient on stocks, trading pairs, or implied volatilities.
Within asset management theory, a few variables suffice to quantify the efficacy of a
trading strategy. They are usually defined as follows. Let X be the random variable denoting
the annualized gain (or loss) of each trade of a trading strategy; its realizations are
written Xi, i = 1..n.

Definition B.0.2

                              Average return = E [X]
                               Average gain = E [X|X > 0]
                                Average loss = E [X|X < 0]
                             Drawdown = - min {Xi , i = 1..n}
                                  Hit ratio = P (X > 0)

These values need to be analysed together. A hit ratio above 50% should be compared with
the average gain and loss. The drawdown also provides valuable information on the risk of
the trading strategy, and gives a hint of the Sharpe ratio. Drawdowns are a valuable tool
for calibrating stop-loss thresholds.
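The quantities of Definition B.0.2 can be estimated from a vector of realized per-trade
returns; a minimal sketch (ours, with made-up sample returns):

```python
import numpy as np

def trade_stats(x):
    """Empirical estimates of the Definition B.0.2 statistics from per-trade returns."""
    x = np.asarray(x, dtype=float)
    return {
        "average_return": x.mean(),
        "average_gain": x[x > 0].mean(),    # E[X | X > 0]
        "average_loss": x[x < 0].mean(),    # E[X | X < 0]
        "drawdown": -x.min(),               # worst single trade, as defined above
        "hit_ratio": (x > 0).mean(),        # P(X > 0)
    }

stats = trade_stats([0.04, -0.02, 0.06, -0.01, 0.03])
print(stats)   # e.g. hit_ratio = 0.6: 3 winning trades out of 5
```

Note that average_gain and average_loss are undefined (NaN) when there are no winning or
no losing trades; a production implementation would have to handle those edge cases.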


Definition B.0.3

                            Information ratio = E[X − T] / √Var(X − T)

with T a benchmark performance rate. In the results displayed in the annexes, the benchmark
performance rate plugged into the computation of Information ratios is 0%.

Definition B.0.4

                            Sortino ratio = E[X − T] / √Var(X | X < T)

with T a benchmark performance rate. In our results displayed in the annexes, the benchmark
performance rate used in the computation of Sortino ratios is 0%. Therefore, this
measurement only takes into account "negative volatility", i.e. the volatility of losses.

Definition B.0.5

                                Sharpe ratio = E[X − rf] / √Var(X)

with rf the annualized risk-free rate.

The analyses herein rely mainly on the information ratio and the Sortino ratio. However,
given the current risk-free rates, the information ratio can be considered a good
approximation of the Sharpe ratio.
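The three ratios can be sketched in the same spirit (our illustration; √Var in the
denominators, and T = rf = 0% as in the annexes, in which case the information and Sharpe
ratios coincide):

```python
import numpy as np

def performance_ratios(x, benchmark=0.0, risk_free=0.0):
    """Empirical information, Sortino and Sharpe ratios (Definitions B.0.3-B.0.5)."""
    x = np.asarray(x, dtype=float)
    excess = x - benchmark
    info = excess.mean() / excess.std()
    # Sortino: penalise only the volatility of returns below the benchmark
    downside = excess[excess < 0]
    sortino = excess.mean() / downside.std()
    sharpe = (x - risk_free).mean() / x.std()
    return info, sortino, sharpe

info, sortino, sharpe = performance_ratios([0.04, -0.02, 0.06, -0.01, 0.03])
print(info, sortino, sharpe)   # with T = rf = 0%, info and sharpe coincide
```

The Sortino denominator uses only the sub-benchmark returns, so a strategy with rare but
small losses scores much higher on the Sortino ratio than on the Sharpe ratio; it is also
undefined when no return falls below the benchmark.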




VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview document
 
Chapter 2.ppt of macroeconomics by mankiw 9th edition
Chapter 2.ppt of macroeconomics by mankiw 9th editionChapter 2.ppt of macroeconomics by mankiw 9th edition
Chapter 2.ppt of macroeconomics by mankiw 9th edition
 
The Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarThe Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh Kumar
 
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
 
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
 
VIP Call Girls Service Dilsukhnagar Hyderabad Call +91-8250192130
VIP Call Girls Service Dilsukhnagar Hyderabad Call +91-8250192130VIP Call Girls Service Dilsukhnagar Hyderabad Call +91-8250192130
VIP Call Girls Service Dilsukhnagar Hyderabad Call +91-8250192130
 

HHT Report

Contents

1 Description of the Hilbert Huang Transform, model overview  7
  1.1 The Empirical Mode Decomposition  7
  1.2 Closed form formulas for IMFs  9

2 State of the Art  12
  2.1 Application fields of the EMD  12
  2.2 Existence and uniqueness of the Decomposition  12
    2.2.1 Stoppage criteria  13
    2.2.2 Cubic spline interpolation  13
    2.2.3 Additional boundary data points  14

3 Application to the prediction of financial time series  16
  3.1 The Empirical Mode Decomposition in finance: stylized facts  16
    3.1.1 Empirical Modes and Market Structure  16
      3.1.1.1 Asset price and IMFs  16
      3.1.1.2 High frequency modes  17
      3.1.1.3 Low frequency modes  20
    3.1.2 Back to the Box & Jenkins framework  20
    3.1.3 Prediction hypotheses  23
      3.1.3.1 Deterministic periodicity of low frequency IMFs  24
      3.1.3.2 Stochastic periodicity of low frequency IMFs  24
  3.2 Insights of potential market predictors  25
    3.2.1 Deterministic periodicity: Low frequency Mean Reverting Strategy  25
    3.2.2 Conditional expectation: Low Frequency Multi Asset Shifting Pattern Recognition Strategy and Mono Asset IMF Pattern Recognition Strategy  25
      3.2.2.1 Low Frequency Multi Asset Shifting Pattern Recognition Strategy  25
      3.2.2.2 Low Frequency Mono Asset IMF Pattern Recognition Strategy  26

4 Strategies analysis  28
  4.1 Portfolio management  28
    4.1.1 Trading strategy  28
    4.1.2 Investment Horizon  28
    4.1.3 Starting time  29
    4.1.4 Trading time span  29
    4.1.5 Annualizing the PnL and reducing its variance  29
  4.2 Underlying and target market  30

5 Results  31
  5.1 Empirical choices  31
  5.2 Tables  32
    5.2.1 Volatility: VIX Index  32
    5.2.2 Volatility: VStoxx Index  33
    5.2.3 Volatility: other indices: aggregate performance  33
    5.2.4 French stocks: CAC 40: Aggregate performance  34
    5.2.5 Equities Indices and trading pairs: Aggregate performance  34
    5.2.6 Commodities: West Texas Intermediate (WTI)  35

A Time series Prerequisites  43
  A.1 Stationary and linear processes  43
    A.1.1 Stationarity  43
    A.1.2 Linearity  44
    A.1.3 Wold's decomposition  44
  A.2 The particular case of financial time series: parametric and non parametric extensions  45
    A.2.1 Non-stationary and non linear financial time series  45
    A.2.2 Parametric processes for financial time series  45
    A.2.3 Non parametric processes for financial time series  46
  A.3 General time series: the Box & Jenkins approach for prediction  46

B Evaluation criteria of backtests  47
Introduction

This report presents the Hilbert Huang research work of Cyrille Ben Lemrid and Hadrien Maupard.

The Hilbert Huang Transform relies on two steps: first, a non-parametric Empirical Mode Decomposition, which decomposes the signal into Intrinsic Mode Functions (semi-periodic functions) of various frequencies; then a Hilbert Decomposition, projecting the IMFs onto a three-dimensional time-frequency graph. The details of the algorithm are thoroughly explained in the first chapter of this report. Due to the lack of theoretical formulation of the latter, and in order to keep our algorithm flexible and simple, we only use the Huang Transform, i.e. the Empirical Mode Decomposition (EMD).

In finance, it is well known that the usual tools for the prediction of time series are powerless. Stationary and linear models, such as ARIMA processes, are unable to predict financial time series, which display non-stationarity and a long memory. Hence, some extensions exist, parametric or non-parametric. The EMD belongs to the non-parametric predictors of non-linear, non-stationary time series. Chapter 2 reviews the state of the art of the EMD.

In chapter 3, based on empirical observations, interesting stylized facts are derived: IMFs are uncorrelated to each other. Low frequency IMFs are periodic, and explain most of the variance of the original time series; with their smooth and regular form, they still capture most of the information of the time series. High frequency IMFs are closer to random processes, and show some stationarity. These facts connect the EMD with the Box & Jenkins statistical framework: a time series can be seen as the sum of a semi-periodic or seasonal process (the low frequency IMFs) and a random semi-stationary process (the high frequency IMFs). Two categories of predictors are then introduced, relying on two hypotheses on the seasonal process: either it is deterministic, and can be prolonged, or it remains stochastic, and conditional expectation is the best predictor.

The hypothesis of a deterministic seasonal process gives one strategy: the Low Frequency Mean Reverting Strategy. The hypothesis of stochastic periodicity of the seasonal process gives two strategies: a Low Frequency Multi Asset Shifting Pattern Recognition Strategy, and a Low Frequency Mono Asset IMF Pattern Recognition Strategy.

In chapter 4, the backtest method of the strategies is formulated, and underlyings for the backtests are chosen: implied volatilities, stocks, indices and trading pairs, commodities. In chapter 5, the results of these backtests are discussed. In the Annex, prerequisites from the time series and asset management literature are given.
Chapter 1

Description of the Hilbert Huang Transform, model overview

The Hilbert-Huang transform (HHT) is an empirically based data analysis method, performed in two steps: first, descriptive patterns are extracted by performing an adaptive decomposition called the Empirical Mode Decomposition (Huang Transform); then the local behavior of these patterns is captured using tools coming from Hilbert Spectral Analysis (Hilbert Transform).

1.1 The Empirical Mode Decomposition

The Empirical Mode Decomposition is based on the assumption that any data set consists of different simple intrinsic modes of oscillation. Each of these oscillatory modes is represented by an intrinsic mode function (IMF), which satisfies two conditions:
– In the whole data set, the number of zero crossings and the number of extrema are equal or differ at most by one.
– There exist two envelopes, one passing through the local maxima and the other through the local minima, such that at any point the mean value of the two envelopes is zero.

Definition 1.1.1 An R-valued continuous process $x(t)$ is called an IMF (Intrinsic Mode Function) if it satisfies the following conditions:

1. The number of extrema and the number of zero-crossings are equal or differ at most by one:
$$|\#\Gamma_{\max} + \#\Gamma_{\min} - \#\Gamma_0| \leq 1$$
with
$$\Gamma_0 = \{ t \in I \mid x(t) = 0 \}$$
$$\Gamma_{\max} = \{ t \in I \mid \exists u > 0,\ \forall s \in\, ]t-u, t+u[\, \setminus \{t\},\ x(t) > x(s) \}$$
$$\Gamma_{\min} = \{ t \in I \mid \exists u > 0,\ \forall s \in\, ]t-u, t+u[\, \setminus \{t\},\ x(t) < x(s) \}$$

2. The mean value $m(t) = (x_{\sup}(t) + x_{\inf}(t))/2$ of the envelope $x_{\sup}(t)$ defined by the local maxima and the envelope $x_{\inf}(t)$ defined by the local minima is zero:
$$\exists x_{\sup} \in Env(\Gamma_{\max}),\ \exists x_{\inf} \in Env(\Gamma_{\min}),\ \forall t \in I,\ m(t) = 0$$
with
$$Env(\Gamma_{\max}) = \{ f \in C(I) \mid \forall t \in \Gamma_{\max},\ f(t) = x(t) \}$$
$$Env(\Gamma_{\min}) = \{ f \in C(I) \mid \forall t \in \Gamma_{\min},\ f(t) = x(t) \}$$

An IMF represents a simple oscillatory mode as a counterpart to the simple harmonic function, but it is much more general: instead of the constant amplitude and frequency of a simple harmonic component, an IMF can have an amplitude and a frequency that vary as functions of time. The first condition is clearly necessary for oscillatory data; the second condition requires that the upper and lower envelopes of the IMF be symmetric with respect to the x-axis.

The idea of the EMD method is to separate the data into a slowly varying local mean part and a fast varying symmetric oscillation part. The oscillation part becomes the IMF and the local mean the residue; the residue then serves as input data for further decomposition, and the process repeats until no more oscillation can be separated from the residue. At each step of the decomposition, since the upper and lower envelopes of the IMF are initially unknown, a repetitive sifting process is applied, approximating the envelopes with cubic spline functions passing through the extrema of the IMF. The data serves as the initial input for the sifting process, and the refined IMF is the difference between the previous version and the mean of the envelopes; the process repeats until the predefined stopping condition is satisfied. The residue is then the difference between the data and the extracted IMF.

One big advantage of this procedure is that it can deal with data from nonstationary and nonlinear processes. The method is direct and adaptive, with an a posteriori-defined basis derived from the data itself.
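On sampled data, the counting condition of Definition 1.1.1 can be checked directly; a minimal sketch (the helper names are ours, not from the report, and zero crossings are detected as sign changes between consecutive samples):

```python
# Check the counting condition |#maxima + #minima - #zero crossings| <= 1
# of Definition 1.1.1 on a 1-D sampled signal.
import numpy as np

def counts(x):
    """Numbers of strict local maxima, strict local minima, and zero crossings."""
    n_max = sum(1 for i in range(1, len(x) - 1) if x[i] > x[i-1] and x[i] > x[i+1])
    n_min = sum(1 for i in range(1, len(x) - 1) if x[i] < x[i-1] and x[i] < x[i+1])
    n_zero = int(np.sum(np.abs(np.diff(np.sign(x))) > 0))
    return n_max, n_min, n_zero

def satisfies_counting_condition(x):
    n_max, n_min, n_zero = counts(x)
    return abs(n_max + n_min - n_zero) <= 1
```

A sine wave satisfies the condition; the same sine shifted far above zero (oscillating without ever crossing the axis) violates it, which is exactly what the sifting step repairs by removing the envelope mean.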
The intrinsic mode components are extracted by the following steps:
1. Take an arbitrary input signal x(t) and initialize the residual: r0(t) = x(t), i = 1
2. Extract the i-th IMF:
3. Initialize the "proto-IMF" h0 with h0(t) = r_{i-1}(t), k = 1
4. Extract the local maxima and minima of the proto-IMF h_{k-1}(t)
5. Interpolate the local maxima and the local minima with cubic splines to form the upper and lower envelopes of h_{k-1}(t)
6. Calculate the mean m_{k-1}(t) of the upper and lower envelopes of h_{k-1}(t)
7. Define: h_k(t) = h_{k-1}(t) - m_{k-1}(t)
8. If the IMF criteria are satisfied, then set IMF_i(t) = h_k(t); else go to (4) with k = k + 1
9. Define: r_i(t) = r_{i-1}(t) - IMF_i(t)
10. If r_i(t) still has at least two extrema, then go to (2) with i = i + 1; else the decomposition is completed and r_i(t) is the "residue" of x(t)
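The ten steps above can be sketched in a few lines of Python. This is a simplified illustration, not the report's implementation: it uses plain natural cubic splines through the extrema, a crude envelope-mean stoppage rule, and no boundary treatment.

```python
# Minimal sketch of the EMD sifting loop (steps 1-10 above).
import numpy as np
from scipy.interpolate import CubicSpline

def local_extrema(h):
    """Indices of strict local maxima and minima of a 1-D array."""
    maxs = [i for i in range(1, len(h) - 1) if h[i] > h[i-1] and h[i] > h[i+1]]
    mins = [i for i in range(1, len(h) - 1) if h[i] < h[i-1] and h[i] < h[i+1]]
    return maxs, mins

def sift(residual, max_iter=50, tol=1e-8):
    """Steps 3-8: refine a proto-IMF until the envelope mean is (nearly) zero."""
    h = residual.copy()
    t = np.arange(len(h))
    for _ in range(max_iter):
        maxs, mins = local_extrema(h)
        if len(maxs) < 3 or len(mins) < 3:     # too few extrema for envelopes
            return None
        upper = CubicSpline(maxs, h[maxs], bc_type='natural')(t)
        lower = CubicSpline(mins, h[mins], bc_type='natural')(t)
        mean = (upper + lower) / 2.0           # the "bias" removed at each pass
        h = h - mean
        if np.sum(mean ** 2) < tol * np.sum(h ** 2):   # relaxed stoppage rule
            break
    return h

def emd(x, max_imfs=10):
    """Steps 1-2 and 9-10: peel IMFs off the signal until only the residue is left."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        imf = sift(residual)
        if imf is None:
            break
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual
```

By construction the decomposition is exact: the sum of the returned IMFs plus the residue reproduces the input up to floating-point error.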
Figure 1.1: Sifting process of the empirical mode decomposition: (a) an arbitrary input; (b) identified maxima (diamonds) and minima (circles) superimposed on the input; (c) upper envelope and lower envelope (thin solid lines) and their mean (dashed line); (d) prototype intrinsic mode function (IMF) (the difference between the bold solid line and the dashed line in panel (c)) that is to be refined; (e) upper envelope and lower envelope (thin solid lines) and their mean (dashed line) of a refined IMF; and (f) remainder after an IMF is subtracted from the input.

Once a signal has been fully decomposed, the signal x(t) can be written as
$$x(t) = \sum_{i=1}^{N} IMF_i(t) + r(t)$$

1.2 Closed form formulas for IMFs

Rather than a Fourier or wavelet based transform, the Hilbert transform is used, in order to compute instantaneous frequencies and amplitudes and describe the signal more locally. The following equation displays the Hilbert transform $Y_t$, which can be written for any function $x(t)$ of $L^p$ class; PV denotes the Cauchy principal value integral.
$$Y_t = H[IMF_t] = \frac{1}{\pi}\, PV \int_{-\infty}^{+\infty} \frac{IMF_s}{t-s}\, ds = \frac{1}{\pi} \lim_{\varepsilon \to 0} \left( \int_{-\infty}^{t-\varepsilon} \frac{IMF_s}{t-s}\, ds + \int_{t+\varepsilon}^{+\infty} \frac{IMF_s}{t-s}\, ds \right)$$
Algorithm 1 Empirical Mode Decomposition
Require: Signal, threshold ∈ R+
 1: curSignal ← Signal, i = 1
 2: while numberOfExtrema(curSignal) > 2 do
 3:   curImf ← curSignal
 4:   while isNotAnImf(curImf, threshold) = true do
 5:     Γmax ← emdGetMaxs(curImf)
 6:     Γmin ← emdGetMins(curImf)
 7:     Γmax ← emdMaxExtrapolate(curImf, Γmax)
 8:     Γmin ← emdMinExtrapolate(curImf, Γmin)
 9:     xsup ← emdInterpolate(curImf, Γmax)
10:     xinf ← emdInterpolate(curImf, Γmin)
11:     bias ← (xsup + xinf)/2
12:     curImf ← curImf − bias
13:   end while
14:   IMFi ← curImf, i = i + 1
15:   curSignal ← curSignal − IMFi
16: end while
17: N = i − 1
18: residual ← curSignal
19: return (IMFi)i=1..N

An analytic function can be formed with the Hilbert transform pair:
$$Z_t = IMF_t + iY_t = A_t e^{i\theta_t}$$
where
$$A_t = \sqrt{IMF_t^2 + Y_t^2}, \qquad \theta_t = \arctan\left(\frac{Y_t}{IMF_t}\right)$$
$A_t$ and $\theta_t$ are the instantaneous amplitude and phase functions, respectively. The instantaneous frequency $f_t$ can then be written as the time derivative of the phase:
$$f_t = \frac{1}{2\pi} \frac{d\theta_t}{dt}$$
Hence, an IMF can be expressed analytically:
$$IMF_t = A_t \cos\left( 2\pi \int_0^t f_s\, ds + \psi \right) \qquad (1.1)$$

[11] and [14] showed that not all functions give "good" Hilbert transforms, meaning transforms that produce physical instantaneous frequencies. The signals which can be analyzed using the Hilbert transform must be restricted so that their calculated instantaneous frequency functions have physical meaning.
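Numerically, the analytic signal $Z_t$, the instantaneous amplitude $A_t$ and the instantaneous frequency $f_t$ can be obtained in a few lines; a sketch using `scipy.signal.hilbert` (which returns the analytic signal of a real sequence, i.e. $IMF_t + iY_t$):

```python
# Instantaneous amplitude and frequency of an IMF via the analytic signal.
import numpy as np
from scipy.signal import hilbert

def instantaneous(imf, dt):
    z = hilbert(imf)                               # analytic signal Z_t
    amplitude = np.abs(z)                          # A_t
    phase = np.unwrap(np.angle(z))                 # theta_t, unwrapped
    freq = np.gradient(phase, dt) / (2 * np.pi)    # f_t = (1/2pi) dtheta/dt
    return amplitude, freq
```

For a pure cosine the method recovers a constant amplitude and the nominal frequency away from the boundaries, which is the sense in which such signals give "good" Hilbert transforms.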
The empirical mode decomposition is thus essentially an algorithm which decomposes nearly any signal into a finite set of functions which have "good" Hilbert transforms, producing physically meaningful instantaneous frequencies. Once the IMFs have been obtained from the EMD method, one can further calculate the instantaneous phases of the IMFs by applying the Hilbert transform to each IMF component.
Chapter 2

State of the Art

2.1 Application fields of the EMD

The Empirical Mode Decomposition can be a powerful tool to separate non-linear and non-stationary time series into a trend (the residue function) and oscillations (the IMFs) on different time scales; it can describe the frequency components locally and adaptively for nearly any oscillating signal. This makes the tool extremely versatile. The decomposition finds applications in many fields where Fourier analysis or wavelet methods traditionally dominate. For instance, the HHT has been used to study a wide variety of data including rainfall, earthquakes, sunspot number variation, heart-rate variability, financial time series, and ocean waves, to name a few subjects.

But some mathematical issues related to this decomposition have been mostly left untreated: convergence of the method, optimization problems (best IMF selection and uniqueness of the decomposition), spline problems (best spline functions for the HHT). In the following sections, these open issues are thoroughly developed, and the current potential solutions from the literature are gathered.

2.2 Existence and uniqueness of the Decomposition

The convergence of the proto-IMF sequence $(h_k)_{k \ge 0}$ to an IMF is equivalent to the convergence of the bias $(m_k)_{k \ge 0}$ to zero:
$$m_k \xrightarrow[k \to \infty]{L^2} 0, \qquad \text{where } m_k = h_{k-1}(t) - h_k(t)$$
2.2.1 Stoppage criteria

The inner loop should be ended when the result of the sifting process meets the definition of an IMF. In practice this condition is too strong, so we need to specify a relaxed condition which can be met in a finite number of iterations. The approximate local envelope symmetry condition in the sifting process is called the stoppage (of sifting) criterion. Several different types of stoppage criteria have been adopted in the past. The most widely used type, which originated from Huang et al. [14], is given by a Cauchy type of convergence test: the normalized squared difference between two successive sifting operations, defined as
$$SD_k = \frac{\sum_{t=0}^{T} |h_{k-1}(t) - h_k(t)|^2}{\sum_{t=0}^{T} h_{k-1}^2(t)}$$
must be smaller than a predetermined value. This definition is slightly different from the one given by Huang et al. [14], with the summation signs operating on the numerator and denominator separately, in order to prevent $SD_k$ from becoming too dependent on locally small amplitude values of the sifted time series.

If we assume that the local mean between the upper and lower envelopes converges to zero in the sense of the Euclidean norm, we can apply the following Cauchy criterion:
$$\log_2 \left( \frac{\| m_{k-1} \|_{L^2}^2}{\| h_{k-1} \|_{L^2}^2} \right) \leq \text{threshold}$$
In our implementation the threshold has been calibrated to −15.

These Cauchy types of stoppage criteria are seemingly rigorous mathematically. However, such a criterion is difficult to implement, for the following reasons. First, how small is small enough begs an answer. Second, the criterion does not depend on the definition of the IMFs: the squared difference might be small, but there is no guarantee that the function will have the same numbers of zero crossings and extrema.
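Both stoppage tests translate directly to code; a sketch (function names are ours, not from the report), using the threshold of −15 mentioned above as the default:

```python
# The two stoppage criteria for the sifting loop.
import numpy as np

def squared_difference(h_prev, h_curr):
    """SD_k = sum |h_{k-1} - h_k|^2 / sum h_{k-1}^2."""
    return np.sum((h_prev - h_curr) ** 2) / np.sum(h_prev ** 2)

def cauchy_stop(h_prev, h_curr, threshold=-15.0):
    """Stop sifting when log2(||m_{k-1}||^2 / ||h_{k-1}||^2) <= threshold,
    where m_{k-1} = h_{k-1} - h_k is the envelope mean removed at this pass."""
    m = h_prev - h_curr
    ratio = np.sum(m ** 2) / np.sum(h_prev ** 2)
    return np.log2(ratio) <= threshold
```

With threshold = −15, sifting stops once the removed envelope mean carries less than $2^{-15} \approx 3 \times 10^{-5}$ of the proto-IMF's energy.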
2.2.2 Cubic spline interpolation

Since the EMD is an empirical algorithm and involves a prescribed stoppage criterion to carry out the sifting moves, we have to know how sensitive the decomposition of an input is to the sifting process, so that the reliability of a particular decomposition can be determined; a confidence limit for the EMD is therefore a desirable quantity.

To compute the upper and lower envelopes we use a piecewise-polynomial approximation. In general, the goal of spline interpolation is to create a function which achieves the best possible approximation to a given data set. For a smooth and efficient approximation, one has to choose polynomials of sufficiently high order; a popular choice is the piecewise cubic approximation, of order three.

The basic idea behind using a cubic spline is to fit a piecewise function of the form:
$$S(x) = \begin{cases} S_0(x), & x \in [x_0, x_1[ \\ S_1(x), & x \in [x_1, x_2[ \\ \quad \vdots \\ S_{n-1}(x), & x \in [x_{n-1}, x_n[ \end{cases}$$
where $S_i(x)$ is a third degree polynomial with coefficients $a_i$, $b_i$, $c_i$ and $d_i$, defined for $i = 0, 1, \dots, n-1$ by:
$$S_i(x) = a_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3$$

More formally, given a function $f(x)$ defined on an interval $[a, b]$ and a set of nodes $a = x_0 < x_1 < \dots < x_n = b$, a cubic spline interpolant $S(x)$ for $f(x)$ is a function that satisfies the following conditions:
1. $S(x) = \sum_{i=0}^{n-1} S_i(x)\, \mathbf{1}_{\{x \in [x_i, x_{i+1}[\}}$, where $S_i(x)$ is a cubic polynomial on the subinterval $[x_i, x_{i+1})$ for each $i = 0, 1, \dots, n-1$.
2. $S_{i+1}(x_{i+1}) = S_i(x_{i+1})$ for each $i = 0, 1, \dots, n-2$ (continuity).
3. $S'_{i+1}(x_{i+1}) = S'_i(x_{i+1})$ for each $i = 0, 1, \dots, n-2$ (continuous first derivative).
4. $S''_{i+1}(x_{i+1}) = S''_i(x_{i+1})$ for each $i = 0, 1, \dots, n-2$ (continuous second derivative).
5. One of the following sets of boundary conditions is also satisfied:
$S''(x_0) = S''(x_n) = 0$ (free or natural boundary), or
$S'(x_0) = f'(x_0)$ and $S'(x_n) = f'(x_n)$ (clamped boundary).

But there are four problems with this decomposition method:
– the (cubic) spline connecting the extrema is not the real envelope,
– the resulting IMF function does not strictly guarantee symmetric envelopes,
– some unwanted overshoot may be caused by the spline interpolation,
– the spline cannot be connected at both ends of the data series.
Higher order splines do not, in theory, resolve these problems.

2.2.3 Additional boundary data points

It has already been illustrated that the cubic spline somehow has to be kept close to the function, especially near both ends of the data series. Therefore, the creation of additional boundary data points, which are supposed to be applicable to the current data set, appears to be the key element in technically improving the EMD.
All artificially added boundary data points are generated from within the original set of discrete knots, so as to represent a characteristic natural behaviour. One routine is to add new maxima and minima at the front and rear of the data series. As a basic requirement, these data points are located outside the original time span over which the signal was recorded. Therefore, no information is cancelled out and the natural data series remains unaffected. However, one disadvantage of this method is that it anticipates the future trend of the data.
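One simple variant of this routine is to mirror the extrema nearest to each end outside the observation window before fitting the envelope spline; a sketch of this idea (our own illustration with natural cubic splines, since the report's exact extrapolation routine is not specified here):

```python
# Envelope through extrema, with one mirrored extremum added beyond each
# end of the window so the spline is anchored instead of freely extrapolated.
import numpy as np
from scipy.interpolate import CubicSpline

def padded_envelope(t_ext, x_ext, t_grid):
    """Natural cubic spline through extrema (t_ext, x_ext), after reflecting
    the second and second-to-last extrema about the window edges."""
    t_ext = np.asarray(t_ext, dtype=float)
    x_ext = np.asarray(x_ext, dtype=float)
    t_pad = np.r_[2 * t_grid[0] - t_ext[1], t_ext, 2 * t_grid[-1] - t_ext[-2]]
    x_pad = np.r_[x_ext[1], x_ext, x_ext[-2]]
    return CubicSpline(t_pad, x_pad, bc_type='natural')(t_grid)
```

The mirrored knots lie off the original time span, so, as noted above, the recorded data itself is left unaffected; the trade-off is that the padding implicitly anticipates how the series continues.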
Figure 2.1: Additional boundary data points

Figure 2.2: Additional boundary data points
Chapter 3

Application to the prediction of financial time series

3.1 The Empirical Mode Decomposition in finance: stylized facts

In the markets, one can assume that different modes of oscillation in stock prices are provoked by different kinds of actors. With this approach, the lowest frequency IMFs could be considered as a statistical proxy to infer the behavior of big investors (banks, insurance companies, hedge funds, mutual funds, pension funds...) and predict the evolution of a stock in the long run.

3.1.1 Empirical Modes and Market Structure

3.1.1.1 Asset price and IMFs

With the EMD, a time series can be represented as follows:
$$\exists N_{IMF} \in \mathbb{N}^*,\ \forall t \in \mathbb{Z},\quad X_t = \sum_{i=1}^{N_{IMF}} IMF_t^i + r_t$$
It can be interesting to observe the correlation matrix of the random vector
$$\left( X_t,\ IMF_t^1,\ \dots,\ IMF_t^N,\ r_t \right)_{t \in \mathbb{Z}}$$
This matrix can be empirically computed on an example, as shown in Figure 3.1, page 17.
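Given any EMD routine returning the IMFs and the residue, an empirical version of this correlation matrix can be estimated with numpy; a minimal sketch (the function name is ours):

```python
# Empirical correlation matrix of a series, its IMFs, and the residue.
import numpy as np

def imf_correlation_matrix(x, imfs, residue):
    """Rows: the series X, then each IMF, then the residue/trend."""
    rows = np.vstack([x] + list(imfs) + [residue])
    return np.corrcoef(rows)
```

The first row of the result corresponds to the X row of Figure 3.1; the off-diagonal IMF-vs-IMF entries are the ones expected to be near zero under the orthogonality assumption discussed below.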
        X      IMF1   IMF2   IMF3   IMF4   IMF5   IMF6   IMF7   Trend
X       1.000  0.038  0.039  0.172  0.184  0.169  0.488  0.795  0.879
IMF1    0.038  1.000  0.034 -0.009 -0.026  0.002  0.022 -0.021 -0.019
IMF2    0.039  0.034  1.000 -0.040 -0.005  0.043 -0.015 -0.022 -0.025
IMF3    0.172 -0.009 -0.040  1.000  0.109 -0.114  0.102  0.049  0.059
IMF4    0.184 -0.026 -0.005  0.109  1.000 -0.035  0.018  0.035  0.040
IMF5    0.169  0.002  0.043 -0.114 -0.035  1.000 -0.096 -0.014 -0.038
IMF6    0.488  0.022 -0.015  0.102  0.018 -0.096  1.000 -0.043  0.148
IMF7    0.795 -0.021 -0.022  0.049  0.035 -0.014 -0.043  1.000  0.977
Trend   0.879 -0.019 -0.025  0.059  0.040 -0.038  0.148  0.977  1.000

Figure 3.1: Empirical correlation matrix of the AXA share price and its IMFs, from 01/01/2003 to 14/11/2011

This correlation matrix shows three stylized facts:
– The EMD displays some empirical orthogonality features. Hence, the following theoretical assumption can be made:
$$\forall (i,j) \in [[1; N_{IMF}]]^2,\quad i \neq j \Rightarrow \langle IMF_i, IMF_j \rangle = 0$$
with the following scalar product:
$$\langle \cdot, \cdot \rangle : L^2(\mathbb{R}) \times L^2(\mathbb{R}) \to \mathbb{R},\quad (X, Y) \mapsto Cov(X, Y)$$
– Low frequency IMFs display strong correlations with the original price series.
– High frequency IMFs are uncorrelated with the original price series.

3.1.1.2 High frequency modes

Empirical modes have a strong connection to the market structure. On top of being uncorrelated with the general time series, high frequency IMFs present a strong correlation with daily price movements, as shown by the empirical correlation matrix of daily yields against the IMF processes and the trend process (see Figure 3.3, page 18). High frequency IMFs accurately follow daily movements, as shown in Figure 3.4, page 19. Moreover, there appear to be local jumps in the amplitude of the high frequency IMFs when daily changes become sharper; this coincides with local jumps in volatility. Hence, the amplitude of high frequency IMFs is probably positively correlated with the short term implied volatility of the At-The-Money options on the stock.
As seen on the last graph, the amplitude jumps along with volatility when daily yields exceed 10% (see Figure 3.5 and Figure 3.6). Finally, despite their significant short term periodicity, high frequency IMFs display some signs of stationarity, as shown in Figure 3.2, page 18.
Figure 3.2: Sample Autocorrelation Function of the highest frequency IMF

           diff(Axa)  IMF1   IMF2   IMF3   IMF4   IMF5   IMF6   IMF7   Trend
diff(Axa)   1.000     0.552  0.087  0.029  0.016 -0.007 -0.008 -0.004 -0.003
IMF1        0.552     1.000  0.034 -0.009 -0.026  0.002  0.022 -0.021 -0.019
IMF2        0.087     0.034  1.000 -0.040 -0.005  0.043 -0.015 -0.022 -0.025
IMF3        0.029    -0.009 -0.040  1.000  0.109 -0.114  0.102  0.049  0.059
IMF4        0.016    -0.026 -0.005  0.109  1.000 -0.035  0.018  0.035  0.040
IMF5       -0.007     0.002  0.043 -0.114 -0.035  1.000 -0.096 -0.014 -0.038
IMF6       -0.008     0.022 -0.015  0.102  0.018 -0.096  1.000 -0.043  0.148
IMF7       -0.004    -0.021 -0.022  0.049  0.035 -0.014 -0.043  1.000  0.977
Trend      -0.003    -0.019 -0.025  0.059  0.040 -0.038  0.148  0.977  1.000

Figure 3.3: Empirical correlation of the daily returns of AXA with its IMFs, from 01/01/2003 to 14/11/2011

A Ljung-Box test is a more quantitative way to assess stationarity: it tests the hypothesis that a second order stationary process has no autocorrelation.

Definition 3.1.1 Let $m \in \mathbb{N}$ and $(H_0) : \rho_1 = \rho_2 = \dots = \rho_m = 0$. The Ljung-Box test statistic is given by:
$$Q(m) = N(N+2) \sum_{h=1}^{m} \frac{\hat{\rho}_h^2}{N-h}$$
Under $H_0$, $Q(m) \sim \chi^2_m$.

However, the test rejects the absence of autocorrelations for the highest frequency IMF in our previous example of AXA. Therefore, it suggests that these high frequency modes display
Figure 3.4: Daily returns of Societe Generale and its highest frequency IMF

Figure 3.5: Daily returns of Societe Generale and its highest frequency IMF during a high volatility period
Figure 3.6: Daily returns of Societe Generale and its highest frequency IMF, normalized

some stationary properties, but keep a very short periodicity, making these processes not i.i.d.

3.1.1.3 Low frequency modes

Low frequency modes describe the long term dynamics of the stock. They reflect the positions of long term actors within the market (see Figure 3.7, page 21). They can also be interpreted in terms of economic cycles, if applied to very long time frames (see Figure 3.8, page 21).

3.1.2 Back to the Box & Jenkins framework

The EMD algorithm derives the following decomposition of a given financial time series:
$$\exists N_{IMF} \in \mathbb{N}^*,\ \forall t \in \mathbb{Z},\quad X_t = \sum_{i=1}^{N_{IMF}} IMF_t^i + r_t$$
Moreover, low frequency IMFs explain the general evolution of the stock price and have strong periodic patterns, whereas high frequency IMFs are linked to the daily movements. Figure 3.9, page 22, shows all the IMFs of an example financial time series: in red the low frequency IMFs, in blue the high frequency IMFs, in green the stock price.
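The Ljung-Box statistic of Definition 3.1.1 is straightforward to compute; a sketch (a hand-rolled helper of our own, equivalent in intent to library implementations of the test):

```python
# Ljung-Box statistic Q(m) = N(N+2) * sum_{h=1}^{m} rho_h^2 / (N - h),
# to be compared against a chi-squared quantile with m degrees of freedom.
import numpy as np

def ljung_box_q(x, m):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for h in range(1, m + 1):
        rho_h = np.sum(xc[h:] * xc[:-h]) / denom   # lag-h sample autocorrelation
        q += rho_h ** 2 / (n - h)
    return n * (n + 2) * q
```

On white noise Q(m) stays near its chi-squared expectation, while on a strongly autocorrelated series (e.g. a random walk) it explodes, which is the behavior the test exploits to reject $H_0$ for the highest frequency IMF above.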
Figure 3.7: Societe Generale stock price and the sum of its 2 lowest frequency IMFs

Figure 3.8: VIX and the sum of its 3 lowest frequency IMFs

Hence, we can separate the previous sum into the two components of the Box & Jenkins approach.
Figure 3.9: Empirical Mode Decomposition of Axa, from 01/01/2003 to 14/11/2011

$$\exists N_{IMF} \in \mathbb{N}^*,\ \forall t \in \mathbb{Z},\quad X_t = \underbrace{\sum_{i=1}^{N_{sep}} IMF_t^i}_{\text{Random part}} + \underbrace{\sum_{i=N_{sep}+1}^{N_{IMF}} IMF_t^i}_{\text{Seasonal part}} + \underbrace{r_t}_{\text{Trend}}$$

In the rest of this report, we will sometimes include the trend process in the seasonal part. In the previous example, the decomposition is the following:

Remark 3.1.2 In Figure 3.10, page 23, the correlation between the "Random Part" and the "Seasonal Part" is:
$$Corr(X_t^{random}, X_t^{seasonal}) = -0.01$$

In order to properly differentiate low frequency IMFs and high frequency IMFs, one needs a rule. Multiple choices are possible:
– A stationarity criterion for high frequency IMFs: statistical tests, such as the Ljung-Box test, the runs test, the KPSS test...
– A periodicity criterion for low frequency IMFs: low frequency IMFs must display fewer than p pseudo-periods within the time interval. Beyond that threshold p, they are considered as moving too quickly, and as not carrying information relative to the general evolution of the series.

Provided that the goal of this decomposition is to extract a seasonal pattern reflecting the broad evolution of the stock price, a selection criterion for low frequency IMFs seems more appropriate; it is also more intuitive than statistical tests. Therefore, the criterion chosen is the following:
Figure 3.10: Box & Jenkins decomposition of the Axa stock price based on the EMD, from 01/01/2003 to 14/11/2011

$$\forall i \in [[1; N_{IMF}]],\quad \left(IMF_t^i\right)_{1 \le t \le T} \in \left(X_t^{seasonal}\right)_{1 \le t \le T} \iff \#\Gamma_{0,i}^{[1;T]} \le 3$$
where
$$\#\Gamma_{0,i}^{[1;T]} = \#\left\{ s \in [[1; T]] \mid IMF_s^i = 0 \right\}$$

Therefore, we can now explicitly write the Box & Jenkins decomposition based on the EMD:
$$\exists N_{IMF} \in \mathbb{N}^*,\ \forall t \in [[1; T]],\quad X_t = \underbrace{\sum_{i=1}^{N_{IMF}} IMF_t^i \cdot \mathbf{1}_{\#\Gamma_{0,i}^{[1;T]} > 3}}_{\text{Random part}} + \underbrace{\sum_{i=1}^{N_{IMF}} IMF_t^i \cdot \mathbf{1}_{\#\Gamma_{0,i}^{[1;T]} \le 3}}_{\text{Seasonal part}} + \underbrace{r_t}_{\text{Trend}}$$

Remark 3.1.3 In Figure 3.10, page 23, the "Seasonal Part" explains 91% of the variance of the original time series:
$$R^2 := \frac{Var(X_t^{seasonal} + r_t)}{Var(X_t)} = 0.91$$

3.1.3 Prediction hypotheses

We have now decomposed our signal into two parts: the seasonal component and the random high frequency component. These two components are moreover uncorrelated, and the variance of the seasonal process explains most of the variance of the original time series.
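On sampled IMFs, the selection criterion above reduces to counting sign changes; a minimal sketch (assuming, as elsewhere in this report's examples, that discrete zero crossings are detected as sign changes between consecutive samples):

```python
# Split IMFs into seasonal (<= 3 zero crossings in the window) and random parts.
import numpy as np

def split_seasonal_random(imfs, max_zero_crossings=3):
    seasonal, random_part = [], []
    for imf in imfs:
        n_zero = int(np.sum(np.abs(np.diff(np.sign(imf))) > 0))
        (seasonal if n_zero <= max_zero_crossings else random_part).append(imf)
    return seasonal, random_part
```

The seasonal series of the decomposition is then the element-wise sum of the seasonal list (plus the trend), and the random series the sum of the remaining IMFs.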
We have the following decomposition:

$$\forall T \in \mathbb{N},\ \exists N_{IMF} \in \mathbb{N}^* \;|\; \forall t \in [\![0;T]\!],\quad X_t = \underbrace{\sum_{i=1}^{N_{IMF}} IMF_t^i\,\mathbf{1}_{\#\Gamma_{0,i}^{[1;T]}>3}}_{\text{Random part}} \;+\; \underbrace{\sum_{i=1}^{N_{IMF}} IMF_t^i\,\mathbf{1}_{\#\Gamma_{0,i}^{[1;T]}\le 3}}_{\text{Seasonal part}} \;+\; \underbrace{r_t}_{\text{Trend}}$$

We can now proceed to separate estimations for each process. As we noticed in the earlier example, the random part process is approximately centered on zero. Therefore, we make a simple prediction for this process:

$$E\left[ (X_s^{random})_{T+1\le s\le 2T} \;\big|\; (X_t^{random})_{1\le t\le T} \right] = (0)_{T+1\le s\le 2T}$$

We now have to formulate a prediction of the seasonal process. Following the framework of Box & Jenkins, two hypotheses are possible in order to formulate predictions.

3.1.3.1 Deterministic periodicity of low frequency IMFs

The first possible assumption is that the seasonal component is deterministic. Hence, we assume that in the future, this periodic component will keep its properties:

– Periodicity: $\forall i \in [\![i_0;N_{IMF}]\!],\ \forall t \in \mathbb{Z},\ \sum_{j=1}^{T_i} IMF_{t+j}^i = 0$, or $\forall i \in [\![i_0;N_{IMF}]\!],\ \forall t \in \mathbb{Z},\ IMF_{t+T_i}^i = IMF_t^i$

– IMF structure: $\forall i \in [\![i_0;N_{IMF}]\!],\ \forall t \in \mathbb{Z},\ \#\Gamma_{max,i}^{[t;t+T_i]} + \#\Gamma_{min,i}^{[t;t+T_i]} - \#\Gamma_{0,i}^{[t;t+T_i]} \le 1$

where

$$\Gamma_{0,i}^{[t;t+T_i]} = \left\{ s \in [t;t+T_i] \;|\; IMF_s^i = 0 \right\}$$
$$\Gamma_{min,i}^{[t;t+T_i]} = \left\{ s \in [t;t+T_i] \;|\; \exists u > 0,\ s = \arg\min_{v\in[s-u,s+u]} IMF_v^i \right\}$$
$$\Gamma_{max,i}^{[t;t+T_i]} = \left\{ s \in [t;t+T_i] \;|\; \exists u > 0,\ s = \arg\max_{v\in[s-u,s+u]} IMF_v^i \right\}$$

Hence, with these properties, each low frequency IMF can easily be prolonged, and hence so can the estimated future seasonal process.

3.1.3.2 Stochastic periodicity of low frequency IMFs

The second possible assumption is weaker than the first one. Instead of assuming that the seasonal component is deterministic, it is now assumed that it presents some periodicity, while remaining stochastic.
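Under the deterministic-periodicity hypothesis of Section 3.1.3.1, prolonging a low frequency IMF amounts to repeating its last observed pseudo-period. A minimal sketch, assuming the pseudo-period $T_i$ has already been estimated (in samples):

```python
def prolong_periodic(imf, period, horizon):
    """Extend an IMF under the deterministic-periodicity hypothesis by
    repeating its last observed period; returns `horizon` new values."""
    last_cycle = imf[-period:]
    return [last_cycle[k % period] for k in range(horizon)]
```

Summing the prolonged low frequency IMFs (and the trend) yields the estimated future seasonal process.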
$$E\left[ (X_s^{seasonal} + r_s)_{T+1\le s\le 2T} \;\big|\; (X_t^{seasonal} + r_t)_{1\le t\le T} \right] = (X_s^{seasonal} + r_s)_{T+1\le s\le 2T}$$

if we assume that future estimations should rely on sequences from the past of the same duration.

3.2 Insights on potential market predictors

3.2.1 Deterministic periodicity: Low Frequency Mean Reverting Strategy

This strategy relies on the hypothesis of deterministic periodicity of the seasonal process. To formulate a prediction at a certain time t, within the horizon T, this strategy relies on the following algorithm:

$$\forall s \in [\![t-T;t]\!],\quad X_s = X_s^{random} + X_s^{seasonal}$$
$$\forall s \in [\![t-T;t]\!],\quad \widetilde{X}_s^{seasonal} = \log\frac{X_s^{seasonal}}{\overline{X}^{seasonal}} \quad\text{where}\quad \overline{X}^{seasonal} = \frac{1}{T+1}\sum_{s=t-T}^{t} X_s^{seasonal}$$

If $\widetilde{X}_t^{seasonal} - 2\widetilde{X}_{t-1}^{seasonal} + \widetilde{X}_{t-2}^{seasonal} > threshold$, then the prediction is $\frac{X_{t+T}^{seasonal}}{X_t^{seasonal}} > 1$ and $\alpha_{MeanRevertingStrat}(t) = 1$.

Else if $\widetilde{X}_t^{seasonal} - 2\widetilde{X}_{t-1}^{seasonal} + \widetilde{X}_{t-2}^{seasonal} < -threshold$, then the prediction is $\frac{X_{t+T}^{seasonal}}{X_t^{seasonal}} < 1$ and $\alpha_{MeanRevertingStrat}(t) = -1$.

3.2.2 Conditional expectation: Low Frequency Multi Asset Shifting Pattern Recognition Strategy and Mono Asset IMF Pattern Recognition Strategy

3.2.2.1 Low Frequency Multi Asset Shifting Pattern Recognition Strategy

This strategy relies on the hypothesis of stochastic periodicity of the seasonal process. It considers a pool of N assets, among which is the asset to be predicted, $i_0$. To formulate a prediction at a certain time $t_0$, within the horizon T, this strategy relies on the following algorithm. First, each price process is decomposed.
$$\forall i \in [\![1;N_{assets}]\!],\ \forall s \in [\![0;t_0]\!],\quad X_s^i = X_s^{random,i} + X_s^{seasonal,i}$$

And the asset of interest too, since it belongs to the pool of assets:

$$\forall s \in [\![0;t_0]\!],\quad X_s^{i_0} = X_s^{random,i_0} + X_s^{seasonal,i_0}$$

Then, the three best fitting patterns are chosen:

$$\{(i_1,t_1),(i_2,t_2),(i_3,t_3)\} = \arg\min_{(i,u)\in[\![1;N_{assets}]\!]\times[\![1;t_0-T]\!]} \sum_{\tau=0}^{T} \left( \log\frac{X_{u+\tau}^{seasonal,i}}{\overline{X}_{[u;u+T]}^{seasonal,i}} - \log\frac{X_{t_0-T+\tau}^{seasonal,i_0}}{\overline{X}_{[t_0-T;t_0]}^{seasonal,i_0}} \right)^2$$

where

$$\overline{X}_{[t_0-T;t_0]}^{seasonal,i_0} = \frac{1}{T+1}\sum_{s=t_0-T}^{t_0} X_s^{seasonal,i_0} \quad\text{and}\quad \overline{X}_{[u;u+T]}^{seasonal,i} = \frac{1}{T+1}\sum_{s=u}^{u+T} X_s^{seasonal,i}$$

Let

$$Z = \frac{\mathbf{1}_{\{X_{t_1+T}^{i_1} > X_{t_1}^{i_1}\}} + \mathbf{1}_{\{X_{t_2+T}^{i_2} > X_{t_2}^{i_2}\}} + \mathbf{1}_{\{X_{t_3+T}^{i_3} > X_{t_3}^{i_3}\}}}{3}$$

Z is the decision variable. Predictions are made depending on the vote of the three best fitted scenarios.

If $Z > \frac{1}{2}$, then the prediction is $\frac{X_{t_0+T}}{X_{t_0}} > 1$ and $\alpha_{ShiftingPatternStrat}(t) = 1$.

Else if $Z < \frac{1}{2}$, then the prediction is $\frac{X_{t_0+T}}{X_{t_0}} < 1$ and $\alpha_{ShiftingPatternStrat}(t) = -1$.

3.2.2.2 Low Frequency Mono Asset IMF Pattern Recognition Strategy

This strategy relies on the hypothesis of stochastic periodicity of the seasonal process. It is very similar to the previous strategy, but differs on two essential points:
– It does not require any time series other than the historical prices of the asset to be predicted.
– It is adapted, i.e. it only uses information available at the time of prediction.
To formulate a prediction at a certain time $t_0$, within the horizon T, this strategy relies on the following algorithm:

$$\forall s \in [\![t_0-T;t_0]\!],\quad X_s^{i_0} = X_s^{random,i_0} + X_s^{seasonal,i_0}$$

$$\{t_1,t_2,t_3\} = \arg\min_{u\in[\![1;t_0-T]\!]} \sum_{\tau=0}^{T} \left( \log\frac{X_{u+\tau}^{seasonal,i_0}}{\overline{X}_{[u;u+T]}^{seasonal,i_0}} - \log\frac{X_{t_0-T+\tau}^{seasonal,i_0}}{\overline{X}_{[t_0-T;t_0]}^{seasonal,i_0}} \right)^2$$

where

$$\overline{X}_{[t_0-T;t_0]}^{seasonal,i_0} = \frac{1}{T+1}\sum_{s=t_0-T}^{t_0} X_s^{seasonal,i_0} \quad\text{and}\quad \overline{X}_{[u;u+T]}^{seasonal,i_0} = \frac{1}{T+1}\sum_{s=u}^{u+T} X_s^{seasonal,i_0}$$

Hence, the decision variable is:

$$Z = \frac{\mathbf{1}_{\{X_{t_1+T}^{i_0} > X_{t_1}^{i_0}\}} + \mathbf{1}_{\{X_{t_2+T}^{i_0} > X_{t_2}^{i_0}\}} + \mathbf{1}_{\{X_{t_3+T}^{i_0} > X_{t_3}^{i_0}\}}}{3}$$

And the predictions computed by the strategy:

If $Z > \frac{1}{2}$, then the prediction is $\frac{X_{t_0+T}}{X_{t_0}} > 1$ and $\alpha_{AutoPatternStrat}(t) = 1$.

Else if $Z < \frac{1}{2}$, then the prediction is $\frac{X_{t_0+T}}{X_{t_0}} < 1$ and $\alpha_{AutoPatternStrat}(t) = -1$.
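The mono asset pattern matching step can be sketched in a few lines of Python. This is a simplified illustration, not the project code: it searches candidate windows starting at $u = 0,\dots,t_0-T-1$, assumes strictly positive seasonal values (for the logarithm), and follows the document's indicator $\mathbf{1}_{\{X_{t_k+T} > X_{t_k}\}}$ literally:

```python
import math

def best_patterns(seasonal, t0, T, k=3):
    """Rank past windows of length T+1 by the squared distance between their
    normalised log shape and the shape of the current window [t0-T, t0]."""
    def shape(window):
        m = sum(window) / len(window)
        return [math.log(v / m) for v in window]  # assumes positive values
    target = shape(seasonal[t0 - T:t0 + 1])
    scores = []
    for u in range(t0 - T):                       # candidate past windows
        cand = shape(seasonal[u:u + T + 1])
        d = sum((a - b) ** 2 for a, b in zip(cand, target))
        scores.append((d, u))
    scores.sort()
    return [u for _, u in scores[:k]]

def vote_signal(prices, starts, T):
    """Decision variable Z: fraction of matched scenarios that rose over T
    days; Z > 1/2 gives a bullish signal, otherwise bearish."""
    z = sum(prices[u + T] > prices[u] for u in starts) / len(starts)
    return 1 if z > 0.5 else -1
```

On a strictly periodic toy series, the best matches are the earlier occurrences of the current pattern, and the vote reproduces their subsequent move.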
Chapter 4

Strategies analysis

4.1 Portfolio management

Definition 4.1.1 Let $(P_t)_{t\in[1;T]}$ denote the stochastic process of the spot price of an asset.

4.1.1 Trading strategy

Definition 4.1.2 A trading strategy is represented as follows. At each time period, it provides an anticipation of the market: −1 is bearish (i.e. the price will decline), +1 is bullish (i.e. the price will rise).

$$\alpha : [0;T] \to \{-1;1\}$$

4.1.2 Investment horizon

An investment duration $T_{invest}$, in terms of business days, drives the predictions of a given strategy. It can range from 10 to 252 business days (from two weeks to a year). Therefore, portfolio management can be driven by mid-term or long-term earnings prospects.

$$T_{invest} \in [\![50;252]\!]$$

These prospects drive the PnL of the strategy, as positions will be covered after $T_{invest}$ business days.

Definition 4.1.3

$$\forall \alpha \in \{-1;1\}^{[\![1;T]\!]},\ \forall t \in [1;T],\quad P\&L_\alpha^{T_{invest}}(t) = \sum_{1\le i\le t-1} \alpha(i)\,\frac{P_{(i+T_{invest})\wedge t} - P_i}{P_i}$$
4.1.3 Starting time

The beginning of the time series is not subject to predictions. It is kept as prerequisite information in order to compute the first predictions. Indeed, for the pattern fitting strategies, one needs a few historical patterns to be available for fitting. Therefore, the predictions start at time:

$$t_{start} = 10\,T_{invest}$$

Hence the new PnL vector:

$$\forall \alpha \in \{-1;1\}^{[\![1;T]\!]},\ \forall t \in [t_{start};T],\quad P\&L_\alpha^{T_{invest}}(t) = \sum_{t_{start}\le i\le t-1} \alpha(i)\,\frac{P_{(i+T_{invest})\wedge t} - P_i}{P_i}$$

4.1.4 Trading time span

A trading time span $\delta_t$ defines the duration between two portfolio rebalances. It is defined in terms of business days. Every $\delta_t$ days, one trade is closed and another position is taken. By default, and as assumed in the back tests, it is equal to a fifth of the investment duration. For example, for $T_{invest} = 252$, there will be 5 rebalances per year, one every 50 business days; hence, in this example, $\delta_t = 50$.

$$\delta_t = \frac{T_{invest}}{5}$$

Therefore, the PnL now becomes:

$$\forall \alpha \in \{-1;1\}^{[\![1;T]\!]},\ \forall t,\quad P\&L_\alpha^{T_{invest},\delta_t}(t) = \sum_{0\le i\le \frac{T-t_{start}}{\delta_t}} \alpha(i\,\delta_t + t_{start})\,\frac{P_{(T_{invest}+i\,\delta_t+t_{start})\wedge t} - P_{i\,\delta_t+t_{start}}}{P_{i\,\delta_t+t_{start}}}$$

4.1.5 Annualizing the PnL and reducing its variance

The PnL computed so far still carries a time dependence. It needs to be annualized, with the following operation:

$$\forall \alpha \in \{-1;1\}^{[\![1;T]\!]},\ \forall t,\quad P\&L_\alpha^{T_{invest},\delta_t}(t) = \sum_{0\le i\le \frac{T-t_{start}}{\delta_t}} \alpha(i\,\delta_t + t_{start})\,\frac{P_{(T_{invest}+i\,\delta_t+t_{start})\wedge t} - P_{i\,\delta_t+t_{start}}}{P_{i\,\delta_t+t_{start}}}\cdot\frac{252}{T_{invest}}$$

Moreover, it is valuable to reduce the volatility of the returns of a given strategy. If this is not done at the expense of the mean rate of return, it slightly improves the Sharpe ratio. Hence, the usual stop losses and cash-ins are implemented in the back tests.
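The annualized PnL above can be sketched as follows. This is a simplified illustration (names are ours): it returns the terminal PnL over fully covered trades only, rather than the running vector with the $(i+T_{invest})\wedge t$ capping, and omits stop losses and cash-ins:

```python
def annualized_pnl(prices, alpha, t_invest, t_start=None, dt=None):
    """Terminal annualised PnL of a {-1,+1} strategy: a position is taken
    every dt days from t_start, covered after t_invest days, and each trade
    return is annualised by 252 / t_invest."""
    if dt is None:
        dt = t_invest // 5            # default: five rebalances per horizon
    if t_start is None:
        t_start = 10 * t_invest       # warm-up kept for pattern fitting
    pnl = 0.0
    i = t_start
    while i + t_invest < len(prices):
        entry, exit_ = prices[i], prices[i + t_invest]
        pnl += alpha(i) * (exit_ - entry) / entry * 252.0 / t_invest
        i += dt
    return pnl
```

By linearity, flipping every signal flips the sign of the PnL, which is a convenient sanity check in back tests.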
From this section, we now have a framework for deriving a PnL vector from a given strategy $\alpha \in \{-1;1\}^{[\![1;T]\!]}$ and a given price process $(P_t)_{t\in[1;T]}$. The strategies now remain to be evaluated with objective criteria.

4.2 Underlying and target market

Three potential trading strategies were identified. They have been tested on four different types of underlyings, for the following reasons:

- Stocks: CAC 40. The goal is to find recurrent seasonality patterns within a single stock, or between different stocks. This seasonality could be caused by important market shifts on the initiative of big players, such as pension funds, insurance companies, asset managers, or banks proceeding to portfolio rebalancing.

- Implied volatilities: VIX Index, VStoxx Index, VCAC, VDAX... These indices are computed by a closed formula relying on the implied volatilities of numerous options, hence reflecting the overall structure of the volatility smile.

- Index pairs: based on the following most liquid worldwide indices: CAC, DAX, SX5E, SPX, NKY, UKX, IBOV, SMI, HSI. Index pairs provide trajectories which generally follow mean reverting processes, and have the advantage of being extremely liquid.

- Commodities: WTI. Commodities are known for displaying seasonality features. Therefore, they may constitute interesting underlyings.
Chapter 5

Results

5.1 Empirical choices

Three strategies have been mentioned in this paper. However, in practice, tests have only targeted one strategy, for the following reasons:

– The IMF Mean Convexity Reverting Strategy has been tested on a few examples. However, the calibration of the threshold has proven difficult. Even the seasonal process remains somewhat unstable, in particular in its last values, where the second order derivative is computed.

$$\forall s \in [\![t-T;t]\!],\quad X_s = X_s^{random} + X_s^{seasonal}$$
$$\forall s \in [\![t-T;t]\!],\quad \widetilde{X}_s^{seasonal} = \log\frac{X_s^{seasonal}}{\overline{X}^{seasonal}} \quad\text{where}\quad \overline{X}^{seasonal} = \frac{1}{T+1}\sum_{s=t-T}^{t} X_s^{seasonal}$$

If $\widetilde{X}_t^{seasonal} - 2\widetilde{X}_{t-1}^{seasonal} + \widetilde{X}_{t-2}^{seasonal} > threshold$, then the prediction is $\frac{X_{t+T}^{seasonal}}{X_t^{seasonal}} > 1$ and $\alpha_{MeanRevertingStrat}(t) = 1$.

Else if $\widetilde{X}_t^{seasonal} - 2\widetilde{X}_{t-1}^{seasonal} + \widetilde{X}_{t-2}^{seasonal} < -threshold$, then the prediction is $\frac{X_{t+T}^{seasonal}}{X_t^{seasonal}} < 1$ and $\alpha_{MeanRevertingStrat}(t) = -1$.

– The IMF Multi Asset Shifting Pattern Recognition Strategy is very similar to the Mono Asset IMF Pattern Recognition Strategy. However, it is harder to implement because it is a multi asset strategy: results will depend on how much data is utilized. Moreover, the multi asset strategy is unadapted, contrary to the mono asset strategy.

– Due to the small computing capacity available during the project, tests have mainly focused on the last strategy developed here, i.e. the Mono Asset IMF Pattern Recognition Strategy. Indeed, the idea is to derive reliable results by testing it on a wide range of
underlyings (the list of which was provided earlier). That way, the law of large numbers will help provide reliable results.

5.2 Tables

5.2.1 Volatility: VIX Index

The most promising results, on the VIX Index, are shown first. Three different investment horizons have been tested: 50 days, 150 days, and 250 days. The result for 50 days is the most reliable, for it relies on the highest number of trades. Bullish signals also seem to perform better, which is to be expected considering the general behavior of implied volatility. As shown in table 5.1, page 32, the last back test, with 35% cash in, gives a 57% hit ratio and a 0.40 annualized Sharpe ratio.

Figure 5.1: Back test on the VIX Index, with a 50 days investment horizon and 35% cash in

Moreover, the results are also encouraging for a longer investment horizon: 150 days. However, they rely on fewer trades, simply due to the longer horizon. Again, bullish signals are more powerful. As shown in table 5.2, page 33, the last back test, with 100% cash in, gives a 54% hit ratio and a 0.54 annualized Sharpe ratio.
Figure 5.2: Back test on the VIX Index, with a 150 days investment horizon and 100% cash in

Finally, the results are presented for 250 days. Again, they rely on fewer trades, simply due to the longer horizon, and again, bullish signals are more powerful. As shown in table 5.3, page 34, the last back test, with 150% cash in, gives a 66% hit ratio and a 0.76 annualized Sharpe ratio. However, it only relies on 15 trades, from 2005 to 2011, a period over which bullish signals have obviously proven quite effective. Therefore, more tests on longer data histories need to be pursued.

5.2.2 Volatility: VStoxx Index

To confirm the long term results for the VIX Index, similar tests have been pursued on the VStoxx Index, see table 5.4, page 35. For the 126 days horizon, results are also encouraging. Without any cash ins, and with all signals (bullish and bearish), a 56% hit ratio is achieved on 70 trades, with a 0.20 Sharpe ratio.

5.2.3 Volatility: other indices: aggregate performance

Table 5.5, page 36, shows the aggregated results for all the indices tested in the pool. The number of trades represented is around 1000 for the 150 days table, and 100 for
Figure 5.3: Back test on the VIX Index, with a 250 days investment horizon and 150% cash in

the 250 days table, dating from 2006 to 2011. Therefore, the results are very reliable. It seems that the aggregate prediction power on volatility is uncertain. However, results remain encouraging for the VIX, i.e. the most liquid index (via futures or ETFs) among the volatility indices.

5.2.4 French stocks: CAC 40: aggregate performance

Aggregated results for the French stock market are quite disappointing, see table 5.6, page 37.

5.2.5 Equity indices and trading pairs: aggregate performance

Tests have also been pursued on the main worldwide equity indices, and their pairs. Aggregated results on approximately 2000 trades and 20 years of historical prices show that this strategy does not have any prediction power on this asset class, see table 5.7, page 37.
Figure 5.4: Back test on the VStoxx Index, with a 126 days investment horizon and without cash in

5.2.6 Commodities: West Texas Intermediate (WTI)

Results for West Texas Intermediate (WTI) are shown in table 5.8, page 38.
Figure 5.5: Back test on a pool of volatility indices
Figure 5.6: Back test on French stocks from the CAC 40

Figure 5.7: Back test on a pool of equity indices
Figure 5.8: Back test on the West Texas Intermediate (WTI) oil price
Conclusion and outlook

The HHT offers a potentially viable method for nonlinear and nonstationary data analysis. But in all the cases studied, the HHT does not give sharper results than most traditional time series analysis methods. In order to make the method more robust and rigorous in application, an analytic mathematical foundation is needed. In our view, the most likely solutions to the problems associated with the HHT can only be formulated in terms of optimization: the selection of the spline, the extension of the ends, etc. This may be an area for future research. While this study tries to lay some theoretical ground for the HHT, further theoretical work is greatly needed in this direction.

On the empirical side, more research also needs to be pursued. While not particularly effective on stock prices, the EMD seems better adapted to curves resembling implied volatilities, and better able to derive meaningful dynamics from them. Strong results have been reached on the main volatility indices, such as the VIX or the VStoxx. Therefore, further empirical tests on this asset class could be rewarding. Moreover, a great variety of assets has not been tested for prediction: other types of commodities (only WTI was tested), precious metals, fixed income assets such as sovereign or corporate bonds...

Finally, significant tests have only been pursued for one strategy among the three that were formulated. The code provided with this report is able to generate results for the two other strategies, and can be the basis of wider back tests on an industrial scale.

In terms of applications, this study has limited itself to the first part of the HHT algorithm, i.e. the Empirical Mode Decomposition. Further work could be done to properly formalize the Hilbert spectrum, make new hypotheses, and derive potential predictors using the same methodology as in this study.

Acknowledgements

This study has been pursued in collaboration with the Equity Quantitative Team at Natixis.
Since our arrival on the premises of Natixis, we have been thoroughly assisted. Successful
professionals were kind enough to answer our questions and to give their opinion on our work throughout the entire year. Without their advice, this study would not have achieved its current findings. Our project consisted of working at Natixis every Wednesday, from October 2011 to March 2012. These workdays were a great opportunity to work within the finance environment, and to learn about the role of quantitative associates within the banking industry.

First, we would like to thank our supervisor Mr Adil Reghai, Head of Quantitative Analytics at Natixis. Adil showed much interest in our project, shared our views, and gave us valuable feedback throughout the year. He helped us design our predictors, and constantly gave us new ideas for back tests.

We would also like to thank Mr Adel Ben Haj Yedder, who greatly contributed to our project, proofread our reports, and gave us feedback. We also had the opportunity to discuss with Adel his daily job, the role of the team, and the banking industry in general. His views will be valuable in helping us refine our professional plans and goals. We are also thankful to Stephanie Mielnik and Thomas Combarel for their contributions, and to the team in general.

Moreover, this study was pursued in collaboration with Dr Alex Langnau, Global Head of Quantitative Analytics at Allianz. Alex is a consultant for Natixis and an academic at the University of Munich, and introduced Adil to the Hilbert Huang Transform. During our project, Alex also gave us valuable feedback, in particular about the portfolio management of our trading strategies.

Also, we would like to thank our teachers at Ecole Centrale Paris, from the Applied Mathematics Department. Mr Erick Herbin, professor of stochastic processes, supervised our project. He encouraged us to formalize the Hilbert Huang algorithm; despite being a difficult task, it proved to be essential.
We are also thankful to Mr Gilles Faÿ, professor of statistics and time series, for his lectures, which provided important theoretical grounds for our study. Finally, we wish to thank our colleagues from the Applied Mathematics Program who pursued other projects in collaboration with Natixis. We have been working with them since October, and we enjoyed having breaks with them. To name them: Marguerite de Mailard, Lucas Mahieux, Nicolas Pai and Victor Gerard.
Bibliography

[1] Barnhart, B.L., The Hilbert-Huang Transform: theory, applications, development, PhD dissertation, University of Iowa (2011)

[2] Brockwell, P.J., Davis, R.A., Introduction to Time Series and Forecasting, second edition, Springer-Verlag, New York (2002)

[3] Cohen, L., Generalized phase-space distribution functions, J. Math. Phys. 7, 781 (1966)

[4] Datig, M., Schlurmann, T., Performance and limitations of the Hilbert-Huang transformation (HHT) with an application to irregular water waves, Ocean Engineering 31, 1783-1834 (2004)

[5] De Boor, C., A Practical Guide to Splines, revised edition, Springer-Verlag (2001)

[6] Dos Passos, W., Numerical Methods, Algorithms, and Tools in C#, CRC Press (2010)

[7] Faÿ, G., Séries Chronologiques, lecture notes, Ecole Centrale Paris (2012)

[8] Flandrin, P., Goncalves, P., Rilling, G., EMD equivalent filter banks, from interpretation to applications, in: Hilbert-Huang Transform and Its Applications (N.E. Huang and S.S.P. Shen, eds.), pp. 57-74 (2005)

[9] Golitschek, M., On the convergence of interpolating periodic spline functions of high degree, Numerische Mathematik 19, 46-154 (1972)

[10] Guhathakurta, K., Mukherjee, I., Chowdhury, A.R., Empirical mode decomposition analysis of two different financial time series and their comparison, Chaos, Solitons and Fractals 37, 1214-1227 (2008)

[11] Holder, H.E., Bolch, A.M., Avissar, R., Using the Empirical Mode Decomposition (EMD) method to process turbulence data collected on board aircraft, submitted to J. Atmos. Ocean. Tech. (2009)

[12] Hong, L., Decomposition and forecast for financial time series with high-frequency based on empirical mode decomposition, Energy Procedia 5, 1333-1340 (2011)

[13] Huang, N.E., Shen, S.S.P. (eds.), Hilbert-Huang Transform and Its Applications, Interdisciplinary Mathematical Sciences vol. 5 (2005)

[14] Huang, N.E., Shen, Z., Long, S., Wu, M., Shih, H., Zheng, Q., Yen, N., Tung, C.
and Liu, H., The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. Lond. A 454, 903-995 (1998)
[15] Huang, N.E., Wu, Z., A review on Hilbert-Huang transform: method and its applications to geophysical studies, Rev. Geophys. 46, RG2006 (2008)

[16] Liu, B., Riemenschneider, S., Xu, Y., Gearbox fault diagnosis using empirical mode decomposition and Hilbert spectrum, Mechanical Systems and Signal Processing 20, 718-734 (2006)

[17] Pan, H., Intelligent Finance - General Principles, International Workshop on Intelligent Finance, Chengdu, China (2007)

[18] Reghai, A., Goyon, S., Messaoud, M., Anane, M., Market Predictor : Prédiction quantitative des tendances des marchés, Etude Stratégie Quant Recherche Actions, Natixis Securities, Paris (2010)

[19] Reghai, A., Goyon, S., Combarel, T., Ben Haj Yedder, A., Mielnik, S., Sharpe Select : optimisation de l'investissement Cross Asset, Etude Stratégie Quant Recherche Quantitative, Natixis Securities, Paris (2011)
Appendix A

Time series prerequisites

A.1 Stationary and linear processes

A.1.1 Stationarity

Definition A.1.1 A time series is a stochastic process in discrete time, for example $(X_t)_{t\in\mathbb{Z}}$, $X_t \in \mathbb{R}$. Thus, a time series is composed of realizations of a single statistical variable over a certain time interval (for example a month, a quarter, a year, or a nanosecond).

We can expect to develop some interesting predictions if the process displays certain structural properties:
– either some "rigidity", allowing the extrapolation of some deterministic parts;
– or some form of statistical invariance, called stationarity, allowing one to learn from the past in order to predict the future.

Definition A.1.2 $(X_t)_{t\in\mathbb{Z}}$ is said to be strictly stationary iff its finite dimensional distributions are invariant under any time translation, i.e.:

$$\forall \tau \in \mathbb{Z},\ \forall n \in \mathbb{N}^*,\ \forall (t_1,\dots,t_n) \in \mathbb{Z}^n,\quad (X_{t_1},\dots,X_{t_n}) \sim (X_{t_1-\tau},\dots,X_{t_n-\tau})$$

Definition A.1.3 $(X_t)_{t\in\mathbb{Z}}$ is said to be stationary at the second order iff:
– $(X_t)_{t\in\mathbb{Z}} \in L^2(\mathbb{R})$, i.e. $\forall t \in \mathbb{Z},\ E[X_t^2] < \infty$
– $\forall t \in \mathbb{Z},\ E[X_t] = E[X_0] := \mu_X$
– $\forall s,t \in \mathbb{Z},\ \gamma_X(t,s) := Cov(X_t,X_s) = Cov(X_0,X_{s-t}) =: \gamma(s-t)$

Definition A.1.4 The autocorrelation function of a stochastic process $(X_t)_{t\in\mathbb{Z}}$ is the series

$$\rho(s,t) = \frac{Cov(X_t,X_s)}{\left(Var(X_s)\,Var(X_t)\right)^{1/2}}$$
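For a stationary series, $\rho(s,t)$ depends only on the lag $h = t - s$ and can be estimated from data. A minimal sketch using the standard biased autocovariance estimator (dividing by $n$ rather than $n-h$):

```python
def sample_autocorrelation(x, h):
    """Sample estimate of rho(h) for a second order stationary series,
    via the biased autocovariance estimator gamma_hat(h)/gamma_hat(0)."""
    n = len(x)
    m = sum(x) / n
    gamma0 = sum((v - m) ** 2 for v in x) / n
    gamma_h = sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n
    return gamma_h / gamma0
```

For instance, a perfectly alternating series has autocorrelation close to -1 at lag 1.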
A.1.2 Linearity

Within the family of stationary processes, an important subfamily is known as the linear processes. They are derived from white noise processes.

Definition A.1.5 A stochastic process $(X_t)_{t\in\mathbb{Z}}$ is a weak white noise iff it is stationary at the second order, and:

$$\mu_X = 0 \qquad\text{and}\qquad \forall h \in \mathbb{Z},\ \gamma_X(h) = \sigma^2\,\delta_0(h)$$

Definition A.1.6 A stochastic process $(X_t)_{t\in\mathbb{Z}}$ is a strong white noise iff it is i.i.d. and $\mu_X = 0$.

Hence, second order linear processes can now be defined.

Definition A.1.7 $(X_t)_{t\in\mathbb{Z}}$ is a weak (resp. strong) second order linear process iff $\exists (Z_t)_{t\in\mathbb{Z}}$ and $\exists (\psi_j)_{j\in\mathbb{Z}}$ such that $(Z_t)$ is a weak (resp. strong) white noise $(\sigma^2)$, $\sum_{j\in\mathbb{Z}} |\psi_j| < \infty$, and

$$\forall t \in \mathbb{Z},\quad X_t = \sum_{j\in\mathbb{Z}} \psi_j Z_{t-j}$$

Second order linear processes are well known and well studied. Restricting attention to these linear processes may seem excessive. However, Wold's decomposition provides a strong result for them.

A.1.3 Wold's decomposition

Every second order stationary process $(X_t)_{t\in\mathbb{Z}}$ can be written as the sum of a second order linear process and a deterministic component:

$$\forall t \in \mathbb{Z},\quad X_t = \sum_{j\in\mathbb{Z}} \psi_j Z_{t-j} + \eta(t)$$

where $(Z_t)$ is a weak white noise $(\sigma^2)$, $\sum_{j\in\mathbb{Z}} |\psi_j| < \infty$, and $\eta \in \mathbb{R}^{\mathbb{Z}}$.

Hence, basic linear processes, such as ARMA models, provide a strong basis for explaining stationary processes. However, the stationarity assumption is quite reductive.
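For a linear process with finitely many nonzero coefficients, the second order structure follows directly from Definition A.1.7: since the $Z_t$ are uncorrelated, $\gamma_X(h) = \sigma^2 \sum_j \psi_j \psi_{j+|h|}$. A minimal sketch:

```python
def linear_process_autocovariance(psi, sigma2, h):
    """Autocovariance gamma_X(h) = sigma^2 * sum_j psi_j psi_{j+|h|} of the
    linear process X_t = sum_j psi_j Z_{t-j}, where Z_t is white noise with
    variance sigma^2 and psi is a finite list of coefficients (psi_0 first)."""
    h = abs(h)
    return sigma2 * sum(a * b for a, b in zip(psi, psi[h:]))
```

For an MA(1) process $X_t = Z_t + 0.5\,Z_{t-1}$ with $\sigma^2 = 2$, this gives $\gamma(0) = 2.5$, $\gamma(1) = 1$ and $\gamma(h) = 0$ beyond lag 1.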
A.2 The particular case of financial time series: parametric and non parametric extensions

A.2.1 Non-stationary and non linear financial time series

Financial time series are known for displaying a few characteristics unknown to stationary or linear processes:

– Fat tails: their fat-tailed distributions are not compatible with Gaussian density functions. They are more accurately fitted by power laws, i.e. processes of infinite variance. These processes are not used in practice, because a measure of volatility (i.e. variance) is paramount in finance (for example, in order to price options, or to compute Sharpe ratios of indices, stocks or strategies; see our definitions in chapter 4).

– Non-linearity: they display non constant variance. Volatility clusters are common in financial time series. These clusters are incompatible with linear and stationary processes like ARMA (which have a constant variance).

– Non-stationarity: they have a long term memory.

– Time reversal: linear stationary processes are invariant under time reversal. However, a financial time series is obviously coherent with only one time direction, and is not consistent if time is reversed.

A.2.2 Parametric processes for financial time series

To tackle the issue of non linearity, popular parametric models are ARCH(p) and GARCH(p,q).

Definition A.2.1 $(X_t)_{t\in\mathbb{Z}}$ is defined as an ARCH(p) process by:

$$X_t = \sigma_t Z_t, \qquad \sigma_t^2 = \psi_0 + \sum_{j=1}^{p} \psi_j X_{t-j}^2$$

where $\forall j \in [\![1;p]\!],\ \psi_j > 0$ and $Z_t$ i.i.d.$(0,1)$.

Definition A.2.2 $(X_t)_{t\in\mathbb{Z}}$ is defined as a GARCH(p,q) process by:

$$X_t = \sigma_t Z_t, \qquad \sigma_t^2 = \psi_0 + \sum_{j=1}^{p} \psi_j X_{t-j}^2 + \sum_{j=1}^{q} \varphi_j \sigma_{t-j}^2$$
where $\forall j \in [\![1;p]\!],\ \psi_j > 0$, $\forall j \in [\![1;q]\!],\ \varphi_j > 0$, and $Z_t$ i.i.d.$(0,1)$.

A.2.3 Non parametric processes for financial time series

Numerous non parametric methods have been and are being developed to fit financial data. The goal here is not to mention all of them. The Empirical Mode Decomposition lives within this environment.

A.3 General time series: the Box & Jenkins approach for prediction

Within the framework of Box and Jenkins (1970), a time series can be modeled as the realization of three simultaneous phenomena:
– the first, $\eta_t$, is a regular and smooth time evolution, called the trend;
– the second, $S_t$, is a periodic process of period T;
– the third component, $W_t$, is the random component; it can be a stationary process.

$$\forall t \in \mathbb{Z},\quad X_t = W_t + S_t + \eta_t$$

Definition A.3.1 $S_t$ is a periodic process with period T iff:

$$\forall t \in \mathbb{Z},\ S_{t+T} = S_t \qquad\text{and}\qquad \sum_{t=1}^{T} S_t = 0$$

In what follows, covariance is seen as the bilinear map $Cov : L^2(\mathbb{R}) \times L^2(\mathbb{R}) \to \mathbb{R},\ (X,Y) \mapsto Cov(X,Y)$.

From this framework, we will connect the EMD algorithm to the time series literature. Some assumptions will be made to match this approach, and they will drive the prediction algorithms formulated later in this chapter.
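Returning to the parametric models of Section A.2.2, a GARCH path is easy to simulate from Definition A.2.2. A minimal GARCH(1,1) sketch with Gaussian innovations (the initialization at the unconditional variance, and the usual stationarity condition $\psi_1 + \varphi_1 < 1$, are standard conventions, not taken from this report):

```python
import random

def simulate_garch11(n, psi0, psi1, phi1, seed=0):
    """Simulate X_t = sigma_t Z_t with
    sigma_t^2 = psi0 + psi1 * X_{t-1}^2 + phi1 * sigma_{t-1}^2,
    Z_t i.i.d. N(0,1).  Requires psi1 + phi1 < 1 for stationarity."""
    rng = random.Random(seed)
    x = 0.0
    sigma2 = psi0 / (1.0 - psi1 - phi1)   # start at the unconditional variance
    path = []
    for _ in range(n):
        sigma2 = psi0 + psi1 * x * x + phi1 * sigma2
        x = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

Simulated paths display the volatility clustering mentioned in Section A.2.1, while the sample mean stays near zero and the sample variance near the unconditional level $\psi_0/(1-\psi_1-\varphi_1)$.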
Appendix B

Evaluation criteria of backtests

In order to evaluate the efficiency of the potential highlighted strategies, a few rules need to be followed. As mentioned above, adaptability is the main one: every back test ought to be implemented as if the future were unknown. This implies some obligations:

- A time frame restriction: every prediction must be computed without using any feature of its future values.

- A class restriction: predictions must be evaluated on an aggregate basis, for every kind of underlying. For example, no discrimination between the best or worst performers should be made, because it is another form of unadaptedness. However, it remains pertinent to evaluate the performance of the strategies with regard to the asset class to which they are applied. Hence, as mentioned further on, some strategies might be more efficient on stocks, trading pairs, or implied volatilities.

Within asset management theory, a few variables suffice to quantify the efficacy of a trading strategy. They are usually defined as follows. Let X be the random variable which denotes the annualized gain (or loss) of each trade of a trading strategy. Its realizations are written $X_i,\ i = 1..n$.

Definition B.0.2

Average return $= E[X]$
Average gain $= E[X \,|\, X > 0]$
Average loss $= E[X \,|\, X < 0]$
Drawdown $= -\min\{X_i,\ i = 1..n\}$
Hit ratio $= P(X > 0)$

These values need to be analysed together. A hit ratio above 50% should be compared with the average gain and loss. The drawdown also provides valuable information on the risk of the trading strategy, and gives a hint of the Sharpe ratio. Drawdowns are a valuable tool for calibrating stop loss thresholds.
Definition B.0.3

$$\text{Information ratio} = \frac{E[X - T]}{\sqrt{Var(X - T)}}$$

with T a benchmark performance rate. In the results displayed in the annexes, the benchmark performance rate plugged into the computations of information ratios is 0%.

Definition B.0.4

$$\text{Sortino ratio} = \frac{E[X - T]}{\sqrt{Var(X \,|\, X < T)}}$$

with T a benchmark performance rate. In our results displayed in the annexes, the benchmark performance rate in our computations of Sortino ratios is 0%. Therefore, this measurement only takes into account "negative volatility", i.e. the volatility of losses.

Definition B.0.5

$$\text{Sharpe ratio} = \frac{E[X - r_f]}{\sqrt{Var(X)}}$$

with $r_f$ the annualized risk free rate.

The analyses here rely mainly on the information ratio and the Sortino ratio. However, in light of current risk free rates, the information ratio can be considered a good approximation of the Sharpe ratio.
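The criteria of Definitions B.0.2 to B.0.4 can be computed in a few lines from the vector of per-trade annualized returns. A minimal sketch with the benchmark rate T set to 0% as in the annexes (the gain/loss averages default to 0 when no trade falls in the corresponding class, a convention of ours):

```python
import statistics

def backtest_metrics(returns, benchmark=0.0):
    """Per-trade evaluation criteria: averages, drawdown (per Definition
    B.0.2, the worst single trade), hit ratio, information and Sortino
    ratios with population standard deviations in the denominators."""
    gains = [x for x in returns if x > 0]
    losses = [x for x in returns if x < 0]
    excess = [x - benchmark for x in returns]
    downside = [x - benchmark for x in returns if x < benchmark]
    return {
        "average_return": statistics.mean(returns),
        "average_gain": statistics.mean(gains) if gains else 0.0,
        "average_loss": statistics.mean(losses) if losses else 0.0,
        "drawdown": -min(returns),
        "hit_ratio": len(gains) / len(returns),
        "information_ratio": statistics.mean(excess) / statistics.pstdev(excess),
        "sortino_ratio": statistics.mean(excess) / statistics.pstdev(downside),
    }
```

As the text notes, these numbers should be read together: a high hit ratio is only meaningful alongside the average gain, average loss and drawdown.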