Master's Thesis defence presentation

Return Interval Distribution of Extreme Events in Long
memory Time Series With Two Scaling Exponents
Smrati Kumar Katiyar
Department of Physics
IISER, Pune
May 3, 2011

1 Explanation for the title
2 Statistical test for long memory
3 Foundation stone for our work
4 Our work
    Analytical approach
    Numerical approach to the problem
    Comparison of analytical and numerical results
5 Long memory probability process with two scaling exponents
6 Conclusion
7 Future direction

What are the key terms?
Return interval distribution of extreme events in long memory time series
with two scaling exponents.

1 Return interval and extreme events
2 Long memory time series
3 Scaling exponents

Return interval and extreme events
Given a time series X(t)

Figure: Return intervals and extreme events (plot of x(t) against t with the threshold marked; r1, r2, r3 label return intervals)

Aim of our project work
Example: suppose we are given a time series X(t) and there are 11 time instants in total at which the value of X exceeds the threshold q.
Those time instants are
t = 0, 1, 3, 5, 6, 7, 10, 11, 12, 14, 16
So the return intervals are
return intervals = 1, 2, 2, 1, 1, 3, 1, 1, 2, 2
Out of these 10 return intervals we have
5 return intervals of length 1,
4 return intervals of length 2,
and 1 return interval of length 3,
so the probability of occurrence of a return interval of length 1 is
P(1) = 5/10; similarly, P(2) = 4/10 and P(3) = 1/10.
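The worked example above can be reproduced in a few lines. This is only a sketch; the function names `return_intervals` and `interval_probabilities` are illustrative and do not come from the thesis.

```python
from collections import Counter

def return_intervals(event_times):
    """Gaps between successive times at which X exceeds the threshold."""
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]

def interval_probabilities(intervals):
    """Relative frequency of each return-interval length."""
    counts = Counter(intervals)
    total = len(intervals)
    return {length: count / total for length, count in sorted(counts.items())}

# Time instants from the example above
times = [0, 1, 3, 5, 6, 7, 10, 11, 12, 14, 16]
intervals = return_intervals(times)
probs = interval_probabilities(intervals)
print(intervals)  # [1, 2, 2, 1, 1, 3, 1, 1, 2, 2]
print(probs)      # {1: 0.5, 2: 0.4, 3: 0.1}
```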

Long memory time series

A plot of the sample autocorrelation function (ACF) ρk against lag k is one of
the most useful tools for analysing a given time series.

\[ \rho_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} \]
For long memory processes
\[ \rho_k \to C_\rho \, k^{-\gamma} \quad \text{as } k \to \infty \]
where C_ρ > 0 and γ ∈ (0, 1)
Figure: sample ACF against lag (lags 0 to 60) for "Series a"
A long memory process is trend reinforcing, which means that the direction
(up or down compared to the previous value) of the next value is more likely
to be the same as that of the current value.
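As an illustration of the sample ACF defined in this section, here is a minimal NumPy sketch. The function name `sample_acf` and the white-noise test series are assumptions made for the example, not part of the thesis.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()                      # deviations from the sample mean
    denom = np.sum(d * d)                 # sum over t = 1..n of (x_t - mean)^2
    n = len(d)
    # numerator: sum over t = k+1..n of (x_t - mean)(x_{t-k} - mean)
    return np.array([np.sum(d[k:] * d[:n - k]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(5000)   # short-memory reference series
rho = sample_acf(white_noise, 60)
print(rho[:5])
```

For a long memory series one would expect `rho` to decay roughly as a power law in the lag, so a plot of `np.log(rho)` against `np.log(k)` should look approximately linear at large lags.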

Scaling exponents

The most common power laws relate two variables and have the form
\[ f(x) \propto x^{\alpha} \]
Here α is called the scaling exponent, where the word "scaling" denotes
the fact that a power-law function satisfies
\[ f(cx) = c^{\alpha} f(x) \propto f(x) \]
Here c is a constant.

Statistical test for long memory
How do we determine whether a given time series x(t) has long memory or not?

Detrended fluctuation analysis

x(t) is the time series (t = 1, 2, 3, ..., N_max).

\[ y(k) = \sum_{i=1}^{k} (x_i - \langle x \rangle) \]
is the cumulative sum or profile, where ⟨x⟩ is the mean of the series.
Divide y(k) into time windows (boxes) of length n samples. In each box of length
n, we fit y(k) using a polynomial function of order l, which represents the
trend in that box. The y coordinate of the fitted curve in each box is denoted
by y_n(k). Since we use a polynomial fit of order l, we denote the algorithm
as DFA-l.
Figure: profile y(k) against k (k from 0 to 1000)

The integrated signal y(k) is detrended by subtracting the local trend
y_n(k) in each box of length n.
For a given box size n, the root-mean-square (rms) fluctuation for this
integrated and detrended signal is calculated:
\[ F(n) = \sqrt{\frac{1}{N_{max}} \sum_{k=1}^{N_{max}} \left[ y(k) - y_n(k) \right]^2} \]
The above computation is repeated for a broad range of scales (box size
n) to provide a relationship between F(n) and the box size n.
A power-law relation between the average root-mean-square fluctuation
function F(n) and the box size n indicates the presence of scaling
\[ F(n) \propto n^{\alpha} \]

We can fit the log-log plot with a straight line, and the slope of that line
will be the scaling exponent.
Figure: log-log plot of F(n) vs n (ln F(n) against ln n)
When the slope of the line is in the range (1/2, 1), the time series displays long
memory.
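The DFA steps described in this section can be sketched as follows. This is a minimal illustration, not the code used in the thesis: the names `dfa_fluctuation` and `dfa_exponent` are hypothetical, samples that do not fill a complete box are dropped, and the rms average is taken over the samples in complete boxes rather than over all N_max points.

```python
import numpy as np

def dfa_fluctuation(x, box_sizes, order=1):
    """Return F(n) for each box size n, using DFA of the given polynomial order."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())          # y(k), the integrated signal
    fluct = []
    for n in box_sizes:
        n_boxes = len(profile) // n            # assumption: drop the incomplete tail
        k = np.arange(n)
        sq_sum = 0.0
        for b in range(n_boxes):
            segment = profile[b * n:(b + 1) * n]
            trend = np.polyval(np.polyfit(k, segment, order), k)   # y_n(k)
            sq_sum += np.sum((segment - trend) ** 2)
        fluct.append(np.sqrt(sq_sum / (n_boxes * n)))
    return np.array(fluct)

def dfa_exponent(x, box_sizes, order=1):
    """Slope of the straight-line fit to ln F(n) against ln n."""
    f = dfa_fluctuation(x, box_sizes, order)
    slope, _intercept = np.polyfit(np.log(box_sizes), np.log(f), 1)
    return slope

rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)             # uncorrelated reference series
sizes = np.array([16, 32, 64, 128, 256, 512])
alpha = dfa_exponent(noise, sizes)
print(alpha)
```

For uncorrelated noise the slope should come out close to 1/2, which is the lower edge of the long-memory range quoted above; a series with long memory would give a slope between 1/2 and 1.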

Foundation stone for our work
For a long memory time series with one scaling exponent, the probability
distribution of return intervals is known (Santhanam et al., 2008):
\[ P(R) = a R^{-(1-\gamma)} e^{-(a/\gamma) R^{\gamma}} \]
Here R is the scaled return interval,
R = r/⟨r⟩, where r are the actual return intervals, r = 1, 2, 3, 4, ...
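As a short worked check (a sketch, assuming R ranges over (0, ∞) and that the scaled return interval has unit mean, as R = r/⟨r⟩ suggests), the distribution above is normalized for any a > 0, and the unit-mean condition then fixes a in terms of γ. The substitution u = (a/γ)R^γ is used in the second line.

```latex
\int_0^\infty a R^{\gamma-1} e^{-(a/\gamma) R^{\gamma}} \, dR
  = \left[ -e^{-(a/\gamma) R^{\gamma}} \right]_0^\infty = 1

\langle R \rangle = \int_0^\infty R \, P(R) \, dR
  = \left( \frac{\gamma}{a} \right)^{1/\gamma} \int_0^\infty u^{1/\gamma} e^{-u} \, du
  = \left( \frac{\gamma}{a} \right)^{1/\gamma} \Gamma\!\left( 1 + \frac{1}{\gamma} \right) = 1
  \quad \Longrightarrow \quad
  a = \gamma \left[ \Gamma\!\left( 1 + \frac{1}{\gamma} \right) \right]^{\gamma}
```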

Our work
What about time series with two scaling exponents?
How do we calculate their return interval distributions?
Examples of this kind of time series are:
high-frequency financial data,
network traffic of a web server, etc.
Figure: log-log plot of ln F(n) against ln n (Podobnik et al., Physica A, 2002)

Our approach to solving the problem
Analytical approach

We consider a probability model for a stationary process with long memory.
Given an extreme event at time t = 0, the probability of finding an extreme
event at time t = r is given by
\[ P_{ex}(r) = \begin{cases} a_1 r^{-(2\alpha_1 - 1)} = a_1 r^{-(1-\gamma_1)} & \text{for } 0 < r < n_x \\ a_2 r^{-(2\alpha_2 - 1)} = a_2 r^{-(1-\gamma_2)} & \text{for } n_x < r < \infty \end{cases} \]
where 0.5 < α1, α2 < 1 are DFA exponents and 0 < γ1, γ2 < 1 are
autocorrelation exponents;
nx is the crossover scale.

After lengthy algebra we find the return interval distribution
\[ P(r) = \begin{cases} a_1 r^{-(1-\gamma_1)} e^{-(a_1/\gamma_1) r^{\gamma_1}} & \text{for } 0 < r < n_x \\ C a_2 r^{-(1-\gamma_2)} e^{-(a_2/\gamma_2) r^{\gamma_2}} & \text{for } n_x < r < \infty \end{cases} \]
How do we find the three unknowns a1, a2 and C? We need three equations.
Normalization equation:
\[ \int_0^\infty P(r) \, dr = 1 \]
Normalizing the mean return interval ⟨r⟩ to unity:
\[ \int_0^\infty r P(r) \, dr = 1 \]
Using the continuity condition
a1 r^{-(1-γ1)} = a2 r^{-(1-γ2)} at r = nx:
\[ a_1 n_x^{-(1-\gamma_1)} = a_2 n_x^{-(1-\gamma_2)} \]

Final equations for a1, a2 and C
\[ a_1 n_x^{-(1-\gamma_1)} = a_2 n_x^{-(1-\gamma_2)} \]
\[ C e^{-(a_2/\gamma_2) n_x^{\gamma_2}} = e^{-(a_1/\gamma_1) n_x^{\gamma_1}} \]
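\[ C \left( \frac{\gamma_2}{a_2} \right)^{1/\gamma_2} \frac{n_x \, E_{(\gamma_2 - 1)/\gamma_2}\!\left( n_x^{\gamma_2} \right)}{\gamma_2} - \left( \frac{\gamma_1}{a_1} \right)^{1/\gamma_1} \frac{n_x \, E_{(\gamma_1 - 1)/\gamma_1}\!\left( n_x^{\gamma_1} \right)}{\gamma_1} = 1 \]
Here
\[ E_n(x) = \int_1^\infty \frac{e^{-xt}}{t^n} \, dt = \int_0^1 e^{-x/\eta} \, \eta^{n-2} \, d\eta \]
E_n(x) is known as the exponential integral function.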
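As a worked check of one step (a sketch, using only the piecewise form of P(r) and the normalization condition stated earlier), the equation relating C to the two exponentials follows directly from normalization, because each branch of P(r) is an exact derivative:

```latex
\int_0^{n_x} a_1 r^{\gamma_1 - 1} e^{-(a_1/\gamma_1) r^{\gamma_1}} \, dr
  = 1 - e^{-(a_1/\gamma_1) n_x^{\gamma_1}}

\int_{n_x}^{\infty} C a_2 r^{\gamma_2 - 1} e^{-(a_2/\gamma_2) r^{\gamma_2}} \, dr
  = C e^{-(a_2/\gamma_2) n_x^{\gamma_2}}

\int_0^\infty P(r) \, dr = 1
  \quad \Longrightarrow \quad
  C e^{-(a_2/\gamma_2) n_x^{\gamma_2}} = e^{-(a_1/\gamma_1) n_x^{\gamma_1}}
```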

Numerical approach to the problem
The first and only challenge with this approach is to obtain a long memory time
series that contains two different scaling exponents.

The model
Step 1:
Set the length of the time series, say, l = 10^5.
Step 2:
Generate a series of random numbers y_i, i = 0, ..., (l − 1), which follow a
Gaussian distribution with mean 0 and variance 1.
Step 3:
Generate a series of coefficients defined as:
\[ C_i^{\alpha} = \frac{\Gamma(i - \alpha)}{\Gamma(-\alpha) \, \Gamma(i + 1)} = -\frac{\alpha}{\Gamma(1 - \alpha)} \frac{\Gamma(i - \alpha)}{\Gamma(i + 1)} \]
\[ \alpha = \begin{cases} \alpha_1 & \text{for } 0 < i < n_x \\ \alpha_2 & \text{for } n_x < i < \infty \end{cases} \]
Both α1 and α2 belong to the interval (−0.5, 0).
The asymptotic behaviour of C_i^α for large i can be written as
\[ C_i^{\alpha} \simeq -\frac{\alpha}{\Gamma(1 - \alpha)} \, i^{-(1+\alpha)} \quad \text{for } i \gg 1 \]
Step 4:
Now, get a series y_i^α using y_i and C_i^α according to the relation
\[ y_i^{\alpha} = \sum_{j=0}^{i} y_{i-j} \, C_j^{\alpha}, \qquad i = 0, \ldots, (l - 1) \qquad (1) \]
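Steps 1 to 4 can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the thesis code: the function names are hypothetical, the coefficients are computed with the recurrence C_i = C_{i−1}(i − 1 − α)/i (which follows from the Gamma-function definition with C_0 = 1), the index i = nx, which the definition leaves unassigned, is given to α2, and the sum in relation (1) is evaluated with an FFT-based convolution for speed.

```python
import numpy as np

def frac_coeffs(alpha, length):
    """C_i^alpha = Gamma(i - alpha) / (Gamma(-alpha) Gamma(i + 1)) for i = 0..length-1."""
    i = np.arange(1, length)
    ratios = (i - 1 - alpha) / i                  # C_i / C_{i-1}
    return np.concatenate(([1.0], np.cumprod(ratios)))

def two_exponent_coeffs(alpha1, alpha2, n_x, length):
    """Use alpha1 below the crossover n_x and alpha2 from n_x onwards (assumption at i = n_x)."""
    i = np.arange(length)
    return np.where(i < n_x, frac_coeffs(alpha1, length), frac_coeffs(alpha2, length))

def generate_series(alpha1, alpha2, n_x, length, rng):
    """Return (y_alpha, y, c): filtered series, Gaussian input and coefficients."""
    y = rng.standard_normal(length)               # Step 2: mean 0, variance 1
    c = two_exponent_coeffs(alpha1, alpha2, n_x, length)   # Step 3
    m = 2 * length                                # zero-pad so the convolution is linear
    full = np.fft.irfft(np.fft.rfft(y, m) * np.fft.rfft(c, m), m)
    return full[:length], y, c                    # Step 4: first l terms of the convolution

rng = np.random.default_rng(1)
series, y, c = generate_series(-0.3, -0.1, n_x=32, length=4096, rng=rng)
print(series[:5])
```

The values α1 = −0.3, α2 = −0.1, nx = 32 and the shortened length are arbitrary choices for the example; the slides use l = 10^5.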

DFA of time series generated using previous model
Figure: DFA analysis of the time series (log F(n) against log(n)), with the crossover region marked

Comparison of analytical and numerical results
Try to fit
\[ P(r) = a r^{-(1-\gamma)} e^{-(c/\gamma) r^{\gamma}} \]
to each segment, according to its corresponding γ value.
Figure: ln P(R) against ln(R), showing segment 1, segment 2 and the break point
The discrepancy arises from the threshold dependence of the constants and from long
memory in the return intervals.
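One possible way to carry out such a fit, with γ held fixed for the segment, is to note that ln P(r) + (1 − γ) ln r = ln a − (c/γ) r^γ is linear in the two unknowns ln a and c, so ordinary least squares suffices. The sketch below is an assumption about how the fit could be done, not a description of the procedure actually used; `fit_segment` is a hypothetical name and the data are synthetic.

```python
import numpy as np

def fit_segment(r, p, gamma):
    """Least-squares estimate of (a, c) in P(r) = a r^(gamma-1) exp(-(c/gamma) r^gamma)."""
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    target = np.log(p) - (gamma - 1.0) * np.log(r)        # equals ln a - (c/gamma) r^gamma
    design = np.column_stack((np.ones_like(r), -(r ** gamma) / gamma))
    (ln_a, c), *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.exp(ln_a), c

# Synthetic check: generate exact data and recover the constants
gamma = 0.6
r = np.linspace(0.1, 5.0, 50)
p = 0.8 * r ** (gamma - 1.0) * np.exp(-(1.2 / gamma) * r ** gamma)
a_fit, c_fit = fit_segment(r, p, gamma)
print(a_fit, c_fit)
```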

Figure: two panels of ln P(R) against ln(R): return interval distribution segment (1) and return interval distribution segment (2)

Long memory probability process with two scaling exponents
To remove the discrepancy caused by long memory in the return intervals, we
generate return intervals such that they have no dependence on each
other.
First determine the constants a1 and a2 by normalizing Pex(r) over the region
between kmin = 1 and kmax:
\[ \int_1^{k_{max}} P_{ex}(r) \, dr = \int_1^{n_x} a_1 r^{-(1-\gamma_1)} \, dr + \int_{n_x}^{k_{max}} a_2 r^{-(1-\gamma_2)} \, dr = 1 \]
Use the continuity condition as well and solve for a1 and a2.
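\[ a_1 = \frac{1}{\left[ \dfrac{n_x^{\gamma_1}}{\gamma_1} - \dfrac{1}{\gamma_1} + \dfrac{k_{max}^{\gamma_2} \, n_x^{\gamma_1 - \gamma_2}}{\gamma_2} - \dfrac{n_x^{\gamma_1}}{\gamma_2} \right]} \]
\[ a_2 = \frac{1}{\left[ \dfrac{n_x^{\gamma_2}}{\gamma_1} - \dfrac{n_x^{\gamma_2 - \gamma_1}}{\gamma_1} + \dfrac{k_{max}^{\gamma_2}}{\gamma_2} - \dfrac{n_x^{\gamma_2}}{\gamma_2} \right]} \]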
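The two constants can be evaluated and checked numerically with a small sketch like the one below. The function name `pex_constants` and the parameter values are illustrative assumptions; the check uses the closed-form integrals of the two power-law pieces.

```python
import math

def pex_constants(gamma1, gamma2, n_x, k_max):
    """a1 and a2 from normalization on [1, k_max] plus continuity at n_x."""
    inv_a1 = ((n_x ** gamma1 - 1.0) / gamma1
              + n_x ** (gamma1 - gamma2) * (k_max ** gamma2 - n_x ** gamma2) / gamma2)
    a1 = 1.0 / inv_a1
    a2 = a1 * n_x ** (gamma1 - gamma2)      # continuity: a1 n_x^(g1-1) = a2 n_x^(g2-1)
    return a1, a2

# Illustrative (hypothetical) parameter values
g1, g2, n_x, k_max = 0.3, 0.7, 50.0, 1.0e4
a1, a2 = pex_constants(g1, g2, n_x, k_max)

# Closed-form integral of Pex(r) over [1, k_max]; should equal 1
total = a1 * (n_x ** g1 - 1.0) / g1 + a2 * (k_max ** g2 - n_x ** g2) / g2
print(a1, a2, total)
```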

Now generate a random number ξr from a uniform distribution at every r
and compare it with the value of Pex (r). A random number is accepted as
an extreme event if ξr < Pex (r) at any given value of r. If ξr ≥ Pex (r),
then it is not an extreme event. Using this procedure we can generate a
series of extreme events.
Figure: ln P(R) against ln(R), showing segment 1, segment 2 and the break point
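The acceptance procedure described above can be sketched as a simple loop. Several choices here are assumptions made for the example and are not stated in the slides: r is counted from the most recent extreme event and reset to zero after each one, the second branch of Pex(r) is used for all r ≥ nx (including beyond kmax), and a1 = 0.2 is an arbitrary illustrative value, with a2 fixed by continuity, rather than the normalized constant.

```python
import numpy as np

def pex(r, a1, a2, gamma1, gamma2, n_x):
    """Piecewise power-law probability of an extreme event r steps after the last one."""
    if r < n_x:
        return a1 * r ** (gamma1 - 1.0)
    return a2 * r ** (gamma2 - 1.0)

def generate_return_intervals(n_steps, a1, a2, gamma1, gamma2, n_x, rng):
    """Draw a uniform number at every step; accept an extreme event if it is below Pex(r)."""
    intervals = []
    r = 0
    for xi in rng.random(n_steps):
        r += 1
        if xi < pex(r, a1, a2, gamma1, gamma2, n_x):
            intervals.append(r)      # return interval since the previous event
            r = 0                    # assumption: restart the clock after each event
    return np.array(intervals)

gamma1, gamma2, n_x = 0.3, 0.7, 50
a1 = 0.2                                        # illustrative value, not normalized
a2 = a1 * n_x ** (gamma1 - gamma2)              # continuity at n_x
rng = np.random.default_rng(2)
intervals = generate_return_intervals(100_000, a1, a2, gamma1, gamma2, n_x, rng)
print(len(intervals), intervals.mean())
```

Because each interval depends only on the time since the last event, the intervals generated this way are independent of each other, which is the property this construction is meant to provide.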

Figure: two panels of ln P(R) against ln(R): return interval distribution of segment 1 and return interval distribution of segment 2
\[ P(r) = \begin{cases} a_1 r^{-(1-\gamma_1)} e^{-(a_1/\gamma_1) r^{\gamma_1}} & \text{for } 0 < r < n_x \\ C a_2 r^{-(1-\gamma_2)} e^{-(a_2/\gamma_2) r^{\gamma_2}} & \text{for } n_x < r < \infty \end{cases} \]

In the previous slide, for segment 1, in place of a1 we have two variables, a
and b. Why do we have two different variables? A possible reason is that
the normalization integrals have a lower limit of 0, but in reality the
minimum size of a return interval is 1. So even after scaling, the lower
limit cannot be smaller than 1/⟨r⟩.

Conclusion
For a long memory time series with two different scaling exponents:
There is a break point in the return interval distribution graph.
For each scaling exponent there is a different segment in the return
interval distribution.
Each segment still follows a distribution that is the product of a
power law and a stretched exponential.

Future direction
The model that we have used to generate time series with more than one
scaling exponent needs fine tuning so that we can test our analytical
results more accurately.
We should also consider the effects of long memory in the return intervals.

Thank you
Questions?
