Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects of the regression quantiles methodology in the POT analysis
Usage Rights: CC Attribution-NonCommercial License

Presentation Transcript

  • 1. Statistical aspects of the regression quantiles methodology in the POT analysis
    Jan Picek, Martin Schindler: Technical University of Liberec, Czech Republic, Department of Applied Mathematics
    Jan Kyselý, Romana Beranová: Institute of Atmospheric Physics, Czech Republic
    Workshop Non-stationary extreme value modelling in climatology, Liberec, February 16–17, 2012
  • 2. Motivation
    Development of extreme value models with time-dependent parameters in order to estimate (time-dependent) high quantiles of maximum daily air temperatures over Europe in climate change simulations (1961-2100).
    Kyselý, Picek, Beranová (2010): Global and Planetary Change, 72, 55-68
  • 3. Data motivation
    [Figure] Differences between 20-yr return values of TMAX estimated using the non-stationary POT model for years 2100 and 2071. Large (small) crosses mark gridpoints in which the estimated 90% (80%) CIs do not overlap.
  • 4. Theoretical models
    Fisher-Tippett Theorem: "If suitably normalized maxima converge in distribution to a non-degenerate limit, then the limit distribution must be an extreme value distribution."
    ⟹ Block maxima method: we collect data on block maxima and fit the three-parameter form of the GEV distribution. For this we require a lot of raw data so that we can form sufficiently many, sufficiently large blocks.
    Threshold view: it is reasonable to involve all values exceeding a given high threshold $u$. Pickands (1975) showed that the limiting distribution of normalized excesses of a threshold $u$, as the threshold approaches the end-point $u_{end}$ of the variable of interest, is the Generalized Pareto Distribution.
  • 5. Theoretical models
    It is usual to fit the Generalized Pareto Distribution to excesses over a (high enough) threshold. Thus we suppose that the asymptotic result is (approximately) true for the threshold of interest.
    The method is known as peaks-over-threshold (POT) and leads to the Poisson process model for threshold exceedances and the Generalized Pareto (GP) distribution for their magnitudes.
    The block maxima and POT methods assume stationarity of the underlying process, which is often violated in climatology by the presence of a trend or long-term variability in the data.
  • 6. Theoretical models
    We may describe a variable of primary interest by using covariate information (time index, variables based on atmospheric circulation, ...).
    ⟹ An approach based on the theory of point processes developed by Smith (1989) and Coles (2001).
    The method leads to a likelihood function that can be treated in the usual way to obtain maximum likelihood estimates, standard errors and confidence intervals of the model parameters. One of its main advantages is that it enables a straightforward incorporation of time-dependency of the parameters of the extreme value distribution.
    BUT the threshold may also depend on the covariates.
  • 7. Theoretical models
    When a significant trend is present in the data, no fixed threshold in the POT models is suitable over longer periods of time: there are either too few (or no) exceedances over the threshold in an earlier part of the record or too many exceedances towards the end of the examined period.
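
The problem with a fixed threshold can be illustrated with a small simulation. This is only a sketch under stated assumptions: the series (a linear trend of 3 units plus unit Gaussian noise), the helper name `exceedances_by_half`, and the use of a simple order statistic as the 95% threshold are all hypothetical choices, not taken from the presentation.

```python
import random

def exceedances_by_half(y, threshold):
    """Count exceedances of a fixed threshold in the first and second half of a series."""
    half = len(y) // 2
    early = sum(1 for v in y[:half] if v > threshold)
    late = sum(1 for v in y[half:] if v > threshold)
    return early, late

random.seed(0)
n = 2000
# Hypothetical series: linear trend of +3 units over the record plus unit Gaussian noise.
y = [3.0 * i / (n - 1) + random.gauss(0.0, 1.0) for i in range(n)]
# Fixed threshold: an empirical 95% quantile of the whole record (simple order statistic).
u = sorted(y)[(95 * n) // 100]
early, late = exceedances_by_half(y, u)
print("exceedances in first half:", early)
print("exceedances in second half:", late)
```

With a trend of this size, nearly all exceedances of the fixed threshold fall in the later part of the record, which is the situation the slide describes.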
  • 8. Regression quantiles
    We use a time-dependent threshold based on the quantile regression methodology.
    Consider the linear regression model
    $$Y = X\beta + E, \tag{1}$$
    where $Y$ is an $(n \times 1)$ vector of observations, $X$ is an $(n \times (p+1))$ matrix, $\beta$ is the $((p+1) \times 1)$ unknown parameter ($p \ge 1$) and $E$ is an $(n \times 1)$ vector of i.i.d. errors.
    We assume that the first column of $X$ is $1_n$, i.e. the first component of $\beta$ is an intercept.
  • 9. Regression quantiles
    R. Koenker and G. Bassett (1978) defined the $\alpha$-regression quantile $\beta(\alpha)$ ($0 < \alpha < 1$) for the model (1) as any solution of the minimization
    $$\sum_{i=1}^{n} \rho_\alpha(Y_i - x_i^\top t) := \min, \quad t \in \mathbb{R}^{p+1}, \tag{2}$$
    where
    $$\rho_\alpha(x) = x\,\psi_\alpha(x),\ x \in \mathbb{R}^1 \quad \text{and} \quad \psi_\alpha(x) = \alpha - I[x<0],\ x \in \mathbb{R}^1. \tag{3}$$
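
As a minimal illustration of the check function in (3) and the objective in (2), the sketch below assumes an intercept-only model (a simplification made here, so that $t$ is a single number) and a hypothetical sample 1, ..., 9. In that special case the minimizer of (2) is an ordinary sample quantile. The function names are illustrative only.

```python
def check_loss(x, alpha):
    """rho_alpha(x) = x * (alpha - I[x < 0])."""
    return x * (alpha - (1.0 if x < 0 else 0.0))

def objective(y, t, alpha):
    """Sum of rho_alpha(Y_i - t) for an intercept-only model."""
    return sum(check_loss(v - t, alpha) for v in y)

# Hypothetical sample; in the intercept-only case a minimizer can always be
# found among the observations, so it is enough to scan them.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
alpha = 0.3
best = min(y, key=lambda t: objective(y, t, alpha))
print("alpha =", alpha, "minimizer =", best)
```

Negative residuals are weighted by $1 - \alpha$ and positive ones by $\alpha$, which is what pushes the fitted value towards the $\alpha$-quantile rather than the median.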
  • 10. Regression quantiles
    [Figure: plot of y against x showing the 30% and 70% regression quantiles.]
    The advantage of this approach is that many aspects of usual quantiles and order statistics are generalized naturally to the linear model.
  • 11. Regression quantiles
    [Figure] Mean annual number of exceedances above the threshold (averaged over gridpoints) for the 95% regression quantile and the 95% quantile.
  • 12. Regression quantiles
    Computation: It is possible to characterize the $\alpha$-regression quantile $\beta(\alpha)$ as the component $\beta$ of the optimal solution $(\beta, r^+, r^-)$ of the linear program
    $$\begin{aligned} & \alpha\, 1_n^\top r^+ + (1-\alpha)\, 1_n^\top r^- := \min \\ & X\beta + r^+ - r^- = Y \\ & \beta \in \mathbb{R}^{p+1},\quad r^+, r^- \in \mathbb{R}^n_+,\quad 0 < \alpha < 1, \end{aligned} \tag{4}$$
    where $1_n = (1, \dots, 1)^\top \in \mathbb{R}^n$.
    R: package quantreg
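
The linear program (4) can be handed to a general-purpose LP solver. The sketch below is not the quantreg implementation named on the slide; it assumes NumPy and SciPy's `linprog` with the HiGHS backend, and the function name `regression_quantile` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def regression_quantile(X, y, alpha):
    """Solve LP (4) for the alpha-regression quantile; returns the beta component."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # Decision vector z = (beta, r_plus, r_minus); beta has zero cost.
    c = np.concatenate([np.zeros(k), alpha * np.ones(n), (1.0 - alpha) * np.ones(n)])
    # Equality constraint: X beta + r_plus - r_minus = y.
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    # beta is free, the residual parts are non-negative.
    bounds = [(None, None)] * k + [(0.0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]

# Toy check 1: data lying exactly on a line are recovered for any alpha.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
beta_line = regression_quantile(X, 2.0 + 3.0 * x, 0.95)

# Toy check 2: intercept-only model reduces to an ordinary sample quantile.
beta_q = regression_quantile(np.ones((9, 1)), np.arange(1.0, 10.0), 0.3)
print(beta_line, beta_q)
```

A dense identity block is used for clarity; for long records a sparse constraint matrix or a dedicated quantile regression routine would be a more practical choice.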
  • 13. Regression quantiles
    Theory, POT: Let $X_1, X_2, \dots$ be i.i.d. random variables with distribution function $F$. The behavior of extreme events (all values exceeding a given high threshold $u$) is given by the conditional probability $P(X_i > y \mid X_i > u)$, and
    $$P(X_i < y \mid X_i > u) \to H(y), \quad u \to u_{end},$$
    with
    $$H(y) = \begin{cases} 1 - \left(1 + \gamma\,\dfrac{y-\mu}{\sigma}\right)^{-1/\gamma}, & \gamma \neq 0, \\[2ex] 1 - e^{-\frac{y-\mu}{\sigma}}, & \gamma = 0, \end{cases}$$
    where $1 + \gamma\,\frac{y-\mu}{\sigma} > 0$ and $u_{end}$ is the right end-point of the variable $X_i$.
    Dienstbier and Picek (2011) showed that the limiting distribution of normalized excesses of a regression quantile threshold is also the Generalized Pareto.
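
The two branches of the Generalized Pareto distribution function $H$ can be written out directly. The sketch below is illustrative: the function name `gpd_cdf` and the handling of arguments outside the support (returning 0 below $\mu$ and 1 above the upper end-point when $\gamma < 0$) are conventions assumed here.

```python
import math

def gpd_cdf(y, mu=0.0, sigma=1.0, gamma=0.0):
    """Generalized Pareto distribution function H(y) with location mu, scale sigma, shape gamma."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    if z <= 0.0:
        return 0.0                      # below the threshold mu
    if gamma == 0.0:
        return 1.0 - math.exp(-z)       # exponential case
    base = 1.0 + gamma * z
    if base <= 0.0:
        return 1.0                      # only for gamma < 0: beyond the upper end-point
    return 1.0 - base ** (-1.0 / gamma)

print(gpd_cdf(1.0))                 # gamma = 0 branch
print(gpd_cdf(1.0, gamma=0.5))      # heavy-tailed case
print(gpd_cdf(3.0, gamma=-0.5))     # bounded support, upper end-point at mu + 2*sigma
```

The $\gamma = 0$ branch is the limit of the general formula as $\gamma \to 0$, which is why the two cases join continuously.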
  • 14. Regression quantiles
    The formal dual program to (4) can be written in the form
    $$\begin{aligned} & Y_n^\top \hat{a} := \max \\ & X_n^\top \hat{a} = (1-\alpha)\, X_n^\top 1_n \\ & \hat{a} \in [0,1]^n,\quad 0 < \alpha < 1. \end{aligned} \tag{5}$$
    The components of the optimal solution $\hat{a}(\alpha) = (\hat{a}_1(\alpha), \dots, \hat{a}_n(\alpha))$ are called the regression rank scores (Gutenbrunner and Jurečková 1992).
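
The dual program (5) can be solved the same way as the primal. The sketch below assumes NumPy and SciPy's `linprog` (which minimizes, so the objective is negated); the function name `regression_rank_scores` and the intercept-only toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def regression_rank_scores(X, y, alpha):
    """Solve the dual LP (5): maximize y'a subject to X'a = (1 - alpha) X'1, a in [0, 1]^n."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    b_eq = (1.0 - alpha) * (X.T @ np.ones(n))
    # linprog minimizes, so maximize y'a by minimizing -y'a.
    res = linprog(-y, A_eq=X.T, b_eq=b_eq, bounds=[(0.0, 1.0)] * n, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

# Intercept-only toy example: observations above the 0.3-quantile get score 1,
# those below get score 0, and the observation at the quantile gets a fraction.
y = np.arange(1.0, 10.0)
a_hat = regression_rank_scores(np.ones((9, 1)), y, 0.3)
print(np.round(a_hat, 3))
```

In this toy case the scores behave like indicators of lying above the fitted quantile, which is the sense in which they generalize ranks to the linear model.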
  • 15. Tests
    Hájek (1965) extended the Kolmogorov-Smirnov test to verify the hypothesis of randomness against the regression alternative. He considered the rank-scores process and showed that not only the Kolmogorov-Smirnov test but many other rank tests can be expressed as functionals of the rank-scores process.
    A general class of tests based on regression rank scores, parallel to classical rank tests such as the Wilcoxon, normal scores and median tests, was constructed in Gutenbrunner et al. (1993), ...
  • 16. Tests
    Typically, the test based on regression rank scores applies to the model
    $$Y = X_1\beta + X_2\gamma + E, \tag{6}$$
    where $\beta$ and $\gamma$ are $p$- and $q$-dimensional parameters, $X_1$ is of order $(n \times p)$ and $X_2$ of order $(n \times q)$, respectively, and where one verifies the hypothesis
    $$H_0: \gamma = 0, \quad \beta \text{ unspecified.}$$
  • 17. Tests
    Results of the tests on parameters of the linear and quadratic terms of the 95% regression quantiles in individual GCM scenarios.
    Percentage of gridpoints in which the examined parameter is significantly different from zero at p=0.05:

    GCM     Scenario   Linear   Quadratic
    CM2.0   A2         100.0    90.3
            A1B         98.1    43.5
            B1          98.1    43.1
    CM2.1   A2          98.9    77.5
            A1B         99.4    38.7
            B1          98.9    54.6
            A1FI        99.8    47.4
  • 18. Tests
    Threshold Selection
    "Remove the trend and apply to the residuals:"
      • Threshold choice plot: Let $X \sim GPD(\mu_0, \sigma_0, \gamma_0)$ and let $\mu_1 > \mu_0$ be another threshold. The random variable $X \mid X > \mu_1$ is also GPD, with updated parameters $\sigma_1 = \sigma_0 + \gamma_0(\mu_1 - \mu_0)$ and $\gamma_1 = \gamma_0$. Let $\sigma^* = \sigma_1 - \gamma_1\mu_1$ (the modified scale). Then $\sigma^*$ and $\gamma_1$ are constant in $\mu_1$ for $\mu_1 > \mu_0$ if $\mu_0$ is a suitable threshold.
      • Mean residual life plot
      • L-moments plot
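
The reason the threshold choice plot should look flat above a suitable threshold follows from substituting the updated parameters. Writing the modified scale as $\sigma^* = \sigma_1 - \gamma_1\mu_1$ (the star notation is assumed here for readability), a short worked derivation is:

```latex
\begin{aligned}
\sigma^* &= \sigma_1 - \gamma_1 \mu_1 \\
         &= \bigl(\sigma_0 + \gamma_0(\mu_1 - \mu_0)\bigr) - \gamma_0 \mu_1
            && \text{since } \sigma_1 = \sigma_0 + \gamma_0(\mu_1 - \mu_0),\ \gamma_1 = \gamma_0 \\
         &= \sigma_0 - \gamma_0 \mu_0 .
\end{aligned}
```

The result does not depend on $\mu_1$, so estimates of the modified scale and of the shape that drift with the threshold suggest that the GPD approximation does not yet hold at that level.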
  • 19. Tests
    [Figure: threshold choice plots with two panels, Modified Scale and Shape, each plotted against Threshold (0.90 to 0.98).]
  • 20. Conclusions
      • The proposed non-stationary peaks-over-threshold method with time-dependent thresholds estimated using regression quantiles is computationally straightforward.
      • The limiting distribution of normalized excesses of a regression quantile threshold is the Generalized Pareto.
      • The choice of regression model is based on the "rank" tests corresponding to regression quantiles.
      • We can use the usual tools to select a suitable threshold.