This document discusses halfspace depth, a method for defining multivariate outliers and depth contours. It introduces halfspace depth, which defines the depth of a point as the smallest probability of a closed halfspace containing that point, then shows how to estimate depth empirically and how to define depth sets containing the points whose depth exceeds a given value. It also notes that empirical depth sets are sensitive to the algorithm used, comparing two different algorithms applied to the same data, and it closes with possible extensions such as the functional bagplot for functional data.
Arthur CHARPENTIER, Solvency II’ newspeak
Stress Testing & Reverse Stress Testing
Alexander J. McNeil
arthur.charpentier@univ-rennes1.fr
http://blogperso.univ-rennes1.fr/arthur.charpentier/index.php/
Financial Risks International Forum ‘Risk Dependencies’, March 2010
Defining halfspace depth
Given $y \in \mathbb{R}^d$ and a direction $u \in \mathbb{R}^d$, define the closed halfspace
$$H_{y,u} = \{x \in \mathbb{R}^d \text{ such that } u^\top x \le u^\top y\}$$
and define the depth at point $y$ by
$$\mathrm{depth}(y) = \inf_{u \neq 0} P(H_{y,u}),$$
i.e. the smallest probability of a closed halfspace containing $y$.
The empirical version is (see Tukey, 1975)
$$\mathrm{depth}(y) = \min_{u \neq 0} \frac{1}{n} \sum_{i=1}^n \mathbf{1}(X_i \in H_{y,u}).$$
For $\alpha > 0.5$, define the depth set as
$$D_\alpha = \{y \in \mathbb{R}^d \text{ such that } \mathrm{depth}(y) \ge 1 - \alpha\}.$$
The empirical version can be related to the bagplot (Rousseeuw & Ruts, 1999).
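In practice the infimum over all directions $u$ is replaced by a minimum over a finite grid of directions. Below is a minimal Python sketch of this approximation for bivariate data (my own illustration, not code from the slides; the function names, the direction-grid shortcut, and the evaluation grid for $D_\alpha$ are assumptions):

```python
import numpy as np

def halfspace_depth(y, X, n_dir=360):
    """Approximate the empirical halfspace depth of a point y (length-2 array)
    with respect to a bivariate sample X (shape (n, 2)), scanning n_dir
    directions on the half-circle instead of all u != 0."""
    angles = np.linspace(0.0, np.pi, n_dir, endpoint=False)
    U = np.column_stack([np.cos(angles), np.sin(angles)])  # unit directions u
    proj_X = X @ U.T              # u' X_i, shape (n, n_dir)
    proj_y = y @ U.T              # u' y,   shape (n_dir,)
    frac_le = (proj_X <= proj_y).mean(axis=0)  # empirical P(H_{y, u})
    frac_ge = (proj_X >= proj_y).mean(axis=0)  # empirical P(H_{y, -u})
    return float(np.minimum(frac_le, frac_ge).min())

def depth_set(X, alpha=0.5, grid_size=60):
    """Grid approximation of the empirical D_alpha = {y : depth(y) >= 1 - alpha}."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    xs = np.linspace(lo[0], hi[0], grid_size)
    ys = np.linspace(lo[1], hi[1], grid_size)
    grid = np.array([(a, b) for a in xs for b in ys])
    depths = np.array([halfspace_depth(p, X) for p in grid])
    return grid[depths >= 1.0 - alpha]
```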
Empirical sets are extremely sensitive to the algorithm
[Figure: two scatterplots of the same bivariate sample, one per algorithm; the blue set in each panel is the empirical estimate of $D_\alpha$ for $\alpha = 0.5$.]
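To see this sensitivity concretely: the grid-of-directions approximation sketched earlier can only overstate depth when the grid is coarse (the minimum runs over fewer halfspaces), so the estimated $D_\alpha$ shrinks as the grid is refined. A hypothetical toy comparison, reusing halfspace_depth from the previous sketch (the sample is simulated, not the data of the figure):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))   # toy bivariate sample

alpha = 0.75
for n_dir in (8, 360):              # coarse versus fine direction grid
    depths = np.array([halfspace_depth(x, X, n_dir=n_dir) for x in X])
    n_in = int((depths >= 1.0 - alpha).sum())
    print(f"n_dir={n_dir:3d}: depth of first point = {depths[0]:.3f}, "
          f"{n_in} sample points in the empirical D_alpha")
```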
The bagplot tool
The depth function introduced here is the multivariate extension of standard univariate depth measures, e.g.
$$\mathrm{depth}(x) = \min\{F(x),\, 1 - F(x^-)\},$$
which satisfies $\mathrm{depth}(Q_\alpha) = \min\{\alpha, 1 - \alpha\}$. But one can also consider
$$\mathrm{depth}(x) = 2 \cdot F(x) \cdot [1 - F(x^-)] \quad \text{or} \quad \mathrm{depth}(x) = \frac{1}{2} - \left|\frac{1}{2} - F(x)\right|.$$
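A short plug-in check of these univariate formulas, with the empirical CDF in place of $F$ (a sketch of my own; the function name and test values are purely illustrative):

```python
import numpy as np

def univariate_depths(x, sample):
    """Empirical versions of the three univariate depth functions above."""
    sample = np.asarray(sample, dtype=float)
    F_x = float((sample <= x).mean())        # F(x)
    F_x_minus = float((sample < x).mean())   # F(x^-)
    d1 = min(F_x, 1.0 - F_x_minus)           # min{F(x), 1 - F(x^-)}
    d2 = 2.0 * F_x * (1.0 - F_x_minus)       # 2 F(x) [1 - F(x^-)]
    d3 = 0.5 - abs(0.5 - F_x)                # 1/2 - |1/2 - F(x)|
    return d1, d2, d3

# near the sample median all three depths are close to 1/2;
# at extreme quantiles all three shrink towards 0
sample = np.random.default_rng(0).standard_normal(1000)
print(univariate_depths(np.median(sample), sample))
print(univariate_depths(2.5, sample))
```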
Possible extensions to the functional bagplot.
The bagplot tool for mortality models
On a French dataset, we have the following past outliers,
[Figure: left panel, log mortality rate against age; right panel, bagplot of PC score 2 against PC score 1, with the years 1914-1919, 1940 and 1942-1944 flagged as outliers.]
(here male log-mortality rates in France from 1899 to 2005).
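One way such a display can be built (an assumption on my part, following the functional bagplot of Hyndman & Shang (2010) cited below, not necessarily the exact procedure behind the figure) is to project the yearly log-mortality curves on their first two principal components and flag the years whose (PC1, PC2) score has low halfspace depth:

```python
import numpy as np

def flag_outlier_years(log_mx, depth_threshold=0.05, n_dir=360):
    """Sketch of a functional-bagplot-style outlier detection.
    log_mx: array of shape (n_years, n_ages) of log mortality rates.
    Reuses halfspace_depth() from the earlier sketch; the threshold
    is an illustrative choice, not a prescribed rule."""
    centred = log_mx - log_mx.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)  # principal directions
    scores = centred @ Vt[:2].T                              # (n_years, 2) PC scores
    depths = np.array([halfspace_depth(s, scores, n_dir) for s in scores])
    return np.where(depths <= depth_threshold)[0], scores
```

On data such as the French male rates above, the low-depth years would be expected to be the war and epidemic years flagged in the right panel of the figure.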
Further references
Febrero, M., Galeano, P. & González-Manteiga, W. (2007). A functional analysis of NOx levels: location and scale estimation and outlier detection. Computational Statistics, 22(3), 411-427.
Hyndman, R.J. & Shang, H.L. (2010). Rainbow plots, bagplots and boxplots for functional data. Journal of Computational and Graphical Statistics, 19(1), 29-45.
Rousseeuw, P.J., Ruts, I. & Tukey, J.W. (1999). The bagplot: a bivariate boxplot. The American Statistician, 53(4), 382-387.
Sood, A., James, G. & Tellis, G. (2009). Functional regression: a new model for predicting market penetration of new products. Marketing Science, 28(1), 36-51.