1.
Piecewise Gaussian Process Modelling for Change-Point Detection: Application to Atmospheric Dispersion Problems
Adrien Ickowicz, CMIS, CSIRO
February 2013
2.
Background
Scientific collaboration with University College London, UNSW and Université Lille 1, involving atmospheric specialists, an informatics engineer and statisticians.
Input: concentration values of CBRN material at the sensor locations; wind field.
Output: source location, time of release and strength for fire-fighters; quarantine map for politicians and the MoD.
3.
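Statistical Modelling
Observation modelling:
$$Y^{obs,(i)}_{t_j} = D^{(i)}_{t_j}(\theta) + \zeta^{(i)}_{t_j}, \qquad D^{(i)}_{t_j}(\theta) = \int_{\Omega \times T} C_\theta(x, t)\, h(x, t \mid x_i, t_j)\, dx\, dt, \qquad \zeta^{(i)}_{t_j} \sim \mathcal{N}(0, \sigma^2),$$
where $C_\theta$ is the solution of the PDE
$$\frac{\partial C}{\partial t} + u \cdot \nabla C - \nabla \cdot (K \nabla C) = Q(\theta) \quad \text{s.t. } \mathbf{n} \cdot \nabla C = 0 \text{ on } \partial\Omega.$$
Parameter of interest: $\theta \in \Omega \times T$.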
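To make the observation model concrete, the sketch below simulates $Y^{obs}$ in a deliberately simplified setting. It is not the forward model used in this work: it assumes a 1D unbounded domain (so the zero-flux boundary condition is dropped), constant wind `u` and diffusivity `k`, and an instantaneous release of hypothetical strength `q` at $\theta = (x_0, t_0)$, for which the PDE has a closed-form Gaussian-puff solution. Sensors are taken as points, i.e. $h$ is a Dirac mass at $(x_i, t_j)$. All function names are hypothetical.

```python
import numpy as np


def puff_concentration(x, t, x0, t0, q, u, k):
    """Closed-form C(x, t) for an instantaneous 1D release (hypothetical sketch).

    Assumes an unbounded domain, constant wind u, constant diffusivity k and a
    release of strength q at position x0 and time t0.
    """
    x = np.asarray(x, dtype=float)
    tau = np.asarray(t, dtype=float) - t0
    safe_tau = np.where(tau > 0, tau, 1.0)  # placeholder value to avoid dividing by zero
    c = q / np.sqrt(4.0 * np.pi * k * safe_tau) * np.exp(
        -((x - x0 - u * safe_tau) ** 2) / (4.0 * k * safe_tau)
    )
    return np.where(tau > 0, c, 0.0)  # no material before the release


def simulate_observations(sensors, times, theta, q, u, k, sigma, rng):
    """Y_obs[i, j] = C_theta(x_i, t_j) + zeta, zeta ~ N(0, sigma^2), for point sensors."""
    x0, t0 = theta
    xs, ts = np.meshgrid(sensors, times, indexing="ij")  # rows: sensors, columns: times
    clean = puff_concentration(xs, ts, x0, t0, q, u, k)
    return clean + rng.normal(0.0, sigma, size=clean.shape)


rng = np.random.default_rng(0)
sensors = np.linspace(0.0, 200.0, 5)
times = np.arange(1.0, 11.0)
Y = simulate_observations(sensors, times, theta=(115.0, 2.0), q=1.0, u=5.0, k=10.0,
                          sigma=1e-4, rng=rng)
print(Y.shape)  # (5, 10)
```

In this toy setting the puff centre travels as $x_0 + u(t - t_0)$ and its spread grows as $\sqrt{2k(t - t_0)}$, which is the kind of information the inverse problem tries to recover from `Y`.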
4.
Existing Techniques: Source Term Estimation
Optimization techniques:
- Gradient-based methods (Elbern et al. [2000], Li and Niu [2005], Lushi and Stockie [2010])
- Pattern search methods (Zheng et al. [2008])
- Genetic algorithms (Haupt [2005], Allen et al. [2009])
Bayesian techniques:
- Forward modelling and MCMC (Patwardhan and Small [1992])
- Backward (adjoint) modelling and MCMC (Issartel et al. [2002], Hourdin et al. [2006], Yee [2010])
5.
Contribution: Gaussian Process Modelling
Overview
We consider several observations of a stochastic process in space and time.
Idea: Bayesian non-parametric estimation.
Tool: Gaussian processes (Rasmussen [2006]).
Joint distribution: $y \sim \mathcal{GP}(m(x), \kappa(x, x'))$, where $m \in L^2(\Omega \times T, \mathbb{R})$ is the prior mean function and $\kappa \in L^2(\Omega^2 \times T^2, \mathbb{R})$ is the prior covariance function¹.
Posterior distribution (for $m = 0$):
$$\mathcal{L}(y_* \mid x_*, x, y) = \mathcal{N}\big(\kappa(x_*, x)\kappa(x, x)^{-1} y,\; \kappa(x_*, x_*) - \kappa(x_*, x)\kappa(x, x)^{-1}\kappa(x, x_*)\big)$$
¹ The associated matrix $K$ should be positive semidefinite.
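A minimal NumPy sketch of the zero-mean posterior above, assuming 1D inputs and the isotropic squared-exponential kernel of the next slide. A Cholesky factorisation replaces the explicit inverse $\kappa(x, x)^{-1}$; the small `jitter` added to the diagonal is an assumption for numerical stability, not part of the model. Function names are hypothetical.

```python
import numpy as np


def sq_exp_kernel(a, b, alpha1=1.0, alpha2=1.0):
    """Isotropic kernel alpha1 * exp(-(a - b)^2 / (2 * alpha2)) for 1D inputs."""
    diff = np.subtract.outer(np.asarray(a, float), np.asarray(b, float))
    return alpha1 * np.exp(-diff ** 2 / (2.0 * alpha2))


def gp_posterior(x_star, x, y, kernel=sq_exp_kernel, jitter=1e-10):
    """Posterior mean and covariance of a zero-mean GP at x_star given (x, y)."""
    k_xx = kernel(x, x) + jitter * np.eye(len(x))
    k_sx = kernel(x_star, x)
    chol = np.linalg.cholesky(k_xx)  # k_xx = L L^T
    # weights = k_xx^{-1} y via two solves, with L then with L^T
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    mean = k_sx @ weights
    v = np.linalg.solve(chol, k_sx.T)  # v^T v = k_sx k_xx^{-1} k_xs
    cov = kernel(x_star, x_star) - v.T @ v
    return mean, cov


x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
mean, cov = gp_posterior(np.array([1.5, 100.0]), x, y)
```

Far from the data ($x_* = 100$) the posterior falls back to the prior: the mean is close to 0 and the variance close to $\alpha_1 = 1$.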
6.
Contribution: Gaussian Process Modelling
On the Kernel Specification
Complex non-parametric modelling requires care in choosing both the kernel shape and the kernel hyper-parameters.
Basic kernel (isotropic):
$$\kappa(x, x') = \alpha_1 \exp\Big(-\frac{1}{2\alpha_2}(x - x')^2\Big)$$
Hyper-parameters: $\alpha_1, \alpha_2$.
Figure: Predictions of 3 Gaussian Process models (and their corresponding 0.95 confidence intervals) given 7 noisy observations. Left: $\alpha_2 = 0.1$; middle: $\alpha_2 = 2$; right: $\alpha_2 = 1000$.
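The effect of $\alpha_2$ shown in the figure can be read directly off the kernel. The sketch below uses 7 evenly spaced inputs (a hypothetical layout, since the figure's inputs are not given) and $\alpha_1 = 1$, and prints, for each $\alpha_2$, the prior correlation between neighbouring inputs and the smallest eigenvalue of the Gram matrix.

```python
import numpy as np


def isotropic_kernel(a, b, alpha1, alpha2):
    """alpha1 * exp(-(a - b)^2 / (2 * alpha2)) evaluated on all pairs of a and b."""
    diff = np.subtract.outer(a, b)
    return alpha1 * np.exp(-diff ** 2 / (2.0 * alpha2))


x = np.linspace(0.0, 6.0, 7)  # 7 inputs one unit apart (hypothetical positions)
for alpha2 in (0.1, 2.0, 1000.0):
    gram = isotropic_kernel(x, x, alpha1=1.0, alpha2=alpha2)
    neighbour_corr = gram[0, 1]  # correlation between x and x + 1
    min_eig = np.linalg.eigvalsh(gram).min()  # near 0 means nearly singular
    print(f"alpha2={alpha2:>7}: corr(x, x+1)={neighbour_corr:.4f}, min eigenvalue={min_eig:.2e}")
```

The neighbour correlation is about 0.0067 for $\alpha_2 = 0.1$, 0.78 for $\alpha_2 = 2$ and 0.9995 for $\alpha_2 = 1000$. A small $\alpha_2$ lets the prediction fall back to the prior between observations, while a very large one tends to force a very smooth, almost flat fit and makes the Gram matrix nearly singular, which is one reason the noise term $\sigma^2 I_n$ matters in practice.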
7.
Contribution: Gaussian Process Modelling
Likelihood and Multiple Kernels
The hyper-parameters are estimated through the marginal likelihood,
$$\log p(y \mid x) = -\frac{1}{2} y^T (K + \sigma^2 I_n)^{-1} y - \frac{1}{2}\log|K + \sigma^2 I_n| - \frac{n}{2}\log 2\pi.$$
What if the best-fitted kernel were piecewise,
$$\kappa(x, x') = \sum_i \kappa_i(x, x')\, \mathbb{1}\{(x, x') \in \Omega_i\}?$$
Figure: Synthetic two-phase signal.
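The marginal likelihood above can be evaluated stably through the Cholesky factor $L$ of $K + \sigma^2 I_n$, using $\log|K + \sigma^2 I_n| = 2\sum_i \log L_{ii}$. The sketch below does this and then picks $\alpha_2$ by a plain grid search; the grid, the noise level and the synthetic data are illustrative assumptions (a gradient-based optimiser would be the usual alternative to a grid).

```python
import numpy as np


def log_marginal_likelihood(y, gram, sigma2):
    """log p(y | x) for a zero-mean GP with Gram matrix K and noise variance sigma2."""
    n = len(y)
    chol = np.linalg.cholesky(gram + sigma2 * np.eye(n))
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # (K + sigma2 I)^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))  # log |K + sigma2 I|
    return -0.5 * y @ weights - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)


def isotropic_gram(x, alpha1, alpha2):
    diff = np.subtract.outer(x, x)
    return alpha1 * np.exp(-diff ** 2 / (2.0 * alpha2))


rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)  # synthetic data
grid = [0.1, 0.5, 1.0, 2.0, 5.0]
scores = {a2: log_marginal_likelihood(y, isotropic_gram(x, 1.0, a2), 0.01) for a2 in grid}
best_alpha2 = max(scores, key=scores.get)
print(best_alpha2, scores[best_alpha2])
```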
8.
Contribution: Gaussian Process Modelling
Change-Point Estimation, A. Parametric Estimation
We assume that there exist $\beta_i$ such that
$$(x, x') \in \Omega_i \iff f(x, x', \beta_i) \ge 0,$$
where $f$ is known. Then $\theta = \{(\alpha_i, \beta_i)\}_i$, and
$$\hat\theta = \arg\max_\theta \log p(y \mid x).$$
Limitations:
- knowledge of $f$;
- dimension of the parameter space;
- lack of convexity of the marginal likelihood function.
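A sketch of the parametric approach in 1D under one hypothetical choice of $f$: $\Omega_1 = \{x, x' < \beta\}$ and $\Omega_2 = \{x, x' \ge \beta\}$, so that pairs straddling $\beta$ get zero covariance and the kernel stays positive semidefinite. With $\alpha_1 = 1$ fixed, $\theta$ reduces to $\beta$ and two length-scale parameters (`alpha_left`, `alpha_right`, hypothetical names). The maximisation is an exhaustive grid search, which sidesteps the non-convexity but scales badly with the dimension of the parameter space, as noted in the limitations.

```python
import itertools

import numpy as np


def piecewise_gram(x, beta, alpha_left, alpha_right):
    """sum_i kappa_i(x, x') 1{(x, x') in Omega_i} for a single change point beta."""
    left = x < beta
    diff2 = np.subtract.outer(x, x) ** 2
    k_left = np.exp(-diff2 / (2.0 * alpha_left))
    k_right = np.exp(-diff2 / (2.0 * alpha_right))
    # pairs on different sides of beta belong to no Omega_i: zero covariance
    return k_left * np.outer(left, left) + k_right * np.outer(~left, ~left)


def log_marginal_likelihood(y, gram, sigma2):
    n = len(y)
    chol = np.linalg.cholesky(gram + sigma2 * np.eye(n))
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return (-0.5 * y @ weights - np.sum(np.log(np.diag(chol)))
            - 0.5 * n * np.log(2.0 * np.pi))


def fit_change_point(x, y, betas, alphas, sigma2):
    """Exhaustive search; returns (log-likelihood, beta, alpha_left, alpha_right)."""
    best = None
    for beta, a_l, a_r in itertools.product(betas, alphas, alphas):
        ll = log_marginal_likelihood(y, piecewise_gram(x, beta, a_l, a_r), sigma2)
        if best is None or ll > best[0]:
            best = (ll, beta, a_l, a_r)
    return best


rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
# synthetic two-phase signal: slow variations before x = 5, fast ones after
y = np.where(x < 5.0, np.sin(0.5 * x), np.sin(4.0 * x)) + rng.normal(0.0, 0.05, x.size)
betas = np.linspace(2.0, 8.0, 13)
alphas = [0.05, 0.5, 5.0]
ll, beta_hat, a_left, a_right = fit_change_point(x, y, betas, alphas, sigma2=0.0025)
print(beta_hat, a_left, a_right)
```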
9.
Contribution: Gaussian Process Modelling
Change-Point Estimation, B. Adaptive Estimation (1)
Let $X^{kNN \cap B_r}(i)$ be the set of observations associated with $x_i$:
$$X^{kNN \cap B_r}(i) = \{x_j \mid x_j \in B_r(x_i),\ d_{ji} \le d_{(ik)}\},$$
where $k$ is the number of neighbours to be considered, $r$ is the limiting radius, $d_{ji}$ is the distance between $x_j$ and $x_i$, and $d_{(ik)}$ is the distance from $x_i$ to its $k$-th nearest neighbour.
Justification:
- avoid a lack of observations;
- an equivalent number of observations for each estimator;
- avoid the hyper-parametrization of the likelihood.
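A sketch of the neighbourhood $X^{kNN \cap B_r}(i)$, returning indices of the selected observations. Assumptions not fixed by the slide: Euclidean distance, $x_i$ counted as its own nearest neighbour (distance 0), and all ties at $d_{(ik)}$ kept, so the set can occasionally contain more than $k$ points. The function name is hypothetical.

```python
import numpy as np


def knn_ball_neighbourhood(points, i, k, r):
    """Indices j with x_j in B_r(x_i) and d_ji <= d_(ik) (k-th smallest distance to x_i)."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:  # allow 1D inputs given as a flat array
        points = points[:, None]
    d = np.linalg.norm(points - points[i], axis=1)
    d_k = np.sort(d)[k - 1]  # k-th order statistic; x_i itself is at distance 0
    return np.flatnonzero((d <= r) & (d <= d_k))


pts = np.arange(10.0)
print(knn_ball_neighbourhood(pts, i=5, k=3, r=1.5))  # [4 5 6]
print(knn_ball_neighbourhood(pts, i=5, k=3, r=0.5))  # [5]
```

The radius $r$ is the binding constraint in sparse regions and $k$ in dense ones, so each local estimator sees at most $k$ nearby observations.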
10.
Contribution: Gaussian Process Modelling
Change-Point Estimation, B. Adaptive Estimation (2)
Let $x_I = X^{kNN \cap B_r}(i)$ and let $y_I$ be the corresponding observations. Then
$$\hat\alpha_i = \arg\max_\alpha \log p(y_I \mid x_I).$$
Idea 1: cluster on $\hat\alpha_i$. But what if $\dim(\hat\alpha_i) \ge 2$?
Idea 2: build the Gram matrices $K_{x_i} = \kappa(x_I, \hat\alpha_i)$, let $\Lambda_{x_i} = \{\lambda_1, \dots, \lambda_n\}$ be the eigenvalues of $K_{x_i}$, and cluster on $\mu_i = \max \Lambda_{x_i}$.
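A sketch of Idea 2 in 1D. Assumptions, none prescribed by the slide: $\hat\alpha_i$ is a single length-scale chosen by grid search with $\alpha_1 = 1$, neighbourhoods are plain $k$-nearest neighbours (the radius is omitted for brevity), and the clustering step is a two-cluster 1D $k$-means (Lloyd's algorithm) initialised at the extreme values. Function names are hypothetical.

```python
import numpy as np


def gram(x, alpha2):
    """Isotropic Gram matrix with alpha1 = 1 (unit diagonal)."""
    return np.exp(-np.subtract.outer(x, x) ** 2 / (2.0 * alpha2))


def log_ml(y, k, sigma2):
    a = k + sigma2 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(a)
    return -0.5 * y @ np.linalg.solve(a, y) - 0.5 * logdet - 0.5 * len(y) * np.log(2 * np.pi)


def max_eigenvalue_feature(x_i, y_i, alphas, sigma2):
    """mu_i = largest eigenvalue of K_{x_i} = kappa(x_I, alpha_hat_i)."""
    alpha_hat = max(alphas, key=lambda a: log_ml(y_i, gram(x_i, a), sigma2))
    return np.linalg.eigvalsh(gram(x_i, alpha_hat)).max()


def two_means_1d(values, n_iter=100):
    """Lloyd's algorithm with two clusters, initialised at the min and max values."""
    values = np.asarray(values, dtype=float)
    centres = np.array([values.min(), values.max()])
    labels = np.zeros(values.size, dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new = np.array([values[labels == c].mean() if np.any(labels == c) else centres[c]
                        for c in (0, 1)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 80)
y = np.where(x < 5.0, np.sin(0.5 * x), np.sin(4.0 * x)) + rng.normal(0.0, 0.05, x.size)
k = 10
mu = np.array([
    max_eigenvalue_feature(x[idx], y[idx], [0.05, 0.5, 5.0], 0.0025)
    for idx in (np.argsort(np.abs(x - xi))[:k] for xi in x)
])
labels = two_means_1d(mu)
```

Since each local Gram matrix has unit diagonal, $\mu_i$ lies between 1 (uncorrelated neighbours) and $k$ (perfectly correlated ones), and it grows with the fitted length-scale. This gives a one-dimensional summary even when $\hat\alpha_i$ has several components, which is the point of Idea 2.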
11.
Contribution: Gaussian Process Modelling
Simulation Results
Figure: Gaussian Process prediction with 1 classical isotropic kernel (green), and with 2 isotropic kernels using eigenvalue-based change-point estimation (yellow), hyper-parameter-based change-point estimation (purple) and parametric estimation (blue).
Figure: Mean of the Gaussian Process for the two-dimensional scenario (both axes from 0 to 50). Left: mean calculated with only one kernel. Right: mean calculated with two kernels.
12.
Contribution: Gaussian Process Modelling
Simulation Results
Figure: Evolution of the root MSE (RMSE) of the change-point estimation as the number of observations $N_s$ increases from 20 to 100, in the 1D case, for the MMLE, JD and MEV methods.
Methods: MMLE, parametric approach; MEV, eigenvalue approach; JD, Est. approach.

| Method | 2D | 2D-donut | 3D |
|---|---|---|---|
| JD | 0.834 (0.0034) | 0.763 (0.0015) | 0.666 (0.0016) |
| MEV | 0.825 (0.0053) | 0.817 (0.0021) | 0.643 (0.0014) |
| MMLE | 0.858 (0.0025) | 0.806 (0.0008) | 0.666 (0.0002) |

Table: The number of observations is $10^d$, where $d$ is the dimension of the problem; results over 1000 simulations. The variance is given in brackets.
13.
Contribution: Gaussian Process Modelling
Application to the Concentration Measurements
We may consider the concentration measurements as observations of a stochastic process in space and time.
Idea: apply the approach defined above to estimate $t_0$.
Prior distribution: $C \sim \mathcal{GP}(m, \kappa)$, where $m \in L^2(\Omega \times T, \mathbb{R})$ is the prior mean function and $\kappa \in L^2(\Omega^2 \times T^2, \mathbb{R})$ is the prior covariance function².
Posterior distribution:
$$C \mid Y, m = 0 \sim \mathcal{GP}\big(\kappa_{x^*x}\kappa_{xx}^{-1} Y,\; \kappa_{x^*x^*} - \kappa_{x^*x}\kappa_{xx}^{-1}\kappa_{xx^*}\big)$$
² The associated matrix $K$ should be positive semidefinite.
14.
Contribution: Gaussian Process Modelling
Kernel Specification
Isotropic kernel:
$$\kappa_{iso}(x, x') = \exp\Big(-\frac{1}{\alpha}\frac{\|x - x'\|^2}{\beta^2}\Big),$$
where $\alpha$ and $\beta$ are hyper-parameters.
Drift-dependent kernel: let $s_{x_0,t_0}(t)$ be the solution of the system
$$\dot x = u(x, t), \qquad x(t_0) = x_0.$$
Then
$$\kappa_{dyn}(x, x') = \frac{1}{\sigma(t, t')}\exp\Big(-\frac{d_s(x, x')}{2\sigma(t, t')^2}\Big),$$
where
$$d_s(x, x') = \big(x - s_{x',t'}(t)\big)^2 + \big(x' - s_{x,t}(t')\big)^2, \qquad \sigma(t, t') = \alpha\,\big(|t_0 - \min(t, t')| + 1\big)^{\beta}.$$
This kernel:
- considers the influence of the wind field;
- considers the time-decreasing correlation;
- considers the evolution of the process.
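A sketch of the drift-dependent kernel for a generic wind field. Assumptions: the trajectory $s_{x_0,t_0}(t)$ is obtained by explicit Euler integration with a fixed number of steps (forwards or backwards in time), positions are NumPy vectors, and `wind(x, t)` is a hypothetical callable returning $u(x, t)$. With a constant wind the Euler scheme is exact, which gives an easy sanity check: two points on the same trajectory have $d_s = 0$.

```python
import numpy as np


def trajectory(x0, t0, t, wind, n_steps=200):
    """s_{x0,t0}(t): explicit Euler solution of dx/dt = u(x, t) with x(t0) = x0."""
    x = np.array(x0, dtype=float)
    dt = (t - t0) / n_steps  # negative when integrating backwards in time
    s = t0
    for _ in range(n_steps):
        x = x + dt * np.asarray(wind(x, s), dtype=float)
        s += dt
    return x


def sigma(t, tp, t0, alpha, beta):
    """Time-decreasing scale alpha * (|t0 - min(t, t')| + 1)^beta."""
    return alpha * (abs(t0 - min(t, tp)) + 1.0) ** beta


def kappa_dyn(x, t, xp, tp, wind, t0, alpha, beta):
    """exp(-d_s / (2 sigma^2)) / sigma, d_s = |x - s_{x',t'}(t)|^2 + |x' - s_{x,t}(t')|^2."""
    x, xp = np.asarray(x, dtype=float), np.asarray(xp, dtype=float)
    ds = (np.sum((x - trajectory(xp, tp, t, wind)) ** 2)
          + np.sum((xp - trajectory(x, t, tp, wind)) ** 2))
    sig = sigma(t, tp, t0, alpha, beta)
    return np.exp(-ds / (2.0 * sig ** 2)) / sig


def constant_wind(x, t):
    """Hypothetical uniform wind of 2 units per unit time along the first axis."""
    return np.array([2.0, 0.0])


# (0, 0) at t = 0 is carried to (10, 0) at t = 5 by the wind, so d_s = 0
print(kappa_dyn([0.0, 0.0], 0.0, [10.0, 0.0], 5.0, constant_wind, t0=0.0, alpha=1.0, beta=1.0))
```

The printed value equals $1/\sigma(0, 5) = 1$ up to rounding; moving either point off the wind trajectory lowers it.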
15.
Contribution: Gaussian Process Modelling
Two-Stage Estimation Process: Instant of Release
The proposed kernel is then complex:
$$\kappa_f = \kappa_{iso}\,\mathbb{1}\{t, t' < t_0\} + \kappa_{dyn}\,\mathbb{1}\{t, t' \ge t_0\}.$$
The likelihood is not convex, so $t_0$ has to be estimated separately; the hyper-parameters are estimated by maximum likelihood.
Method: exhaustive search over $t_0$, computing the trace of the Gram matrix:
$$\hat t_0^{\,tr} = \arg\max_{t \in T} \operatorname{tr}\big(K(t)\big).$$
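A sketch of the exhaustive search for $t_0$, assuming 1D space and a constant wind `u`, so that $s_{x,t}(t') = x + u(t' - t)$ in closed form and $d_s = 2\,(x - x' - u(t - t'))^2$. The Gram matrix of $\kappa_f$ is built in full (pairs straddling $t_0$ get zero covariance) and the candidate with the largest trace is returned. Parameter and function names are hypothetical. Note that the diagonal of $\kappa_f$ equals 1 before $t_0$ and $1/\sigma(t_j, t_j)$ afterwards, so the trace depends only on which observations fall after $t_0$ and on $\sigma$; the wind does not enter the criterion.

```python
import numpy as np


def gram_kf(xs, ts, t0, u, alpha_iso, beta_iso, alpha_dyn, beta_dyn):
    """Gram matrix of kappa_f = kappa_iso 1{t, t' < t0} + kappa_dyn 1{t, t' >= t0}."""
    xs, ts = np.asarray(xs, dtype=float), np.asarray(ts, dtype=float)
    dx = np.subtract.outer(xs, xs)
    dt = np.subtract.outer(ts, ts)
    k_iso = np.exp(-dx ** 2 / (alpha_iso * beta_iso ** 2))
    ds = 2.0 * (dx - u * dt) ** 2  # constant wind: both transport terms are equal
    sig = alpha_dyn * (np.abs(t0 - np.minimum.outer(ts, ts)) + 1.0) ** beta_dyn
    k_dyn = np.exp(-ds / (2.0 * sig ** 2)) / sig
    before = ts < t0
    return k_iso * np.outer(before, before) + k_dyn * np.outer(~before, ~before)


def estimate_t0(xs, ts, candidates, **params):
    """Exhaustive search: t0_hat = argmax_t tr(K(t)) over candidate release times."""
    traces = [np.trace(gram_kf(xs, ts, t, **params)) for t in candidates]
    return candidates[int(np.argmax(traces))], traces


params = dict(u=1.0, alpha_iso=1.0, beta_iso=1.0, alpha_dyn=0.5, beta_dyn=1.0)
xs = np.array([0.0, 1.0, 2.0, 3.0])
ts = np.array([0.0, 1.0, 2.0, 3.0])
t0_hat, traces = estimate_t0(xs, ts, np.linspace(-1.0, 3.0, 17), **params)
print(t0_hat)
```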
16.
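Contribution: Gaussian Process Modelling
Two-Stage Estimation Process: Source Location
Given the time of release, we can calculate the estimate of the source location:
$$\hat x_0 = \arg\max_{x \in \Omega} \mathbb{E}\big[C(x, \hat t_0) \mid Y, m = 0\big] = \arg\max_{x \in \Omega} \tilde\kappa_{x^*x}\tilde\kappa_{xx}^{-1} Y, \qquad \text{where } \tilde\kappa = \kappa(\cdot, \hat t_0).$$
Table: Estimation of the source location; comparison between the estimators (5, 20 and 50 sensors). The target is $x_0 = 115$, $y_0 = 10$.

| Kernel | Sensors | $\hat x_0$ | $\hat y_0$ | $\sigma(\hat x_0)$ | $\sigma(\hat y_0)$ |
|---|---|---|---|---|---|
| $\kappa_{iso}$ | 5 | 68.97 | 62.58 | 42.82 | 38.96 |
| $\kappa_{iso}$ | 20 | 97.13 | 26.37 | 27.64 | 26.08 |
| $\kappa_{iso}$ | 50 | 104.47 | 21.60 | 28.94 | 19.47 |
| $\kappa_f$ | 5 | 108.94 | 12.21 | 42.00 | 17.05 |
| $\kappa_f$ | 20 | 120.28 | 8.28 | 12.50 | 4.64 |
| $\kappa_f$ | 50 | 114.51 | 9.48 | 6.37 | 3.07 |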
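A sketch of the second stage: evaluate the posterior mean $\tilde\kappa_{x^*x}\tilde\kappa_{xx}^{-1} Y$ at every point of a spatial grid at time $\hat t_0$ and keep the maximiser. The separable squared-exponential space-time kernel `st_kernel`, its length-scales, the grid, the toy observations and the small `noise` term are hypothetical stand-ins, not the $\kappa_f$ used for the table.

```python
import numpy as np


def st_kernel(a, b, length_xy=10.0, length_t=1.0):
    """Hypothetical space-time kernel; rows of a and b are (x, y, t)."""
    dxy2 = np.sum((a[:, None, :2] - b[None, :, :2]) ** 2, axis=-1)
    dt2 = (a[:, None, 2] - b[None, :, 2]) ** 2
    return np.exp(-dxy2 / (2.0 * length_xy ** 2) - dt2 / (2.0 * length_t ** 2))


def locate_source(obs, y, grid_xy, t0_hat, kernel=st_kernel, noise=1e-6):
    """Grid maximiser of the posterior mean E[C(x, t0_hat) | Y, m = 0]."""
    weights = np.linalg.solve(kernel(obs, obs) + noise * np.eye(len(y)), y)
    candidates = np.column_stack([grid_xy, np.full(len(grid_xy), t0_hat)])
    mean = kernel(candidates, obs) @ weights
    return grid_xy[np.argmax(mean)], mean


t0_hat = 2.0
obs = np.array([[0.0, 0.0, t0_hat], [50.0, 0.0, t0_hat], [100.0, 0.0, t0_hat]])
y = np.array([0.0, 5.0, 0.0])
gx, gy = np.meshgrid(np.arange(0.0, 101.0, 10.0), np.arange(0.0, 101.0, 10.0))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
x0_hat, _ = locate_source(obs, y, grid_xy, t0_hat)
print(x0_hat)
```

In this toy case the maximiser is the grid point (50, 0), where the largest observation sits; with real sensors the quality of $\hat x_0$ depends heavily on the kernel, as the table shows for $\kappa_{iso}$ versus $\kappa_f$.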
17.
Contribution: Gaussian Process Modelling
Zero-Inflated Poisson and Dirichlet Process³
We can also consider the concentration as a count of particles:
$$Y \sim \mathrm{ZIP}(p, \lambda), \qquad p \sim \mathrm{DP}(H, \alpha), \qquad \log\lambda \sim \mathcal{GP}(m, \kappa),$$
which then defines the mixture distribution
$$\Pr(Y = k \mid p, \lambda) = p_{xt}\,\mathbb{1}\{k = 0\} + (1 - p_{xt})\,\frac{e^{-\lambda_{xt}}\lambda_{xt}^k}{k!}.$$
Major issue: the tractability of the likelihood calculation depends on the distributions of both $p$ and $\lambda$.
³ Joint work with Dr. G. Peters and Dr. I. Nevat.
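For fixed $p_{xt}$ and $\lambda_{xt}$ the mixture probability is cheap to evaluate; the sketch below (standard library only, hypothetical function names) computes it with the Poisson term in log-space for numerical stability. It deliberately leaves out the Dirichlet Process and Gaussian Process priors on $p$ and $\lambda$, which are precisely what makes the full likelihood hard to compute.

```python
import math


def zip_pmf(k, p, lam):
    """Pr(Y = k | p, lambda) = p 1{k = 0} + (1 - p) e^{-lambda} lambda^k / k!, lambda > 0."""
    log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
    return p * (k == 0) + (1.0 - p) * math.exp(log_poisson)


def zip_log_likelihood(counts, p, lam):
    """Log-likelihood of independent counts sharing the same (p, lambda)."""
    return sum(math.log(zip_pmf(k, p, lam)) for k in counts)


print(zip_pmf(0, 0.3, 2.0))  # extra mass at zero: 0.3 + 0.7 * exp(-2)
print(zip_log_likelihood([0, 0, 3, 1], 0.3, 2.0))
```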
18.
Contribution: Bibliography
- A. Ickowicz, F. Septier, P. Armand. Adaptive Algorithms for the Estimation of Source Term in a Complex Atmospheric Release. Submitted to Atmospheric Environment Journal.
- A. Ickowicz, F. Septier, P. Armand. Estimating a CBRN atmospheric release in a complex environment using Gaussian Processes. 15th International Conference on Information Fusion, Singapore, July 2012.
- F. Septier, A. Ickowicz, P. Armand. Méthodes de Monte-Carlo adaptatives pour la caractérisation de termes de sources [Adaptive Monte Carlo methods for the characterization of source terms]. Technical report, CEA, EOTP A-54300-05-07-AW-26, Mar. 2012.
- A. Ickowicz, F. Septier, P. Armand. Statistic Estimation for Particle Clouds with Lagrangian Stochastic Algorithms. Technical report, CEA, EOTP A-24300-01-01-AW-20, Nov. 2011.