The document discusses kernel methods for nonlinear system identification. It proposes using derivatives in the reproducing kernel Hilbert space (RKHS) for regularization instead of functional regularization. This allows controlling smoothness through regularization rather than choosing kernel hyperparameters. Specifically:
1) Kernel methods provide flexible nonlinear models but require choosing hyperparameters that impact smoothness.
2) The paper proposes regularizing based on derivatives in the RKHS rather than functions, allowing smoothness to be set directly through regularization.
3) This removes kernel hyperparameters from the optimization problem and permits a closed-form solution for estimates with controlled smoothness.
The Impact of Smoothness on Model Class Selection in Nonlinear System Identification
1. The Impact of Smoothness on Model Class Selection in Nonlinear System Identification: An Application of Derivatives in the RKHS
Y. Bhujwalla, V. Laurain, M. Gilson
6th July 2016
yusuf-michael.bhujwalla@univ-lorraine.fr
Yusuf Bhujwalla (Université de Lorraine) IEEE ACC 2016 1 / 23
2. Introduction
The Data-Generating System
Measured data : DN = {(u1, y1), (u2, y2), . . . , (uN, yN)}.
Describes So, an unknown nonlinear system with function fo : X → R,
So : yo,k = fo(xk), yk = yo,k + eo,k,
where xk = [yk−1 · · · yk−na uk · · · uk−nb]⊤ ∈ X = R^(na+nb+1).
Parametric Models : Nθ low (fixed) → physically interpretable, but the choice of basis functions is a combinatorially hard problem.
Nonparametric Models : Nθ high (grows with the data) → not interpretable, but a general model class can be defined → flexibility.
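The regressor construction above is mechanical enough to sketch in code. Below is a minimal numpy sketch (the function name and the toy sequences are illustrative, not from the paper) of stacking NARX regressors xk = [yk−1 · · · yk−na uk · · · uk−nb]⊤ from measured input/output data:

```python
import numpy as np

def build_regressors(u, y, na, nb):
    """Stack NARX regressors x_k = [y_{k-1} ... y_{k-na}, u_k ... u_{k-nb}]^T
    and the matching targets y_k, starting at k = max(na, nb) so that
    every required lag exists in the data."""
    k0 = max(na, nb)
    X, targets = [], []
    for k in range(k0, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]  # y_{k-1}, ..., y_{k-na}
        past_u = [u[k - i] for i in range(0, nb + 1)]  # u_k, ..., u_{k-nb}
        X.append(past_y + past_u)
        targets.append(y[k])
    return np.array(X), np.array(targets)

# Toy sequences, purely to show the shapes: na + nb + 1 columns per regressor.
u = np.arange(10.0)
y = 2.0 * u
X, t = build_regressors(u, y, na=2, nb=1)
print(X.shape)  # (8, 4)
```

Each row of X is one regressor xk ∈ R^(na+nb+1), paired with the target yk in t.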
3. Introduction
(Incremental build of slide 2.) Nonparametric Models : such as kernel methods.
[Figure: noisy observations of the output yo, with Gaussian kernel functions Kx placed over the input space.]
4. Outline
1 Kernel Methods in Nonlinear Identification
2 The Kernel Selection Problem
3 Smoothness in the RKHS
4 Simulation Examples
5. 1. Kernel Methods in Nonlinear Identification
Reproducing Kernel Hilbert Spaces
Hilbert Spaces
H is a Hilbert space of functions f : X → R, equipped with :
· a norm ∥ f ∥H,
· an inner product ⟨ f , g ⟩H.
In system identification, H ⇔ model class.
Reproducing Kernels
H has a unique, associated kernel function, K : X × X → R, spanning the space H.
The Reproducing Property states that f(x) can be explicitly represented as an infinite sum in terms of the kernel function :
f(x) = ⟨ f , Kx ⟩H = Σ_{i=1}^{∞} αi K(xi, x)
6. 1. Kernel Methods in Nonlinear Identification
Identification in the RKHS
For ˆf ∈ H close to fo, ˆf should reflect the observations :
ˆf = min_f { V(f) = L(x, y, f(x)) }
However, infinitely many solutions ⇒ add a constraint to the model :
ˆf = min_f { V(f) = L(x, y, f(x)) + g(∥ f ∥H) }
For such cost-functions, f(x) can be reduced to :
f(x) = Σ_{i=1}^{N} αi K(xi, x), α ∈ R^N
· f(x) → a finite sum over the observations.
· The Representer Theorem (Schölkopf, Herbrich and Smola, 2001)
7. 1. Kernel Methods in Nonlinear Identification
A Widely-Used Example
As an example, minimise the squared error :
L(x, y, f(x)) = ∥y − f(x)∥₂²,
and use regularisation to avoid overparameterisation :
g(∥ f ∥H) = λ∥ f ∥H².
Giving :
Vf : V(f) = ∥y − f(x)∥₂² + λf ∥ f ∥H² ⇒ αf = (K + λf I)⁻¹ y
· Solution depends on : I. K and II. λf
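As an illustration of this closed form, the following is a minimal numpy sketch of kernel ridge regression with a Gaussian RBF kernel; the data and the values of σ and λf are illustrative choices, not taken from the paper:

```python
import numpy as np

def rbf_gram(a, b, sigma):
    """Gaussian RBF Gram matrix, K[i, j] = exp(-(a_i - b_j)^2 / sigma^2)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / sigma**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

sigma, lam_f = 0.2, 1e-2
K = rbf_gram(x, x, sigma)

# Closed form from the slide: alpha_f = (K + lambda_f I)^{-1} y
alpha_f = np.linalg.solve(K + lam_f * np.eye(x.size), y)
f_hat = K @ alpha_f  # estimate at the training inputs
```

Note that both K (through σ) and λf must still be chosen; the deck's point in the next section is that σ fixes the model class itself.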
8. Outline
1 Kernel Methods in Nonlinear Identification
2 The Kernel Selection Problem
3 Smoothness in the RKHS
4 Simulation Examples
9. 2. The Kernel Selection Problem
Choosing a Kernel Function
Choosing a kernel function...
K defines the model class.
Let X = R, and K be the Gaussian RBF kernel :
K(xi, x) = exp(−∥x − xi∥² / σ²).
The width σ defines the smoothness of the kernel function.
Hence σ determines the model class !
Other kernels have different hyperparameters, but they will still influence H.
[Figure: Gaussian kernel functions Kx of two widths, σ1 and σ2 > σ1.]
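A small numeric sketch of the role of σ (the point values are illustrative): for the same pair of inputs, a wider Gaussian kernel reports a much higher similarity, which is why larger σ yields a smoother model class:

```python
import numpy as np

def rbf(xi, x, sigma):
    """Gaussian RBF kernel K(x_i, x) = exp(-(x - x_i)^2 / sigma^2)."""
    return np.exp(-((x - xi) ** 2) / sigma**2)

# Similarity between the same two points under a narrow and a wide kernel.
narrow = rbf(0.0, 0.3, sigma=0.1)  # ~ exp(-9): the points are essentially unrelated
wide = rbf(0.0, 0.3, sigma=0.5)    # ~ exp(-0.36): the points are strongly related
print(narrow, wide)
```

Under the narrow kernel, each observation only influences its immediate neighbourhood, so the estimate can vary quickly; under the wide one, observations far apart are coupled and the estimate is forced to be smooth.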
10. 2. The Kernel Selection Problem
Implications of the Hyperparameter Selection
Estimation of a 1D switching signal using Vf = ∥y − f(x)∥₂² + λf ∥ f ∥H².
Many observations (N = 10³).
uk ∼ U(−1, 1).
Significant noise disturbances (SNR = 5 dB).
Two hyperparameters : I. σ and II. λ
[FIGURE: Estimation of the 1D switching signal fo(uk) for different hyperparameter values.]
11. 2. The Kernel Selection Problem
Implications of the Hyperparameter Selection
[FIGURE: estimates ˆfMEAN ± ˆfSD against yo for the four combinations of small/large λ and small/large σ.]
12. 2. The Kernel Selection Problem
Implications of the Hyperparameter Selection
[FIGURE: the same four estimates, annotated with the trade-off: increasing σ and λ promotes smoothness, decreasing them promotes flexibility.]
13. 2. The Kernel Selection Problem
Summary
Vf : V(f) = ∥y − f(x)∥₂² + λf ∥ f ∥H².
Kernel framework very effective :
· flexible,
· well-understood.
However, the choice of kernel is often compromised (e.g. by noise).
⇒ Trade-off between flexibility and smoothness.
So, why regularise over ∥ f ∥H . . . when smoothness is often a more interesting property to control?
⇒ Desirable property in many models.
⇒ Characterises many systems.
14. Outline
1 Kernel Methods in Nonlinear Identification
2 The Kernel Selection Problem
3 Smoothness in the RKHS
4 Simulation Examples
15. 3. Smoothness in the RKHS
Regularisation Using Derivatives
Proposition
Replace functional regularisation :
Vf : V(f) = ∥y − f(x)∥₂² + λf ∥ f ∥H²,
with smoothness-enforcing regularisation :
VD : V(f) = ∥y − f(x)∥₂² + λD ∥Df ∥H².
Now :
· smoothness is controlled by the regularisation,
· the kernel hyperparameter is removed from the optimisation problem.
18. 3. Smoothness in the RKHS
Derivatives in the RKHS
For f ∈ H, Df ∈ H (Zhou, 2008).
Hence, a derivative reproducing property can be defined:
(Df)(x) = ⟨f, DK_x⟩_H.
The Representer Theorem
The representer f(x) = Σ_{i=1}^{N} α_i K(x_i, x) requires
g(∥f∥_H): a monotonically increasing function of ∥f∥_H.
Clearly, ∥Df∥_H ≠ g(∥f∥_H) ⇒ the representer is suboptimal for V_D.
However, if the system is well-excited, f(x) = Σ_{i=1}^{N} α_i K(x_i, x) can still be used,
and it loosely preserves the bias-variance properties of V_f:
lim_{λ→∞} f(x) = 0, ∀x ∈ ℝ.
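As a quick numerical illustration (hypothetical values, not from the paper), the expansion f(x) = Σ_i α_i K(x_i, x) with a Gaussian kernel can be differentiated analytically and checked against a finite-difference approximation:

```python
import numpy as np

sigma = 0.5
xi = np.array([-0.5, 0.0, 0.7])      # centres x_i (illustrative values)
alpha = np.array([1.0, -2.0, 0.5])   # expansion coefficients (illustrative)

def K(xc, x):
    # Gaussian RBF kernel K(x_c, x) = exp(-(x_c - x)^2 / (2 sigma^2))
    return np.exp(-(xc - x) ** 2 / (2 * sigma ** 2))

def f(x):
    # f(x) = sum_i alpha_i K(x_i, x)
    return np.sum(alpha * K(xi, x))

def Df(x):
    # analytic derivative: d/dx K(x_i, x) = ((x_i - x) / sigma^2) K(x_i, x)
    return np.sum(alpha * (xi - x) / sigma ** 2 * K(xi, x))

# central finite-difference check of the analytic derivative
h = 1e-6
x0 = 0.3
fd = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(abs(Df(x0) - fd) < 1e-5)  # True
```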
19. 3. Smoothness in the RKHS
Derivatives in the RKHS
A Closed-Form Solution
Using the derivative reproducing property, ∥Df∥_H can be expressed as:
∥Df∥_H² = α⊤ D_K⁽¹,¹⁾ α,
where
D_K⁽¹,¹⁾(x_i, x_j) = ∂²K(x_i, x_j) / ∂x_j ∂x_i.
This permits a closed-form solution:
α_D = (K⊤K + λ_D D_K⁽¹,¹⁾)⁻¹ K⊤ y.
As per V_f ⇒ α_f = (K + λ_f I)⁻¹ y.
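The two estimators can be sketched in a few lines (illustrative code, not from the paper; a Gaussian RBF kernel is assumed, for which the mixed partial has the closed form D_K⁽¹,¹⁾(x_i, x_j) = (1/σ² − (x_i − x_j)²/σ⁴) K(x_i, x_j); function names are hypothetical):

```python
import numpy as np

def gaussian_kernel(x, z, sigma):
    """K(x_i, z_j) = exp(-(x_i - z_j)^2 / (2 sigma^2))."""
    d = x[:, None] - z[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def derivative_kernel(x, sigma):
    """D^(1,1)_K(x_i, x_j) = d^2 K / (dx_j dx_i), Gaussian case."""
    d = x[:, None] - x[None, :]
    K = np.exp(-d ** 2 / (2 * sigma ** 2))
    return (1.0 / sigma ** 2 - d ** 2 / sigma ** 4) * K

def fit_alpha_D(x, y, sigma, lam_D):
    """alpha_D = (K'K + lam_D D^(1,1)_K)^(-1) K'y  (derivative regularisation V_D)."""
    K = gaussian_kernel(x, x, sigma)
    DK = derivative_kernel(x, sigma)
    return np.linalg.solve(K.T @ K + lam_D * DK, K.T @ y)

def fit_alpha_f(x, y, sigma, lam_f):
    """alpha_f = (K + lam_f I)^(-1) y  (functional regularisation V_f)."""
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + lam_f * np.eye(len(x)), y)
```

Predictions at new inputs x* then follow the representer: f̂(x*) = Σ_i α_i K(x_i, x*), i.e. `gaussian_kernel(xstar, x, sigma) @ alpha`.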
20. Outline
1 Kernel Methods in Nonlinear Identification
2 The Kernel Selection Problem
3 Smoothness in the RKHS
4 Simulation Examples
21. 4. Simulation Examples
Example 1 : Effect of the Regularisation
Estimation of a 1D switching signal using V_f and V_D.
· Many observations (N = 10³).
· u_k ∼ U(−1, 1).
· Significant noise disturbances (SNR = 5 dB).
· Gaussian RBF kernel, with σ = 0.01.
· Varying levels of regularisation (through λ_f, λ_D).
FIGURE: Estimation of the 1D switching signal f_o(u_k) for different λ values.
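The experimental setup can be sketched as follows (the exact switching signal f_o is not given in the slides, so a hypothetical piecewise-affine function stands in for it):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_o(u):
    # Hypothetical switching signal; the true f_o is not specified in the slides.
    return np.where(u < 0, 20 * u + 10, -15 * u + 25)

N = 1000
u = rng.uniform(-1, 1, N)       # u_k ~ U(-1, 1)
y_clean = f_o(u)

# Add noise at a target SNR of 5 dB: SNR = 10 log10(P_signal / P_noise)
snr_db = 5.0
p_signal = np.var(y_clean)
p_noise = p_signal / 10 ** (snr_db / 10)
y = y_clean + rng.normal(0.0, np.sqrt(p_noise), N)
```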
22. 4. Simulation Examples
Example 1 : Effect of the Regularisation
⇒ Negligible regularisation (very small λf , λD).
FIGURE: V_f : R(f) (left) and V_D : R(Df) (right), showing y_o, f̂_MEAN and f̂_SD.
23. 4. Simulation Examples
Example 1 : Effect of the Regularisation
⇒ Light regularisation (small λf , λD).
FIGURE: V_f : R(f) (left) and V_D : R(Df) (right), showing y_o, f̂_MEAN and f̂_SD.
24. 4. Simulation Examples
Example 1 : Effect of the Regularisation
⇒ Moderate regularisation.
FIGURE: V_f : R(f) (left) and V_D : R(Df) (right), showing y_o, f̂_MEAN and f̂_SD.
25. 4. Simulation Examples
Example 1 : Effect of the Regularisation
⇒ Heavy regularisation (large λf , λD).
FIGURE: V_f : R(f) (left) and V_D : R(Df) (right), showing y_o, f̂_MEAN and f̂_SD.
26. 4. Simulation Examples
Example 1 : Effect of the Regularisation
⇒ Excessive regularisation (very large λf , λD).
FIGURE: V_f : R(f) (left) and V_D : R(Df) (right), showing y_o, f̂_MEAN and f̂_SD.
27. 4. Simulation Examples
Example 2 : 1D Structural Selection
Identification of two unknown systems (X ∈ [−1, 1], SNR = 10 dB, N = 10³).
V_f : λ, σ optimised using cross-validation.
V_D : λ optimised using cross-validation, σ set based on the data.
FIGURE: S¹_o : Smooth (left) and S²_o : Nonsmooth (right).
28. 4. Simulation Examples
Example 2 : Smooth S¹_o
Using a small kernel, V_D can reconstruct a smooth function.
This is not feasible using V_f, which relies on the kernel's smoothing effect.
FIGURE: V_f : R(f) (left) and V_D : R(Df) (right), showing f̂_MEAN, f̂_SD and the kernel width k_x.
29. 4. Simulation Examples
Example 2 : Nonsmooth S²_o
Using a small kernel, V_D can detect the structural nonlinearity.
However, V_f is too smooth, as σ must counteract the noise.
FIGURE: V_f : R(f) (left) and V_D : R(Df) (right), showing f̂_MEAN, f̂_SD and the kernel width k_x.
30. Conclusions
RKHS in Nonlinear Identification
Flexible framework : attractive for nonlinear identification.
Smoothness controlled by kernel function and regularisation (σ and λf )
⇒ Constrained kernel function.
Derivatives in the RKHS
Smoothness controlled by regularisation (λD).
⇒ More direct control of the smoothness.
Simpler hyperparameter optimisation (just λD) and increased model flexibility.
⇒ Through use of a smaller kernel (small σ).
However, relies on a suboptimal representer.
⇒ Nonetheless, promising results have been obtained.
31. The Impact of Smoothness on Model Class
Selection in Nonlinear System Identification:
An Application of Derivatives in the RKHS
Y. Bhujwalla, V. Laurain, M. Gilson
6th July 2016
yusuf-michael.bhujwalla@univ-lorraine.fr
32. A. Bibliography
Alternative Smoothness-Enforcing Optimisation Schemes
Sobolev Spaces (Wahba, 1990; Pillonetto et al., 2014)
∥f∥_{H_k}² = Σ_{i=0}^{m} ∫_X ( dⁱf(x)/dxⁱ )² dx
Identification using derivative observations (Zhou, 2008; Rosasco et al., 2010)
V_obs(f) = ∥y − f(x)∥₂² + γ₁ ∥dy/dx − df(x)/dx∥₂² + ··· + γ_m ∥dᵐy/dxᵐ − dᵐf(x)/dxᵐ∥₂² + λ∥f∥_H
Regularisation Using Derivatives (Rosasco et al., 2010; Lauer, Le and Bloch, 2012; Duijkers et al., 2014)
V_D(f) = ∥y − f(x)∥₂² + λ∥Dᵐf∥_p.
33. A. Bibliography
Literature Review
Kernel Methods in Machine Learning and System Identification
· Kernel methods in system identification, machine learning and function
estimation : A survey, G. Pillonetto, F. Dinuzzo, T. Chen, G. D. Nicolao and L.
Ljung, 2014.
· Learning with Kernels, B. Schölkopf, R. Herbrich and A. J. Smola, 2002.
· Gaussian Processes for Machine Learning, C. Rasmussen and C. Williams,
2006.
Reproducing Kernel Hilbert Spaces
· Theory of Reproducing Kernels, N. Aronszajn, 1950.
· A Generalized Representer Theorem, B. Schölkopf, R. Herbrich and A. J. Smola,
2001.
· Derivative reproducing properties for kernel methods in learning theory, D. Zhou,
2008.