This document proposes an RKHS approach to systematic kernel selection in nonlinear system identification. It discusses:
1. Using kernel methods for nonlinear system identification by representing functions in terms of kernels.
2. The kernel selection problem: the trade-off between choosing an overly flexible and an overly constrained model class.
3. Incorporating derivative information into the problem formulation to transfer model selection from kernel optimization to explicit regularization over derivatives.
An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identification
1. An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identification
Y. Bhujwalla, V. Laurain, M. Gilson
55th IEEE Conference on Decision and Control
yusuf-michael.bhujwalla@univ-lorraine.fr
2. Introduction
Problem Description
Measured data:
$$\mathcal{D}_N = \{(u_1, y_1), (u_2, y_2), \ldots, (u_N, y_N)\}$$
describing an unknown system $\mathcal{S}_o$:
$$y_{o,k} = f_o(x_k), \quad f_o : \mathcal{X} \to \mathbb{R}, \qquad y_k = y_{o,k} + e_{o,k}, \quad e_{o,k} \sim \mathcal{N}(0, \sigma_e^2)$$
where the regressor is
$$x_k = [\, y_{k-1} \cdots y_{k-n_a} \;\; u_{1,k} \cdots u_{1,k-n_b} \;\; u_{2,k} \cdots u_{n_u,k-n_b} \,]^\top \in \mathcal{X} = \mathbb{R}^{n_a + n_u(n_b+1)}$$
3-5. Introduction
Modelling Objective
Aim: to choose the simplest model from a candidate set of models that accurately describes the system:
$$\mathcal{M}_{\mathrm{opt}} : \text{Accuracy (Data) vs Simplicity (Model)}$$
$$V_f : \quad V(f) = \sum_{k=1}^{N} (y_k - f(x_k))^2 + g(f)$$
Q1: How to choose the simplest accurate model?
- Often $g(f) = \lambda \, \|f\|_{\mathcal{H}}^2$, ensuring uniqueness of the solution
- $\lambda$ controls the bias-variance trade-off
Q2: How to determine a suitable set of candidate models?
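For reference, with the common choice $g(f) = \lambda \|f\|_{\mathcal{H}}^2$ the minimiser has the standard kernel ridge regression closed form (a textbook identity, stated here for context rather than taken from the slides):

```latex
% Representer theorem + ridge penalty: finite-dimensional solution.
f(x) = \sum_{i=1}^{N} \alpha_i \, k_{x_i}(x),
\qquad
\hat{\alpha} = (K + \lambda I_N)^{-1} y,
\qquad
K_{ij} = k(x_i, x_j)
```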
6. Outline
1. Kernel Methods in Nonlinear Identification
2. Model Selection Using Derivatives
3. Smoothness-Enforcing Regularisation
4. Application: Estimation of Locally Nonsmooth Functions
7. 1. Kernel Methods in Nonlinear Identification
FIGURE: estimate $\hat{f}$ and its constituent kernels $k_x$ (output vs input)
Model:
$$\mathcal{F}_f : \quad f(x) = \sum_{i=1}^{N} \alpha_i \, k_{x_i}(x)$$
→ Nonparametric ($n_\theta \sim N$)
→ Flexible: $\mathcal{M}$ defined through the choice of $\mathcal{K}$
→ Height: $\alpha$ (model parameters)
→ Width: $\sigma$ (kernel hyperparameter)
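As a minimal illustration of this model structure (a sketch, not the paper's code), the following fits a Gaussian-kernel model of the above form by regularised least squares; the width sigma, weight lam and the toy data are all assumed values:

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """Matrix K[i, j] = exp(-(x_i - c_j)^2 / (2 sigma^2))."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)                          # inputs
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)   # toy noisy observations

sigma, lam = 0.2, 1e-3                 # assumed width and regularisation weight
K = gaussian_kernel(x, x, sigma)       # one kernel per observation (nonparametric)

# Kernel ridge fit: alpha = (K + lam I)^{-1} y, then f(x) = sum_i alpha_i k_{x_i}(x).
alpha = np.linalg.solve(K + lam * np.eye(x.size), y)
f_hat = K @ alpha                      # model predictions at the observations
```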
8. 1. Kernel Methods in Nonlinear Identification
Identification in the RKHS
Reproducing Kernel Hilbert Spaces
The kernel function defines the model class:
$$\mathcal{K} \leftrightarrow \mathcal{H}$$
Hence, functions can be represented in terms of kernels:
$$f(x) = \langle f, k_x \rangle_{\mathcal{H}} \tag{1}$$
9-10. 1. Kernel Methods in Nonlinear Identification
The Kernel Selection Problem
Choosing an overly flexible model class (a small kernel):
FIGURE: Flexible Model Class
FIGURE: High Variance ($f_o$, $y$, $\hat{f}$, $k_x$ on $[-1, 1]$)
11-12. 1. Kernel Methods in Nonlinear Identification
The Kernel Selection Problem
Choosing an overly constrained model class (a large kernel):
FIGURE: Constrained Model Class
FIGURE: Model Biased
13-14. 1. Kernel Methods in Nonlinear Identification
The Kernel Selection Problem
Why not just choose the 'optimal' model class?
FIGURE: Optimal Model Class
FIGURE: Optimal Model
15. 1. Kernel Methods in Nonlinear Identification
The Kernel Selection Problem
Why not just choose the 'optimal' model class?
• This is, in general, what we try to do.
• However, $\mathcal{H}_{\mathrm{opt}}$ is unknown.
• Optimisation over one hyperparameter: not that difficult.
• Optimisation over multiple model structures, kernel functions and hyperparameters: more difficult.
FIGURE: $f_o$, $y$, $\hat{f}$, $k_x$
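To make the single-hyperparameter case concrete, here is a sketch of choosing the kernel width by simple hold-out validation over a grid; this is one plausible selection scheme, not one the slides prescribe, and it reuses gaussian_kernel, x, y and lam from the sketch above:

```python
# Hold-out selection of the kernel width sigma (grid of widths is an assumption).
x_tr, y_tr = x[::2], y[::2]      # training half
x_va, y_va = x[1::2], y[1::2]    # validation half

best_sigma, best_err = None, np.inf
for sigma in (0.05, 0.1, 0.2, 0.4, 0.8):
    K_tr = gaussian_kernel(x_tr, x_tr, sigma)
    alpha = np.linalg.solve(K_tr + lam * np.eye(x_tr.size), y_tr)
    err = np.mean((y_va - gaussian_kernel(x_va, x_tr, sigma) @ alpha) ** 2)
    if err < best_err:
        best_sigma, best_err = sigma, err
```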
17-19. 2. Model Selection Using Derivatives
But note that many properties of $\mathcal{K}$ are encoded into its derivatives, e.g.
Smoothness:
$$f(x) = ax^2 + bx + c \;\Longrightarrow\; \frac{d^3 f(x)}{dx^3} = 0 \quad \forall x$$
$$f(x) = g_1(x)\,[x < x^*] + g_2(x)\,[x > x^*] \;\Longrightarrow\; \exists\, \frac{d f(x)}{dx} \quad \forall x \neq x^*$$
Linearity:
$$f(x_1, x_2) = x_1 h_1(x_2) + h_2(x_2) \;\Longrightarrow\; \frac{\partial^2 f(x_1, x_2)}{\partial x_1^2} = 0 \quad \forall x_1$$
Separability:
$$f(x_1, x_2) = g(x_1) + h(x_2) \;\Longrightarrow\; \frac{\partial^2 f(x_1, x_2)}{\partial x_1 \, \partial x_2} = 0 \quad \forall x_1, x_2$$
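These conditions are easy to verify numerically; as an illustrative check (not from the slides), the following estimates the mixed partial in the separability condition by central finite differences:

```python
import numpy as np

def mixed_partial(f, x1, x2, h=1e-4):
    """Central finite-difference estimate of d^2 f / (dx1 dx2)."""
    return (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
            - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h ** 2)

f_sep = lambda x1, x2: np.sin(x1) + x2 ** 2    # separable: g(x1) + h(x2)
f_mix = lambda x1, x2: np.sin(x1 * x2)         # not separable

print(mixed_partial(f_sep, 0.3, -0.7))   # ~ 0, as the condition predicts
print(mixed_partial(f_mix, 0.3, -0.7))   # clearly nonzero
```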
20. 2. Model Selection Using Derivatives
Incorporating this information into the problem formulation allows model selection to be transferred from an optimisation over $\mathcal{K}$...
...to an explicit regularisation problem over derivatives, using an a priori flexible model class definition.
22-24. 3. Smoothness-Enforcing Regularisation
Problem Formulation
Here we consider $\mathcal{X} = \mathbb{R}$, where the kernel optimisation reduces to a smoothness selection problem.
What would we like to do?
Replace the existing functional norm regularisation...
$$V_f : \quad V(f) = \sum_{k=1}^{N} (y_k - f(x_k))^2 + \lambda \, \|f\|_{\mathcal{H}}^2$$
...with a smoothness penalty in the cost function...
$$V_D : \quad V(f) = \sum_{k=1}^{N} (y_k - f(x_k))^2 + \lambda \, \|Df\|_{\mathcal{H}}^2$$
How?
- $\|Df\|_{\mathcal{H}}^2$ → known (D. X. Zhou, 2008)
- $f(x)$ for $V_D$ → unknown
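For a finite kernel expansion and a smooth translation-invariant kernel such as the Gaussian, the derivative penalty reduces to a quadratic form in the coefficients via the derivative reproducing property (Zhou, 2008, as cited above); a sketch of the identity, stated under those assumptions:

```latex
% Derivative penalty as a quadratic form in the kernel coefficients.
f = \sum_{i} \alpha_i \, k_{x_i}
\;\Longrightarrow\;
\|Df\|_{\mathcal{H}}^2 = \alpha^\top \mathbf{D}\, \alpha,
\qquad
\mathbf{D}_{ij}
  = \left. \frac{\partial^2 k(x, x')}{\partial x \, \partial x'} \right|_{x = x_i,\; x' = x_j}
```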
25. 3. Smoothness-Enforcing Regularisation
An Extended Representer of f(x)
A finite representer for $V_D$ does not exist.
But, by adding kernels along $\mathcal{X}$, an approximate formulation can be defined:
FIGURE: N = 2 (observations and observation kernels; $\|f\|^2$)
FIGURE: (N, P) = (2, 8) (observation kernels plus added kernels; $\|Df\|^2$)
$$\mathcal{F}_D : \quad f(x) = \sum_{i=1}^{N} \alpha_i \, k_{x_i}(x) + \sum_{j=1}^{P} \alpha_j^* \, k_{x_j^*}(x)$$
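Putting the pieces together, here is a sketch of an estimator of this extended form with the derivative penalty $\lambda \, \alpha^\top \mathbf{D} \alpha$ for a Gaussian kernel. It is an illustrative reconstruction under the assumptions above, not the authors' implementation; the uniform grid for the added centres $x_j^*$, the toy system and all numerical values are assumptions:

```python
import numpy as np

def gauss(x, c, sigma):
    """Pairwise Gaussian kernel matrix k(x_i, c_j)."""
    return np.exp(-(x[:, None] - c[None, :]) ** 2 / (2 * sigma ** 2))

def gauss_dd(c1, c2, sigma):
    """Mixed derivative d^2 k / (dx dx') at all pairs of centres (Gaussian case)."""
    d = c1[:, None] - c2[None, :]
    return gauss(c1, c2, sigma) * (1.0 / sigma ** 2 - d ** 2 / sigma ** 4)

rng = np.random.default_rng(0)
N, P, lam = 30, 50, 1e-2                        # assumed sizes and weight
x = np.sort(rng.uniform(-1.0, 1.0, N))          # observation inputs
y = np.sign(x) + 0.1 * rng.standard_normal(N)   # toy locally nonsmooth system

centers = np.concatenate([x, np.linspace(-1.0, 1.0, P)])  # x_i plus added x_j*
sigma = 0.5 * (2.0 / P)        # width from kernel density rho_k ~ 0.5 (cf. slide 26)

K = gauss(x, centers, sigma)           # N x (N + P) evaluation matrix
D = gauss_dd(centers, centers, sigma)  # (N + P) x (N + P) derivative Gram matrix

# Minimise ||y - K a||^2 + lam * a' D a  =>  (K' K + lam D) a = K' y.
a = np.linalg.solve(K.T @ K + lam * D, K.T @ y)

x_grid = np.linspace(-1.0, 1.0, 200)
f_hat = gauss(x_grid, centers, sigma) @ a       # estimate over the input range
```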
26. 3. Smoothness-Enforcing Regularisation
Choosing the Kernel Width
Examination of the kernel density allows us to make an a priori choice of kernel width:
FIGURE: $\rho_k = 0.4$
FIGURE: $\rho_k = 0.5$
FIGURE: $\rho_k = 0.6$
Hence, for a given P we can define the maximally flexible model class for a given problem.
28. 4. Application
Estimation of Locally Nonsmooth Functions
In $V_D$, smoothness ∼ regularisation.
Hence, by introducing weights into the loss function, the importance of the regularisation can be varied across $\mathcal{X}$:
$$V_w : \quad V(f) = \sum_{k=1}^{N} \big( w_k y_k - w_k f(x_k) \big)^2 + \lambda \, \|Df\|_{\mathcal{H}}^2$$
How to determine the weights?
Relative to a particular modelling objective, e.g.
• $w_k \sim \|D \hat{f}^{(0)}(x_k)\|_2^2$ for piecewise constant structures, or
• $w_k \sim \|D^2 \hat{f}^{(0)}(x_k)\|_2^2$ for piecewise linear structures.
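A plausible two-step reading of this scheme (the slides do not spell out the exact procedure): fit once without weights, set $w_k$ from the initial derivative estimates, then refit with the weighted loss. Continuing the sketch after slide 25, with its gauss, K, D, a, centers, sigma, lam, x and y in scope:

```python
def gauss_dx(x, c, sigma):
    """dk(x, c)/dx for the Gaussian kernel, evaluated pairwise."""
    d = x[:, None] - c[None, :]
    return -(d / sigma ** 2) * gauss(x, c, sigma)

# 1) Derivative of the initial (unweighted) estimate at the observations.
df0 = gauss_dx(x, centers, sigma) @ a

# 2) Weights ~ local derivative energy (piecewise-constant objective above).
w = df0 ** 2
w = w / w.max() + 1e-3   # normalise; small floor keeps every sample in play

# 3) Weighted refit: minimise sum_k (w_k y_k - w_k f(x_k))^2 + lam ||Df||^2.
Kw = w[:, None] * K      # row-scaled evaluation matrix, i.e. W K
aw = np.linalg.solve(Kw.T @ Kw + lam * D, Kw.T @ (w * y))
```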
29. 4. Application
Estimation of Locally Nonsmooth Functions
FIGURE: Noise-Free System ($y_o$)
FIGURE: Noisy System ($y$)
32-34. Conclusions
Objectives:
• To simplify model selection in nonlinear identification.
• By shifting the problem to a regularisation over functional derivatives.
→ Allowing the definition of an a priori flexible model class.
This presentation:
• First step ⇒ consider a simple example.
→ Model selection ⇔ smoothness detection.
→ Kernel selection ⇔ hyperparameter optimisation.
Current/Future Research:
• Application to dynamical, control-oriented problems (e.g. linear parameter-varying identification).
• Investigation of more complex model selection problems (e.g. detection of linearities, separability...).
35. A. Bibliography
• Sobolev Spaces (Wahba, 1990; Pillonetto et al., 2014)
$$\|f\|_{\mathcal{H}_k} = \sum_{i=0}^{m} \int_{\mathcal{X}} \left( \frac{d^i f(x)}{dx^i} \right)^2 dx$$
• Identification using derivative observations (Zhou, 2008; Rosasco et al., 2010)
$$V_{\mathrm{obs}}(f) = \|y - f(x)\|_2^2 + \gamma_1 \left\| \frac{dy}{dx} - \frac{df(x)}{dx} \right\|_2^2 + \cdots + \gamma_m \left\| \frac{d^m y}{dx^m} - \frac{d^m f(x)}{dx^m} \right\|_2^2 + \lambda \, \|f\|_{\mathcal{H}}$$
• Regularisation using derivatives (Rosasco et al., 2010; Lauer, Le and Bloch, 2012; Duijkers et al., 2014)
$$V_D(f) = \|y - f(x)\|_2^2 + \lambda \, \|D^m f\|_p$$
36. B. Choosing the Kernel Width
The Smoothness-Tolerance Parameter
$$\rho_k = \frac{\sigma}{\Delta x^*}, \qquad \Delta x^* = \frac{x^*_{\max} - x^*_{\min}}{P}, \qquad \epsilon_{\hat{f}} = 100 \times \left( 1 - \frac{\|\hat{f}\|_{\infty}}{C} \right) \%$$
FIGURE: Selecting an appropriate kernel using $\epsilon$ (smoothness tolerance $\epsilon(\rho)$ against kernel density $\rho$, with threshold $\hat{\epsilon}$)
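In code, this a priori width choice is a one-liner; a small sketch under the grid assumptions used earlier:

```python
# A priori kernel width from a target kernel density (grid values assumed).
x_star_min, x_star_max, P = -1.0, 1.0, 50
rho_k = 0.5                                  # target density, cf. slide 26
dx_star = (x_star_max - x_star_min) / P      # spacing of the added kernels
sigma = rho_k * dx_star                      # since rho_k = sigma / dx_star
```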
37. C. Effect of the Regularisation
⇒ Negligible regularisation (very small $\lambda_f$, $\lambda_D$).
FIGURE: $V_f : R(f)$ ($y_o$, $\hat{f}_{\mathrm{MEAN}}$, $\hat{f}_{\mathrm{SD}}$)
FIGURE: $V_D : R(Df)$ ($y_o$, $\hat{f}_{\mathrm{MEAN}}$, $\hat{f}_{\mathrm{SD}}$)
38. C. Effect of the Regularisation
⇒ Light regularisation (small $\lambda_f$, $\lambda_D$).
FIGURE: $V_f : R(f)$
FIGURE: $V_D : R(Df)$
39. C. Effect of the Regularisation
⇒ Moderate regularisation.
FIGURE: $V_f : R(f)$
FIGURE: $V_D : R(Df)$
40. C. Effect of the Regularisation
⇒ Heavy regularisation (large $\lambda_f$, $\lambda_D$).
FIGURE: $V_f : R(f)$
FIGURE: $V_D : R(Df)$
41. C. Effect of the Regularisation
⇒ Excessive regularisation (very large $\lambda_f$, $\lambda_D$).
FIGURE: $V_f : R(f)$
FIGURE: $V_D : R(Df)$
42. D. Further Examples: Detecting Piecewise Structures
$\mathcal{S}_o$: Noise-free and observed data
FIGURE: $y(x_1, x_2)$ on $[-1, 1]^2$
43. D. Further Examples: Detecting Piecewise Structures
Results M1: $(V_f, \mathcal{F}_f)$
FIGURE: MEDIAN | FIGURE: BIAS | FIGURE: SDEV
44. D. Further Examples: Detecting Piecewise Structures
Results M2: $(V_D, \mathcal{F}_D)$
FIGURE: MEDIAN | FIGURE: BIAS | FIGURE: SDEV
45. D. Further Examples: Detecting Piecewise Structures
Results M3: $(V_w, \mathcal{F}_D)$
FIGURE: MEDIAN | FIGURE: BIAS | FIGURE: SDEV
46-56. E. Further Examples: Enforcing Separability
$$f(x_1, x_2) \;\xrightarrow{\;\lambda\;}\; f_1(x_1) + f_2(x_2)$$
FIGURE: $V_{DX} : R(\partial_{x_1} \partial_{x_2} f)$, shown over a sequence of slides (46-56)