Non-linear classification models commonly rely on kernel functions. These models are highly dependent on a training (labeled) data set, so a model, and therefore its underlying kernel, has to adapt to the most recent labeled observations.
This presentation describes a solution to automate the evaluation and selection of a kernel function appropriate to a specific training set in online training.
2. Introduction
The creation and update of models require constant monitoring of the quality of the prediction or classification. This entails re-evaluating the model parameters and, if necessary, the kernel functions against a new set of labeled data.
This presentation describes a solution to automate the
evaluation and selection of a kernel function appropriate to
a specific training set in online training.
3. Kernel functions
Kernel functions are widely used in machine learning to deal with non-linear models for which the dimension of the problem is not readily known:
• Kernel principal component analysis
• Kernelized multi-layer perceptron
• Support vector machines
• Kernelized clustering
• Bayesian kernel density estimation
• Kernelized ridge regression
4. Some background ….
Given a space of observations of dimension n, ℝⁿ, the feature space is an embedded manifold ℳᵖ of dimension p ≪ n.
A kernel function is defined through the mapping
φ: ℝⁿ → ℳᵖ, w′ = φ(w)
The Euclidean metric is defined as
‖w‖ = √(w · w)
5. … on kernel models.
[Diagram: feature vectors v and w, with similarity v·w in the observation space, are projected by φ onto a manifold where K(v, w) = φ(v)·φ(w)]
A kernel function K is represented as a projection of feature vectors v, w onto a manifold, on which the similarity is computed from the Riemannian metric.
6. Kernel function
Define a kernel function as the composition of two functions, g_λ ∘ h:
• A similarity function h
• A function g_λ from the exponential family
K_λ(x, y) = g_λ(Σᵢ h(xᵢ, yᵢ))
Radial basis function (RBF): K_σ(x, y) = e^(−((x−y)/σ)²), with g_σ(x) = e^(−(x/σ)²) and h(x, y) = x − y
Polynomial: K_d(x, y) = (1 + x·y)^d, with g_d(x) = (1 + x)^d and h(x, y) = x·y
Sigmoid: K₁(x, y) = tanh(1 + 2 x·y), with g₁(x) = tanh(1 + 2x) and h(x, y) = x·y
7. What challenge?
• Creating a model requires the evaluation of multiple kernel functions. This task is rarely automated and consumes a significant amount of human and computing resources.
• Time series or sequential data may require a “loose” kernel model definition which evolves over time (online training).
We need a mechanism to automate the generation, evaluation and optimization of kernel functions.
8. Challenge: online training
Online training requires a constant re-evaluation of the model and refinement of the underlying kernel function.
[Plot: F1 score against the size of the training/observation set, with a valid range and re-evaluation points marked]
Once the precision, recall or F1 score crosses a given threshold, the model has to be retrained with the same or a different kernel function.
9. An approach to automation
1. Automated generation of kernel functions: a kernel function is composed from three different functions that can be assembled through monadic composition.
2. Automated optimization of kernel functions: a genetic algorithm computes the loss function associated with each kernel candidate (fitness) and selects the most appropriate one for any specific dataset.
11. Kernel function
Define a kernel function as the composition of two functions, g ∘ h:
• A similarity function h
• A function g from the exponential family
K(x, y) = g(Σᵢ h(xᵢ, yᵢ))
… with its implementation in Scala
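The slide's Scala code figure is not reproduced in this transcript; a minimal sketch of the composition g ∘ h, with illustrative type aliases and names, could look like this:

```scala
object KernelFunction {
  type Similarity = (Double, Double) => Double
  type ExpMap = Double => Double

  // K(x, y) = g( sum_i h(x_i, y_i) ): an exponential map g composed
  // with a similarity function h applied pairwise to the features
  def kernel(g: ExpMap, h: Similarity)(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "Feature vectors must have the same dimension")
    g(x.zip(y).map { case (xi, yi) => h(xi, yi) }.sum)
  }
}
```

For instance, the polynomial kernel of degree 2 is obtained with h(x, y) = x·y and g(s) = (1 + s)².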
12. Similarity functions
The similarity function h is derived from the metric defined on the tangent space of a manifold. Here are a few examples of similarity functions.
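The slide's examples are shown in a code figure that did not survive extraction; a sketch of plausible similarity functions, assuming Euclidean-style metrics (the names are illustrative):

```scala
object SimilarityFunctions {
  type Similarity = (Double, Double) => Double

  // Inner-product similarity, used by the polynomial and sigmoid kernels
  val dot: Similarity = (x, y) => x * y

  // Squared-distance similarity, a natural choice for an RBF-style kernel
  val sqDistance: Similarity = (x, y) => (x - y) * (x - y)

  // Absolute-distance (Manhattan) similarity, e.g. for a Laplacian kernel
  val absDistance: Similarity = (x, y) => math.abs(x - y)
}
```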
13. Exponential map functions
The projection of the metric onto the manifold defines an exponential map or family of functions. Here are some examples from the exponential family {g}: the radial basis function kernel, the polynomial kernel and the sigmoid kernel.
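Following the decomposition on slide 6, each of these kernels is an exponential map g applied to the summed similarity. A minimal sketch (the factory names are illustrative):

```scala
object ExponentialMaps {
  type ExpMap = Double => Double

  // RBF: g_sigma(s) = exp(-(s / sigma)^2), per the slide 6 decomposition
  def rbf(sigma: Double): ExpMap = s => math.exp(-(s / sigma) * (s / sigma))

  // Polynomial: g_d(s) = (1 + s)^d
  def polynomial(d: Int): ExpMap = s => math.pow(1.0 + s, d)

  // Sigmoid: g_1(s) = tanh(1 + 2s)
  val sigmoid: ExpMap = s => math.tanh(1.0 + 2.0 * s)
}
```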
14. Parameterization
Kernel functions are parameterized with a single value or argument λ. For example, the parameter for the radial basis function is λ = 1/σ², where σ is the standard deviation, and the parameter for the polynomial kernel is λ = d, where d is the degree of the polynomial.
A parameterized kernel function is defined as
K_λ(x, y) = g_λ(Σᵢ h(xᵢ, yᵢ))
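A minimal sketch of such a parameterized kernel in Scala, where g is curried on the parameter λ (the class and field names are illustrative):

```scala
// A parameterized kernel K_lambda(x, y) = g_lambda(sum_i h(x_i, y_i)),
// where the single parameter lambda selects a member of the exponential family
case class ParamKernel(
    g: Double => Double => Double,   // exponential map, curried on lambda
    h: (Double, Double) => Double,   // similarity function
    lambda: Double) {

  def apply(x: Array[Double], y: Array[Double]): Double =
    g(lambda)(x.zip(y).map { case (xi, yi) => h(xi, yi) }.sum)
}
```

For the polynomial kernel, λ = d: `ParamKernel(l => s => math.pow(1.0 + s, l), (a, b) => a * b, 2.0)`.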
15. Kernel composition
The composition of the similarity (or metric) function h and the exponential map g can be accomplished through a monadic composition.
Monads are high-level abstractions derived from category theory. A monad defines a computation as a sequence of transformations of two types:
• map: transforms a ‘container’ type by modifying each of its elements with a given function
• flatMap: transforms a ‘container’ by promoting each of its elements to a container, then reducing the resulting containers to a single type
16. Monadic composition
A monad is defined as a Scala trait for a container type M. The monadic implementation of a kernel function of type KF consists of overriding the map and flatMap transformations.
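The trait described on the slide could be sketched as follows; the shape of the KF container is an assumption, since the original code figure is not reproduced here:

```scala
// Minimal monad abstraction over a container type M[_]
trait Monad[M[_]] {
  def unit[A](a: A): M[A]
  def map[A, B](m: M[A])(f: A => B): M[B]
  def flatMap[A, B](m: M[A])(f: A => M[B]): M[B]
}

// KF wraps the running value of a kernel computation
case class KF[A](value: A)

// Monad instance for KF: map applies f to the wrapped value,
// flatMap chains a KF-producing step
object KFMonad extends Monad[KF] {
  def unit[A](a: A): KF[A] = KF(a)
  def map[A, B](m: KF[A])(f: A => B): KF[B] = KF(f(m.value))
  def flatMap[A, B](m: KF[A])(f: A => KF[B]): KF[B] = f(m.value)
}
```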
17. Monadic composition with parameterization
The computation of the parameterized kernel function K_λ(x, y) is broken down into three sequential steps:
1. Execution of the similarity function h (map)
2. Application of the parameter λ (flatMap)
3. Execution of the exponential map g (flatMap)
The sequence is generated by applying a flatMap method to the output of a map or another flatMap monadic operation.
18. Monadic composition of parameterized kernels
K_λ(x, y) = g_λ(Σᵢ h(xᵢ, yᵢ))
Implementation of the monadic composition using the for-comprehension notation.
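The three steps of slide 17 can be chained in a for-comprehension, which Scala desugars into flatMap/map calls. This sketch assumes the λ step is a function applied between h and g (the deck does not show how λ is threaded, so that detail is an assumption):

```scala
// KF carries the intermediate value of the kernel computation and exposes
// map/flatMap so the steps can be chained in a for-comprehension
case class KF[A](value: A) {
  def map[B](f: A => B): KF[B] = KF(f(value))
  def flatMap[B](f: A => KF[B]): KF[B] = f(value)
}

def kernel(h: (Double, Double) => Double,
           lambda: Double => Double,
           g: Double => Double)
          (x: Array[Double], y: Array[Double]): Double = {
  val k = for {
    s      <- KF(x.zip(y).map { case (xi, yi) => h(xi, yi) }.sum) // 1. similarity h
    scaled <- KF(lambda(s))                                        // 2. apply the parameter
    out    <- KF(g(scaled))                                        // 3. exponential map g
  } yield out
  k.value
}
```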
20. Brief introduction to genetic algorithms
Genetic algorithms use reproduction to evolve a population of solutions to a problem.
The components of a genetic algorithm are:
• Genetic encoding (and decoding): conversion between a solution and a binary format (bits, string) known as a chromosome
• Genetic operations: functions that extract the most genetically fit solutions
• Genetic fitness function: criteria to evaluate each solution
21. Reproduction cycle
The reproduction cycle controls the population of chromosomes using three genetic operators:
• Selection: ranks chromosomes according to a fitness function
• Crossover: pairs chromosomes to generate offspring chromosomes
• Mutation: introduces minor alterations in the genetic code
22. Genetic crossover
The purpose of the genetic crossover is to expand the current population of chromosomes in order to intensify the competition and improve the genetic quality of the population.
The offspring chromosomes are added to the population along with their parents to increase genetic diversity.
23. Genetic mutation
The mutation procedure inserts a small variation in a
chromosome to maintain some level of diversity between
generations
The mutated chromosome is added to the population along
with the original.
24. Kernel function: encoding
K_λ(x, y) = g_λ(Σᵢ h(xᵢ, yᵢ))
Here is an example of a flat encoding for a parameterized kernel function:
[n_h bits for h | n_g bits for g | n_λ bits for λ], e.g. 01 011 10110001010101011100010111010
Note: a tree representation of the kernel function (genetic programming) is a viable alternative.
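A sketch of this flat encoding in Scala; the field widths (2, 3 and 21 bits) and the names are assumptions read off the slide's figures, not the deck's actual code:

```scala
// Flat encoding of a kernel into a bit string: n_h bits select the similarity h,
// n_g bits select the exponential map g, n_lambda bits hold the quantized parameter
case class KernelCode(h: Int, g: Int, lambdaBits: Int)

object KernelEncoding {
  val (nH, nG, nLambda) = (2, 3, 21) // illustrative field widths

  def encode(c: KernelCode): String =
    toBits(c.h, nH) + toBits(c.g, nG) + toBits(c.lambdaBits, nLambda)

  def decode(bits: String): KernelCode = KernelCode(
    Integer.parseInt(bits.substring(0, nH), 2),
    Integer.parseInt(bits.substring(nH, nH + nG), 2),
    Integer.parseInt(bits.substring(nH + nG), 2)
  )

  // Left-pad a non-negative integer to a fixed-width binary string
  private def toBits(n: Int, width: Int): String = {
    val s = n.toBinaryString
    "0" * (width - s.length) + s
  }
}
```

Encoding and decoding are inverses, so a chromosome can always be turned back into a runnable kernel candidate.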
25. Kernel function: cross-over 1
Parents:
K_λ(x, y) = g_λ(Σᵢ h(xᵢ, yᵢ)), encoded as 01 011 111010110101100101010
K′_β(x, y) = g′_β(Σᵢ h′(xᵢ, yᵢ)), encoded as 11 001 111010011010100001011
Offspring:
K⁽¹⁾_β(x, y) = g′_β(Σᵢ h(xᵢ, yᵢ)), encoded as 01 001 111010011010100001011
K⁽²⁾_λ(x, y) = g_λ(Σᵢ h′(xᵢ, yᵢ)), encoded as 11 011 111010110101100101010
Genetic cross-over indexed on the exponential map: the bits beyond the h field (the g and λ fields) are exchanged between the two parents.
26. Kernel function: cross-over 2
Parents:
K_λ(x, y) = g_λ(Σᵢ h(xᵢ, yᵢ)), encoded as 01 011 111010110101100101010
K′_β(x, y) = g′_β(Σᵢ h′(xᵢ, yᵢ)), encoded as 11 001 111010011010100001011
Offspring:
K⁽¹⁾_λ′(x, y) = g_λ′(Σᵢ h(xᵢ, yᵢ)), encoded as 01 011 followed by the recombined parameter bits λ′
K⁽²⁾_β′(x, y) = g′_β′(Σᵢ h′(xᵢ, yᵢ)), encoded as 11 001 followed by the recombined parameter bits β′
Genetic cross-over indexed on the lambda parameter: the crossover point falls inside the λ field, so each offspring keeps its parent's h and g fields and receives a new parameter value.
27. Kernel function: mutation
Original: K_λ(x, y) = g_λ(Σᵢ h(xᵢ, yᵢ)), encoded as 01 011 111010110101100101010
Mutated: K_λ′(x, y) = g_λ′(Σᵢ h(xᵢ, yᵢ)), encoded as 01 011 111010111101100101010
One-bit XOR genetic mutation indexed on the lambda parameter.
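The one-bit mutation above amounts to flipping a single bit of the chromosome, here sketched with the flip index supplied by the caller (in practice it would be drawn at random within the λ field):

```scala
// One-bit XOR mutation: flip the bit at bitIndex in the chromosome string
def mutate(chromosome: String, bitIndex: Int): String = {
  val flipped = if (chromosome(bitIndex) == '0') '1' else '0'
  chromosome.updated(bitIndex, flipped)
}
```

Applying the same mutation twice restores the original chromosome, a property of XOR.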
28. Kernel function: fitness
ℒ_K(w) = Σᵢ (yᵢ − f(xᵢ, w))² + γ‖w‖²
Given a kernel function K_λ and a training set {xᵢ}, a classifier model w is generated by minimizing the loss ℒ_K.
The fitness of a kernel function is the F1 score over a new validation set.
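A minimal sketch of the fitness computation, assuming binary 0/1 labels and predictions (training the classifier f by minimizing ℒ_K is out of scope here):

```scala
// Fitness of a kernel candidate: F1 score of the trained model's
// predictions against the labels of a held-out validation set
def f1Score(predicted: Seq[Int], actual: Seq[Int]): Double = {
  val pairs = predicted.zip(actual)
  val tp = pairs.count { case (p, a) => p == 1 && a == 1 } // true positives
  val fp = pairs.count { case (p, a) => p == 1 && a == 0 } // false positives
  val fn = pairs.count { case (p, a) => p == 0 && a == 1 } // false negatives
  if (tp == 0) 0.0
  else {
    val precision = tp.toDouble / (tp + fp)
    val recall = tp.toDouble / (tp + fn)
    2.0 * precision * recall / (precision + recall)
  }
}
```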
29. Scala: encoder
Encoder for a kernel function. The method apply (resp. unapply) implements the encoding (resp. decoding) algorithm.
Generic encoder for a parameterized type.
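The trait's shape follows the slide's description (apply encodes, unapply decodes); the concrete instance below, an 8-bit integer encoder, is an illustrative assumption, not the deck's code:

```scala
// Generic encoder for a parameterized type T:
// apply encodes a value into a bit string, unapply decodes it back
trait Encoder[T] {
  def apply(t: T): String               // encoding
  def unapply(bits: String): Option[T]  // decoding
}

// Example instance: integer parameter quantized on 8 bits
object IntEncoder extends Encoder[Int] {
  private val width = 8
  def apply(t: Int): String = {
    val s = t.toBinaryString
    "0" * (width - s.length) + s
  }
  def unapply(bits: String): Option[Int] =
    if (bits.length == width) Some(Integer.parseInt(bits, 2)) else None
}
```

Because unapply is an extractor, a decoded value can also be recovered by pattern matching on the bit string.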
30. Scala: encoding
Encoding of a kernel function kf, given an implicit quantization scheme quant for the lambda parameter λ:
conversion of the similarity function h, the parameter λ and the exponential map g into a bit stream.
31. Scala: decoding
Conversion of the bit representations of the similarity function h, the lambda parameter λ and the exponential map g, then assembly of the bit stream to instantiate a new kernel function.
35. Conclusion
The monadic composition and genetic encoding of kernel functions allow an analytical engine to adapt classification and prediction models to online training.
The approach selects the kernel that is the most appropriate to new batches of labeled observations.
36. References
• K. Murphy, Machine Learning: A Probabilistic Perspective, §14.1 Kernels Introduction, MIT Press, 2012
• D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989
• P. Nicolas, Scala for Machine Learning, §10 Genetic Algorithms, Packt Publishing, 2014
• E.D. Goodman, Introduction to Genetic Algorithms, §Scaling of Relative Fitness, Michigan State University, 2009 World Summit on Genetic and Evolutionary Computation
• A. Eiben, J.E. Smith, Introduction to Evolutionary Computing, §2 What is an Evolutionary Algorithm?, Springer, 2003
Editor's Notes
Context of the presentation:
The transition from Java and Python to Scala is not that easy: it goes beyond selecting Scala for its obvious benefits:
- support for functional concepts
- the ability to leverage open-source libraries and frameworks if needed
- fast and distributed enough to handle large data sets
Scala was the most logical choice.
Scientific programming may very well involve different roles in a project:
- Mathematicians for formulas
- Data scientists for data processing and modeling
- Software engineers for implementation
- Dev-ops and performance engineers for deployment in production
In order to ease the pain, we tend to learn/adopt Scala incrementally within a development team. The problem is that you end up with an inconsistent code base with different levels of quality, and the team develops a somewhat negative attitude toward the language. The solution is to select a list of problems or roadblocks (in our case machine learning) and compare the solution in Scala with Java, Python, etc. (you sell the outcome, not the process).
Presentation: a set of diverse Scala features or constructs for which the entire team agreed that Scala is a far better solution than Python or Java.