Michael Biehl presented prototype-based machine learning methods. Prototype models represent data as exemplars and allow for interpretable and flexible classification. Unsupervised methods like vector quantization and self-organizing maps perform clustering. Supervised learning examples include Learning Vector Quantization for classification and Generalized Matrix Relevance LVQ, which learns distance metrics from data. Prototype models provide insights into data structures while achieving high accuracy.
Tutorial at the Winter School on Machine Learning, Gran Canaria, January 2020 (ppsx format, 52 slides)
Michael Biehl, University of Groningen, The Netherlands
Novel Machine Learning Methods for Extraction of Features Characterizing Data... (Velimir (Monty) Vesselinov)
Vesselinov, V.V., Novel Machine Learning Methods for Extraction of Features Characterizing Datasets and Models, AGU Fall meeting, Washington D.C., 2018.
The variational Gaussian process (VGP) is a Bayesian nonparametric model that adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
Universal Approximation Property via Quantum Feature Maps
----
The quantum Hilbert space can serve as a quantum-enhanced feature space in machine learning (ML): a quantum feature map encodes classical data into quantum states. We prove that quantum ML models built on typical quantum feature maps can approximate any continuous function, with an optimal approximation rate.
---
Contributed talk at Quantum Techniques in Machine Learning 2021, Tokyo, November 8-12 2021.
By Quoc Hoan Tran, Takahiro Goto and Kohei Nakajima
Kernel methods and variable selection for exploratory analysis and multi-omic... (tuxette)
Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B... (NTNU)
The introduction of expert knowledge when learning Bayesian Networks from data is known to be an excellent approach to boost the performance of automatic learning methods, especially when data are scarce. Previous approaches to this problem based on Bayesian statistics introduce the expert knowledge by modifying the prior probability distributions. In this study, we propose a new methodology based on Monte Carlo simulation which starts with non-informative priors and requires knowledge from the expert a posteriori, when the simulation ends. We also explore a new Importance Sampling method for Monte Carlo simulation and the definition of new non-informative priors for the structure of the network. All these approaches are experimentally validated with five standard Bayesian networks.
Read more:
http://link.springer.com/chapter/10.1007%2F978-3-642-14049-5_70
Conventional tools in array signal processing have traditionally relied on the availability of a large number of samples acquired at each sensor or array element (antenna, hydrophone, microphone, etc.). Large sample size assumptions typically guarantee the consistency of estimators, detectors, classifiers and multiple other widely used signal processing procedures. However, practical scenarios and array-mobility conditions, together with the need for low latency and reduced scanning times, impose strong limits on the total number of observations that can be effectively processed. When the number of collected samples per sensor is small, conventional large-sample asymptotic approaches are no longer relevant. Recently, large random matrix theory tools have been proposed in order to address the small sample support problem in array signal processing. In fact, it has been shown that the most important and longstanding problems in this field can be reformulated and studied according to this asymptotic paradigm. By exploiting the latest advances in large random matrix theory and high-dimensional statistics, a novel and unconventional methodology can be established, which provides an unprecedented treatment of the finite sample-per-sensor regime. In this talk, we will see that random matrix theory establishes a unifying framework for the study of array signal processing techniques under the constraint of a small number of observations per sensor, which has radically changed the way in which array processing methodologies have been traditionally established. We will show how this unconventional way of revisiting classical array processing has led to major advances in the design and analysis of signal processing techniques for multidimensional observations.
There is now a huge literature on Bayesian methods for variable selection that use spike-and-slab priors. Such methods, in particular, have been quite successful for applications in a variety of different fields; high-throughput genomics and neuroimaging are two such examples. There, novel methodological questions are being generated, requiring the integration of different concepts, methods, tools and data types. These have in particular motivated the development of variable selection priors that go beyond the independence assumptions of a simple Bernoulli prior on the variable inclusion indicators. In this talk I will describe various prior constructions that incorporate information about structural dependencies among the variables. I will also address extensions of the models to the analysis of count data. I will motivate the development of the models using specific applications from neuroimaging and from studies that use microbiome data.
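For reference, the simple construction with independent Bernoulli inclusion indicators, which the structured priors in this talk go beyond, can be sketched as follows (a minimal NumPy sketch; the hyperparameter names and values are illustrative assumptions, not from the abstract).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spike_and_slab(p, theta=0.1, slab_sd=1.0):
    """Draw p regression coefficients from a spike-and-slab prior.

    Each coefficient beta_j is exactly zero with probability 1 - theta
    (the 'spike') and Gaussian N(0, slab_sd^2) with probability theta
    (the 'slab'); gamma_j are the Bernoulli variable-inclusion indicators.
    """
    gamma = rng.binomial(1, theta, size=p)           # inclusion indicators
    beta = gamma * rng.normal(0.0, slab_sd, size=p)  # zero wherever gamma_j = 0
    return gamma, beta

gamma, beta = sample_spike_and_slab(p=20)
print("included variables:", np.flatnonzero(gamma))
```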
Introduction to search and optimisation for the design theorist (Akin Osman Kazakci)
A historically important design theory is Herbert Simon's state-space search model. Over the years, the importance of this model has been consistently downplayed for various reasons. Today it is rarely used or discussed, except to downplay its significance even more, usually without an in-depth analysis.
However, the younger generation of (design) researchers does not know the underlying formalism well enough, nor how it can be used to interpret design phenomena.
This short introduction intends to give the basics of search, optimisation and problem-solving formalisms in a very intuitive way - which also helps to understand more complicated formal models of design.
Unsupervised Learning and Image Classification in High Performance Computing ... (HPCC Systems)
Itauma Itauma, Wayne State University, presents at the 2016 HPCC Systems Engineering Summit Community Day in Detroit, MI.
At DSAL (Data Sciences and Analytics Lab) at Wayne State University, the focus is on studying, researching and developing algorithms that are applicable to various big data analysis problems occurring in areas of Machine Learning, Data Mining, Bioinformatics, and Healthcare Informatics.
The objective of this presentation is to share with the community our published work on "Unsupervised Learning and Image Classification in High Performance Computing Cluster".
Identifying good features has various benefits for object classification with respect to reducing computational cost and increasing classification accuracy. In our study, we implement a new multimodal feature learning method and object identification framework using the High Performance Computing Cluster platform (HPCC Systems), which leads to faster optimization and computation of algorithms with low hardware-design cost.
The framework first learns representative weights over unlabeled data for each model through the K-means unsupervised learning method. Then, the desired features are extracted from the labeled data using the correlation between the labeled data and representative bases. These labeled features are fused and fed to the classifiers to make the final recognition. Our algorithms are implemented in ECL and we made use of the HPCC Systems machine learning library. Our framework is evaluated using various databases such as the CALTECH-101, AR databases, and a subset of wild PubFig83 data in which multimedia content is added.
We show that HPCC Systems can be used by machine learning researchers to speed up the running time of any computationally intensive algorithm; that it lowers budget costs by using existing computers instead of designing an expensive system with GPUs; and that it is scalable, with code reusable irrespective of the size of the dataset and the number of nodes configured. Our novel identity recognition algorithm can lead to further exploration of face recognition problems using the HPCC Systems environment.
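The pipeline described in the abstract (K-means learning of representative bases on unlabeled data, correlation-based feature extraction on labeled data, and a final classifier) can be approximated in a few lines. This is an illustrative scikit-learn/NumPy sketch with made-up data and names, not the authors' ECL/HPCC Systems implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# toy stand-ins for unlabeled and labeled data (rows = samples)
X_unlabeled = rng.normal(size=(1000, 64))
X_labeled = rng.normal(size=(200, 64))
y_labeled = rng.integers(0, 2, size=200)

# 1) learn representative bases from the unlabeled data with K-means
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(X_unlabeled)
bases = kmeans.cluster_centers_                       # shape (32, 64)

# 2) extract features as the correlation between each labeled sample and each basis
def correlation_features(X, bases):
    Xc = X - X.mean(axis=1, keepdims=True)
    Bc = bases - bases.mean(axis=1, keepdims=True)
    num = Xc @ Bc.T
    den = np.linalg.norm(Xc, axis=1, keepdims=True) * np.linalg.norm(Bc, axis=1)
    return num / den                                   # shape (n_samples, n_bases)

features = correlation_features(X_labeled, bases)

# 3) feed the (fused) features to a classifier for the final recognition
clf = LogisticRegression(max_iter=1000).fit(features, y_labeled)
print("training accuracy:", clf.score(features, y_labeled))
```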
Itauma Itauma
Itauma Itauma has an undergraduate degree in Electrical Engineering from the University of Ilorin and two Master's degrees: an MSc in Computer Engineering from Istanbul Technical University, with a focus on human-robot interaction, and an MSc in Computer Science from Wayne State University, where his thesis was based on leveraging HPCC Systems for Big Data analytics. He is currently a PhD student in Instructional Design and Technology at Keiser University and a part-time Computer Science Lecturer at Wayne State University. He loves teaching. His interests lie in the areas of robotics, big data, and the social and behavioral sciences. His current research involves using natural language processing tools to process unstructured data, thereby automating and optimizing qualitative data analysis.
Learning to discover Monte Carlo algorithm on spin ice manifold (Kai-Wen Zhao)
A global-update Monte Carlo sampler can be discovered naturally by a trained machine using the policy-gradient method in a topologically constrained environment.
Invited lecture on Machine Learning in Medicine at the joint "Integrated Omics" course of Hanze University and University Hospital UMCG, Groningen, The Netherlands
A tutorial given at the AMALEA workshop 2022:
Unsupervised and supervised prototype-based learning is illustrated in terms of bio-medical applications.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
3D Scene Analysis via Sequenced Predictions over Points and Regions (Flavia Grosan)
I gave this talk in the Machine Vision seminar at Jacobs University. I presented the state of the art in 3D point cloud classification and described the approach of X. Xiong et al. from a paper published in 2010.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N... (Scientific Review SR)
Radial Basis Probabilistic Neural Networks (RBPNN) have a broad generalization capability and have been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in the RBPNN is extended by calculating its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as they are mapped into a high-dimensional space. Comparing the four constructed classification models (Kernel RBPNN, Radial Basis Function networks, RBPNN and Back-Propagation networks), the results show that classification of the Iris data with Kernel RBPNN displays outstanding performance.
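For context, a kernel-induced (squared) distance can be obtained from any kernel k via ||phi(x) - phi(y)||^2 = k(x,x) - 2 k(x,y) + k(y,y). A small sketch with a Gaussian (RBF) kernel follows; it is illustrative only and not necessarily the exact formulation used in the paper.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_induced_distance(x, y, kernel=rbf_kernel):
    """Squared distance in feature space: k(x,x) - 2 k(x,y) + k(y,y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

x = np.array([5.1, 3.5, 1.4, 0.2])   # an Iris-like feature vector (illustrative)
y = np.array([6.2, 2.9, 4.3, 1.3])
print("squared Euclidean distance:", np.sum((x - y) ** 2))
print("kernel-induced distance   :", kernel_induced_distance(x, y))
```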
Similar to 2017: Prototype-based models in unsupervised and supervised machine learning (20)
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024 (University of Groningen)
An introduction to interpretable machine learning in endocrinology.
In particular, the application of Generalized Matrix Relevance LVQ to the classification of adrenocortical tumors and the differential diagnosis of primary aldosteronism is presented.
A tutorial given at the AMALEA workshop 2022.
This talk presents the statistical physics based theory of machine learning in terms of simple example systems. As a recent application, the occurrence of phase transitions in layered networks is discussed.
The statistical physics of learning revisited: Phase transitions in layered ne... (University of Groningen)
"The statistical physics of learning revisited: Phase transitions in layered neural networks"
Physics Colloquium at the University of Leipzig/Germany, June 29, 2021
24 slides, ca 45 minutes
Short presentation (15 minutes) focusing on the application of unsupervised and supervised machine learning in the paper "Tissue- and development-stage specific mRNA and heterogeneous CNV signatures of human ribosomal proteins in normal and cancer samples".
Talk presented at WSOM 2016 in Houston/Texas.
Machine learning based classification of FDG-PET scan data for the diagnosis of neurodegenerative disorders
June 2017: Biomedical applications of prototype-based classifiers and relevan... (University of Groningen)
A presentation of several biomedical applications of prototype-based machine learning and relevance learning. Invited talk at the AlCoB conference 2017 in Aveiro/Portugal.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V... (Wasswaderrick3)
In this book, we use conservation-of-energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity and then from this we derive the Poiseuille flow equation, the transition flow equation and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
This presentation gives a brief overview of the structural and functional attributes of nucleotides, the structure and function of genetic material, and the impact of UV rays and pH upon them.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R_1/2 ~ 50-200 pc, stellar masses of M* ~ 10^7-10^8 M_sun, and star-formation rates of SFR ~ 0.1-1 M_sun/yr. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ~2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
What are greenhouse gases and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
They monitor common gases, weather parameters, and particulates.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
2017: Prototype-based models in unsupervised and supervised machine learning
1. Prototype-based models in unsupervised and supervised machine learning
Michael Biehl, Aleke Nolte: Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, NL; SUNDIAL H2020 Network
Lingyu Wang: Kapteyn Astronomical Institute and SRON Groningen, Astrophysics Science Group, Groningen, NL
www.cs.rug.nl/~biehl (pre-/reprints, available code)
www.astro.rug.nl/~sundial/
3. Astroinformatics, Cape Town, November 2017
Introduction
prototypes, exemplars: representation of information in terms of typical representatives (e.g. of a class of objects); a much debated concept in cognitive psychology
machine learning: prototype- (and distance-) based systems
- easy to implement, highly flexible, online training
- white box: parameterization in the space of observed data
- yield interpretable classifiers/regression systems
- help to detect bias in training data, other artifacts
- provide insights into data set / problem at hand
Accuracy is not enough! [Paulo Lisboa]
4. Astroinformatics, Cape Town, November 2017
Introduction
neural interpretation: activation and learning in a shallow network
external stimulus to a network of neurons
response according to weights (= expected inputs)
activation: BMU - best matching unit (and neighbors)
learning -> even stronger response to the same stimulus in future
weights represent different expected stimuli (prototypes)
5. Astroinformatics, Cape Town, November 2017
Vector Quantization (VQ)
Vector Quantization: identify typical representatives of the data which capture its essential features
VQ system: set of prototypes; data: set of feature vectors
based on a dis-similarity/distance measure; most popular example: (squared) Euclidean distance
assignment to prototypes, e.g. Nearest Prototype Scheme: given vector xμ, determine the winner (BMU) → assign xμ to prototype w*
6. Astroinformatics, Cape Town, November 2017
Competitive Learning
competitive VQ: competition without neighborhood cooperativeness
initially: randomized wk, e.g. in randomly selected data points
random sequential (repeated) presentation of data ... the Winner Takes it All (WTA)
η (<1): learning rate, step size of the update
stochastic gradient descent minimization of the Quantization Error (here: sq. Euclidean)
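The WTA rule above moves only the winning prototype towards the presented data point, w* <- w* + η (xμ - w*). A minimal NumPy sketch of this competitive learning scheme follows; the prototype count, learning rate and toy data are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def competitive_vq(X, n_prototypes=5, eta=0.05, epochs=20):
    """Winner-takes-all vector quantization.

    Prototypes are initialized in randomly selected data points; at each
    presentation of a feature vector x, only the winner (BMU, the closest
    prototype under squared Euclidean distance) is moved towards x.
    """
    W = X[rng.choice(len(X), n_prototypes, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:       # random sequential presentation
            d = np.sum((W - x) ** 2, axis=1)       # squared Euclidean distances
            winner = np.argmin(d)                  # best matching unit (BMU)
            W[winner] += eta * (x - W[winner])     # WTA update, step size eta < 1
    return W

X = rng.normal(size=(300, 2))
prototypes = competitive_vq(X)
print(prototypes)
```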
7. Astroinformatics, Cape Town, November 2017
Self-Organizing Map (SOM)
T. Kohonen. Self-Organizing Maps (Springer 1995, 1997, 2001)
neighborhood cooperativeness on a pre-defined low-dim. lattice: d-dim. lattice A of neurons (prototypes)
upon presentation of xμ:
- determine the Best Matching Unit at position s in the lattice
- update the BMU and its lattice neighborhood, where the neighborhood range ρ is defined w.r.t. distances in the lattice A
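A compact sketch of the SOM training loop described above, using a one-dimensional lattice and a Gaussian neighborhood of range ρ. Lattice size, learning rate and data are illustrative assumptions, and both η and ρ are kept fixed here although in practice they are usually annealed.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, n_units=10, eta=0.1, rho=2.0, epochs=30):
    """Self-Organizing Map with a 1-d lattice of n_units prototypes."""
    W = X[rng.choice(len(X), n_units, replace=False)].copy()
    lattice = np.arange(n_units)                              # positions in the lattice A
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            s = np.argmin(np.sum((W - x) ** 2, axis=1))       # BMU at position s
            h = np.exp(-((lattice - s) ** 2) / (2 * rho**2))  # neighborhood in the lattice
            W += eta * h[:, None] * (x - W)                   # update BMU and neighbors
    return W

X = rng.normal(size=(500, 3))
som_prototypes = train_som(X)
print(som_prototypes.shape)   # (10, 3)
```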
10. Astroinformatics, Cape Town, November 2017
Illustration: Galaxy Characteristics
Numerical features describing a catalogue of galaxies; work in progress - details not (yet) disclosed
GAMA: Galaxy and Mass Assembly Survey, www.gama-survey.org
full set of 41 features; reduced set of 10 selected features, including semi-major and semi-minor axes
logistic normalization of the features
11. Astroinformatics, Cape Town, November 2017
Illustration: Galaxy Classification
class labels [Kelvin et al., MNRAS 439: 1245-1269, 2014]:
1 - elliptical E0-E6
2 - Little Blue Spheroids (LBS)
3 - "early type spirals"
4 - "early type barred spirals"
5 - "intermediate type spirals"
6 - "intermediate type, barred"
7 - "late type spirals & irregulars"
8, 9 - artefacts, stars
13. Astroinformatics, Cape Town, November 2017
Self-Organizing Map
SOM (rectangular grid, 'medium size'), unsupervised clustering
pie-charts: percentage at which classes are assigned to a particular unit
observations / suggestions:
- LBS appear well separated
- overlap of 1 / 3 and 5 / 7, with smooth transitions
- 6 and 5 mix/overlap
- "small classes" 4, 8, 9 hardly represented
to do: inspect prototypes, U-matrix, ..., meta-clustering
14. Astroinformatics, Cape Town, November 2017
Supervised Competitive Learning
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization, here: heuristic LVQ1 [Kohonen, 1990]
N-dimensional data, feature vectors
• initialize prototype vectors for the different classes
• present a single example
• identify the winner (closest prototype)
• move the winner
- closer towards the data (same class)
- away from the data (different class)
Alternatives: cost function based training, e.g. Generalized LVQ [GLVQ: Sato and Yamada, 1995]
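A minimal sketch of the heuristic LVQ1 scheme just outlined: one prototype per class, with the winner attracted by same-class examples and repelled by different-class examples. Initialization by class means, the learning rate and the toy data are illustrative choices; cost-function based variants such as GLVQ are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def lvq1(X, y, eta=0.05, epochs=30):
    """Heuristic LVQ1 with one prototype per class (Euclidean distance)."""
    classes = np.unique(y)
    # initialize prototype vectors, here: class-conditional means
    W = np.array([X[y == c].mean(axis=0) for c in classes])
    c_W = classes.copy()                                # prototype labels
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, label = X[i], y[i]
            winner = np.argmin(np.sum((W - x) ** 2, axis=1))
            sign = 1.0 if c_W[winner] == label else -1.0
            W[winner] += sign * eta * (x - W[winner])   # attract / repel the winner
    return W, c_W

def predict(W, c_W, X):
    """Nearest Prototype Classifier."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return c_W[np.argmin(d, axis=1)]

X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
W, c_W = lvq1(X, y)
print("training accuracy:", np.mean(predict(W, c_W, X) == y))
```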
15. Astroinformatics, Cape Town, November 2017
Learning Vector Quantization
∙ identification of prototype vectors from labeled example data
∙ distance-based classification [here: Euclidean distances]
N-dimensional data, feature vectors
∙ Nearest Prototype Classifier
∙ generalization ability: correct classification of new data
∙ aim: discrimination of classes (≠ vector quantization or density estimation)
16. Astroinformatics, Cape Town, November 2017
Distance Measures
fixed distance measures:
- select distance measures (prior knowledge, pre-processing)
- compare performance of various measures
relevance learning: adaptive distance measures
- fix only the parametric form of the distance measure
- data driven adaptation: determine prototypes and distance parameters in the same training process (e.g. cost function based GLVQ)
Example: Generalized Matrix Relevance LVQ (adaptive) [Schneider, Biehl, Hammer, 2009]
17. Astroinformatics, Cape Town, November 2017
Generalized Relevance Matrix LVQ (GMLVQ)
adaptive quadratic distance in LVQ: d_Λ(w, x) = (x - w)^T Λ (x - w), with Λ = Ω^T Ω,
i.e. the standard (squared) Euclidean distance for the linearly transformed features Ω x
normalization: Σ_j Λ_jj = 1
Λ_jj summarizes
- the contribution of the original dimension j
- the relevance of the original features for the classification
Λ_ij: relevance of pairs (i, j) of features
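A short sketch of the adaptive quadratic distance defined above, with Λ = Ω^T Ω and the trace normalization Σ_j Λ_jj = 1. It only illustrates the distance computation and the relevance read-out, not the full GMLVQ training; for that, see the Matlab toolboxes referenced in the Summary slide.

```python
import numpy as np

def gmlvq_distance(w, x, Omega):
    """Adaptive quadratic distance d_Lambda(w, x) = (x - w)^T Lambda (x - w),
    with Lambda = Omega^T Omega, i.e. the squared Euclidean distance of the
    linearly transformed features Omega x and Omega w."""
    diff = x - w
    return float(diff @ Omega.T @ Omega @ diff)

def normalize_relevances(Omega):
    """Rescale Omega so that sum_j Lambda_jj = trace(Omega^T Omega) = 1."""
    return Omega / np.sqrt(np.trace(Omega.T @ Omega))

rng = np.random.default_rng(0)
Omega = normalize_relevances(rng.normal(size=(4, 4)))
Lambda = Omega.T @ Omega
print("diagonal (feature relevances):", np.diag(Lambda))
print("distance:", gmlvq_distance(rng.normal(size=4), rng.normal(size=4), Omega))
```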
18. Astroinformatics, Cape Town, November 2017
GMLVQ analysis
- restriction to the classes with a significant number of samples (1, 2, 3, 5, 7)
- sub-sampling in order to achieve balanced training sets (5×743)
- use of all 41 features
- averages over random splits into 90% training and 10% test set
- one prototype per class
confusion matrix of the NPC (rows: true class, columns: predicted class, in the order 1, 2, 3, 5, 7):
1:  61.3  10.4  20.1   7.5   0.7
2:   3.1  90.5   0.0   1.9   4.5
3:  16.5   1.7  68.0  13.6   0.2
5:   1.6   7.8  10.0  73.6   7.0
7:   1.3  13.0   0.3  13.8  71.6
19. Astroinformatics, Cape Town, November 2017
GMLVQ analysis
diagonal of the relevance matrix: continuous weights
- an alternative set of features?
- agrees only partially with the hand-crafted set
- correlations between features?
projection of the data set on the leading eigenvectors of Λ: discriminative low-dim. representation, e.g. strong overlap of classes 1 / 3 (elliptical / early type spirals)
20. Astroinformatics, Cape Town, November 2017
Summary
Prototype-based systems in machine learning:
represent data in terms of exemplars, white box
parameterization of clustering / classification / regression
Unsupervised Learning
data reduction, vector quantization, clustering
low-dimensional representation, topology preserving SOM
Supervised Learning
example: LVQ for classification with adaptive distance
Generalized Matrix Relevance LVQ (GMLVQ) *
white box, transparent, intuitive, powerful
accuracy is not enough: insight into problem / data set
e.g. with respect to feature selection / weighting
* GMLVQ (matlab) toolboxes: www.cs.rug.nl/~biehl
21. Astroinformatics, Cape Town, November 2017
Unsupervised Learning
Neural Gas (NG)
Generative Topographic Map (GTM)
Relevance learning in dimension reduction
Regression
Ordinal Regression in GMLVQ
Radial Basis Function networks (RBF)
Probabilistic classification
likelihood-based classifiers (Robust Soft LVQ)
Distances / Similarities
unconventional, problem-specific similarity measures
e.g. functional data (time series, spectra, histograms...)
non-vectorial data, relational data
relevances: weak/strong, bounds
...
there is a lot more...
1 - elliptical; 2 - Little Blue Spheroids; 3 - early type spirals; 4 - early type barred spirals; 5 - intermediate type spirals; 6 - intermediate type, barred; 7 - irregular