Generativity and Data Science
Akın Kazakçı (Mines Paristech)
akin.kazakci@mines-paristech.fr
Prediction
A quest for artificial general intelligence
Prediction Novelty generation
A quest for artificial general intelligence
Plan
• Representation and learning
• Impact of representations on generation
• A naive Bayes generator
• Generating novelty with neural networks
How do you
represent this
“object”?
Representing an object
A
What is a
representation?
Representing an object
A
Representing an object
A
Source: https://plato.stanford.edu/entries/mental-representation/
Have fun!
Very hard question.
Representing an object
A
Even the simplest of (computer) representations
require some choice or arbitrariness.
Object (?) vs. pixel representation (8x9)
To process any “object” (e.g. by computers)
you need to “represent” it.
Representations are not “neutral” or “independent” of the observer.
What effect does a representation have on learning and/or generation?
Let us take a toy example: 16 letters,
each represented with 9x5 binary pixels
Representation and Learning
What is the size of
the representation
space?
Representation and Learning
There are 2^45 objects (representations) in that space.
Representation and Learning
Each image can be
represented as a vector of
size 45.
Representation and Learning
(0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, 0,0, …, 0,0,0,0,0)
(0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, 0,1, …, 0,0,0,0,0)
…
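To make the vectorisation concrete, here is a minimal sketch, assuming NumPy; the hand-drawn letter is a made-up illustration rather than the slide's actual data. It flattens a 9x5 binary image into a 45-dimensional vector and counts the 2^45 possible representations.

```python
# A minimal sketch (NumPy assumed) of the 9x5 pixel representation.
import numpy as np

# A hypothetical 9x5 binary image of the letter 'A' (1 = black pixel).
letter_A = np.zeros((9, 5), dtype=np.uint8)
letter_A[0, 1:4] = 1          # top bar
letter_A[1:9, 0] = 1          # left stroke
letter_A[1:9, 4] = 1          # right stroke
letter_A[4, 1:4] = 1          # middle bar

vector = letter_A.flatten()   # shape (45,): the vectorised representation
print(vector.shape)           # (45,)
print(2 ** vector.size)       # 35184372088832 = 2^45 possible representations
```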
What can be learned
from this data?
Representation and Learning
(0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, 0,0, …, 0,0,0,0,0)
(0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, 0,1, …, 0,0,0,0,0)
…
The first and the simplest thing
we can ‘consider’ learning is the
data itself (e.g. identity function).
Representation and Learning
(0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, 0,0, …, 0,0,0,0,0)
(0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, 0,1, …, 0,0,0,0,0)
…
The identity function can be conceived as a mapping from the image to its vectorised representation.
What would be the drawback of such a function?
Representation and Learning
A
The new ‘A’ will be
unrecognisable by Id(x)
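As a rough illustration of why the identity function does not generalise, here is a toy sketch with hypothetical 5-pixel "letters" (not the slide's data): a model that simply memorises its training vectors cannot recognise any input it has not seen verbatim.

```python
# A minimal sketch of the identity "model": memorise training vectors verbatim.
train = {
    (0, 1, 1, 1, 0): "A",   # a toy 5-pixel 'A'
    (1, 0, 0, 0, 1): "B",   # a toy 5-pixel 'B'
}

def identity_model(x):
    """Return the stored label only if x was seen verbatim during training."""
    return train.get(tuple(x), None)

print(identity_model((0, 1, 1, 1, 0)))  # 'A'  -- an exact training example
print(identity_model((0, 1, 1, 0, 0)))  # None -- a new 'A' with one pixel changed
```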
Learning assumes a notion of
‘generalisation’
Representation and Learning
What generalisations can we learn from these letters?
Here is an example:
Representation and Learning
Learning means finding regularities or structures in representations,
where structure means dependence between pixels (e.g. co-occurrence).
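A minimal sketch of what "finding structure" could look like in practice, assuming NumPy and using random placeholder data in place of the 16 real letters: count how often pairs of pixels are on together across the dataset.

```python
# A minimal sketch (NumPy assumed): measure pixel co-occurrence across letters.
import numpy as np

rng = np.random.default_rng(0)
letters = rng.integers(0, 2, size=(16, 45))   # placeholder for the real 16x45 letter data

# cooc[i, j] = number of letters in which pixels i and j are both "on".
cooc = letters.T @ letters                    # shape (45, 45)

# The most frequently co-occurring pair of distinct pixels hints at a larger
# unit (e.g. a stroke) that can serve as a new representational building block.
off_diag = cooc - np.diag(np.diag(cooc))
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(i, j, cooc[i, j])
```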
Re-representing objects
We can use the following
structures to re-represent letters.
For example,
Objects can be represented in
multiple ways
In the design literature, it has been acknowledged that objects can be represented in multiple ways.
Images from Reich, A critical review of general design theory, RED, 1995
Re-representing objects
Strokes give us a shorter representation (or code) for the letter domain.
This new representation is a lossless compression (no information is lost).
Each representation is a model
And the set of models of a given
agent is its knowledge.
Models can be imperfect
(e.g. lossy).
Probabilistic models are one example of imperfect models.
Plan
• Representation and learning
• Impact of representations on (probabilistic)
generation
• A naive Bayes generator
• Generating novelty with neural networks
We generate pixels uniformly at random.
Everything seems new, but it is hard to find any structure in this data.
The impact of representations on novelty
generation
Representations change what you can generate
[Panels: generation in pixel space vs. in stroke space]
Size of the representation vs Size of
Novelty Set
[Plots: the number of objects that can be generated, as a function of representation size; axis values omitted.]
Plan
• Representation and learning
• Impact of representations on (probabilistic) generation
• A naive Bayes generator
• Generating novelty with neural networks
Assume we have 60K handwritten digits.
How to model them?
Modeling digits
The main idea is to treat digits as a probability
distribution over the image space.
Source: Umesh Vazirani
Probabilistic approach
• Treat variations among images of a digit as a probability
distribution over all the images x
✴ Distribution Pj(x) generates images x from digit j, but by (small)
random chance can look like other digits
✴ Imperfect model, represents our uncertainty/ambiguity, but Bayes’
rule to the rescue!
[Illustration: P_1(x), P_2(x), …, P_8(x), each shown as a collection of example images of that digit]
Source: Umesh Vazirani
Estimating the distributions
• Use training data to estimate the prior π_j and the class-conditional distributions P_j(x)
✴ MNIST dataset: 60K training data, 10K test data
• Estimating the π_j = Pr[y = j] is easy:
π̂_j = n_j / n = (# of examples of class j) / (total # of examples)
• From MNIST:
j         0     1     2     3     4     5     6     7     8     9
π̂_j (%)  9.87 11.24  9.93 10.22  9.74  9.03  9.86 10.44  9.75  9.92
• But estimating the P_j(x) = P(x | y = j) is difficult!
Source: Umesh Vazirani
Naive Bayes
• Convert grayscale images to binary (threshold the data): x ∈ {0, 1}^784
• A general distribution over {0, 1}^784 has 2^784 - 1 parameters
• Assume that within each class, the individual pixel values are independent:
P_j(x) = P_j1(x_1) · P_j2(x_2) · · · P_j,784(x_784)
• Each P_ji is a coin flip: easy to estimate!
• Now only 784 parameters to learn (per class)
Source: Umesh Vazirani
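A minimal sketch of such a naive Bayes generator, assuming NumPy and a binarised MNIST matrix `X` of shape (n, 784) with labels `y`; the function names and the Laplace smoothing constant are illustrative choices, not taken from the slides.

```python
# A minimal sketch of a Bernoulli naive Bayes fit + sampler (NumPy assumed).
import numpy as np

def fit_naive_bayes(X, y, n_classes=10, alpha=1.0):
    """Estimate class priors pi_j and per-pixel Bernoulli parameters p_ji."""
    priors = np.array([(y == j).mean() for j in range(n_classes)])   # pi_j = n_j / n
    # Laplace-smoothed pixel probabilities, one 784-vector per class.
    pixel_probs = np.array([
        (X[y == j].sum(axis=0) + alpha) / ((y == j).sum() + 2 * alpha)
        for j in range(n_classes)
    ])
    return priors, pixel_probs

def sample(priors, pixel_probs, rng=None):
    """Draw one image: pick a class j, then flip 784 independent coins p_ji."""
    rng = rng or np.random.default_rng()
    j = rng.choice(len(priors), p=priors)
    return j, (rng.random(pixel_probs.shape[1]) < pixel_probs[j]).astype(np.uint8)
```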
Naive Bayes on MNIST
• Error rate: 15.4% (on 10K test data) —> pretty good!
• Mean vectors for each class (the p_ji's):
• Samples from the trained model:
Source: Umesh Vazirani
Plan
• Representation and learning
• Impact of representations on generation
• A naive Bayes generator
• Generating novelty with neural networks
Machine learning proposes powerful
generative models
…but these powerful models are used to regenerate objects that we can easily relate to known objects.
• Although trained for
generating what we know,
some models can generate
unrecognizable objects
• However, these models and samples are considered as spurious (Bengio et al. 2013) or as a failure (Salimans et al. 2016)
So, the first task is to demonstrate that there is a lot more generative potential in DNNs than what was intended.
Deep Neural Networks
Learning a sequence of
transformations of the
original representation
Main advantages
• Compositionality
• Hierarchy
Auto-associative neural nets
a.k.a. auto-encoders
Learning to
disassemble
Learning to
build
- Auto-encoders have existed for a long time (Kramer 1991)
- Deep variants are more recent
(Hinton, Salakhutdinov, 2006;
Bengio 2009)
- A deep auto-encoder learns
successive transformations that
decompose and then
recompose a set of training
objects
- The depth allows learning a
hierarchy of transformations
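As a rough sketch of the "disassemble / build" idea, here is a minimal deep auto-encoder in PyTorch; the layer sizes are illustrative assumptions, not the architecture used in the slides.

```python
# A minimal sketch (PyTorch assumed) of a deep auto-encoder: the encoder
# "disassembles" the input into a compact code, the decoder rebuilds it.
import torch.nn as nn

encoder = nn.Sequential(              # 784 pixels -> 100-dim code
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 100), nn.ReLU(),
)
decoder = nn.Sequential(              # 100-dim code -> 784 pixels
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)
autoencoder = nn.Sequential(encoder, decoder)  # trained to reconstruct its input
```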
The generative process
• We use an iterative method to generate new images
• Start with a random image
• Force the network to construct (i.e. interpret) it by applying f(x) = dec(enc(x))
• Repeat until convergence (a sketch follows below)
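A minimal sketch of this loop in PyTorch, reusing an `encoder`/`decoder` pair like the one sketched above; the iteration budget and convergence tolerance are assumptions, not the authors' settings.

```python
# A minimal sketch: start from noise and apply f(x) = dec(enc(x)) until
# the image stops changing (i.e. a fixed point of the trained auto-encoder).
import torch

@torch.no_grad()
def generate(encoder, decoder, n_pixels=784, max_iters=100, tol=1e-4):
    x = torch.rand(1, n_pixels)              # random starting image
    for _ in range(max_iters):
        x_next = decoder(encoder(x))         # one construction / interpretation step
        if (x_next - x).abs().mean() < tol:  # converged to a fixed point
            return x_next
        x = x_next
    return x
```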
Generation to get back to knowledge
A map of the known digit instances
Kazakçı, Cherti, Kégl, 2016
What is the target? - Value referentials
Our interpretation of the results:
Known: training digits
Representable: “combinations of strokes”
What traditional ML wants
Known: training digits
Representable: all digits that the model can generate
Valuable: all recognizable digits
What novelty generation aims at
Known: training digits
Representable: “combinations of strokes”
Valuable: human selection
Discussion
Learning, Representations, Generative modelling