Brenden studies computational problems that are easier for people than they are for machines. He received his Ph.D. in Cognitive Science from MIT in 2014, and his M.S. and B.S. in Symbolic Systems from Stanford University in 2009. He is a recipient of the Robert J. Glushko Prize for Outstanding Doctoral Dissertation in Cognitive Science. His recent research on Bayesian Program Learning has been covered by many media outlets (New York Times, Washington Post, etc.) and was selected by Scientific American as one of the most important advances of 2016.
Both cognitive science and AI can gain by studying the human solutions to difficult computational problems. Brenden's talk will focus on concept learning and question asking, two problems that people solve far better than machines. People can learn a new concept from fewer examples, and then use their concepts in richer ways: for imagination, extrapolation, and explanation, not just classification. Moreover, learning is often an active process; people can ask rich and probing questions to reduce uncertainty, while algorithms for active learning ask simple and stereotyped queries. He will also discuss work on program induction as a cognitive model and as a potential route to extracting richer concepts from less data, with applications to learning handwritten characters and learning recursive visual concepts from examples. Brenden will end with program synthesis as a model of question asking in simple games.
4. Outline
Concept learning
case study 1: handwritten characters
case study 2: recursive visual concepts
Question asking
case study 3: question asking in simple games
e.g., "Are any ships 3 tiles long?" "Are the blue ship and red ship parallel?"
5. Concepts and questions as programs, learning as program induction
"Are any objects 3 tiles long?"
(> (+ (map (lambda x (= (size x) 3)) (set Blue Red Purple))) 0)
L-system example:
angle = 120
start = F
niter = 3
F ➔ F-G+F+G-F
G ➔ GG
"Are the blue and red objects parallel?"
(= (orient Blue) (orient Red))
BPL generative model (pseudocode):
procedure GENERATETYPE
  ...
  for i = 1 ... κ do
    for j = 1 ... n_i do
      s_ij ← P(s_ij | s_i(j−1))
    end for
    R_i ← P(R_i | S_1, ..., S_(i−1))
  end for
  ψ ← {κ, R, S}
  return @GENERATETOKEN(ψ)
end procedure

procedure GENERATETOKEN(ψ)
  for i = 1 ... κ do
    S_i^(m) ← P(S_i^(m) | S_i)
    L_i^(m) ← P(L_i^(m) | R_i, T_1^(m), ..., T_(i−1)^(m))
    T_i^(m) ← f(L_i^(m), S_i^(m))
  end for
  A^(m) ← P(A^(m))
  I^(m) ← P(I^(m) | T^(m), A^(m))
  return I^(m)
end procedure
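A minimal Python sketch of the same two-stage generative process, with invented primitive sets and placeholder distributions standing in for the learned P(·) terms above; it only illustrates the type-level vs. token-level split, not the actual BPL implementation.

import random

# Hypothetical primitives and placeholder distributions; the real BPL model
# learns these from the Omniglot background set of characters.
PRIMITIVES = ["arc", "line", "hook"]
RELATIONS = ["independent", "attached-at-start", "attached-at-end", "attached-along"]

def generate_type():
    """Type level: sample parts (strokes), sub-parts, and relations."""
    kappa = random.randint(1, 3)                      # number of strokes
    strokes, relations = [], []
    for _ in range(kappa):
        n_sub = random.randint(1, 2)                  # sub-parts per stroke
        strokes.append([random.choice(PRIMITIVES) for _ in range(n_sub)])
        relations.append(random.choice(RELATIONS))    # how the stroke attaches
    return {"strokes": strokes, "relations": relations}

def generate_token(character_type):
    """Token level: add per-example motor noise and placement, then 'render'."""
    token = []
    for stroke, rel in zip(character_type["strokes"], character_type["relations"]):
        jitter = [round(random.gauss(0, 0.1), 3) for _ in stroke]   # motor variance
        location = (random.uniform(0, 1), random.uniform(0, 1))     # start location
        token.append({"sub_parts": stroke, "relation": rel,
                      "jitter": jitter, "start": location})
    return token  # stand-in for the rendered image I^(m)

concept = generate_type()
examples = [generate_token(concept) for _ in range(3)]  # same type, varied tokens
print(len(examples), "tokens sampled from one concept type")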
9. Outline
Concept learning
case study 1: handwritten characters
case study 2: fractal concepts
Question asking: active learning with rich questions
e.g., "Are any ships 3 tiles long?" "Are the blue ship and red ship parallel?"
With Josh Tenenbaum and Russ Salakhutdinov
Lake, Salakhutdinov, & Tenenbaum (2015). Science.
10. How do people learn such rich concepts from very little data?
The speed of learning: "one-shot learning" (e.g., Carey & Bartlett, 1978; Markman, 1989; Tenenbaum, 1999; Bloom, 2000; Smith et al., 2002)
The richness of representation: parsing, generating new examples, generating new concepts ("where are the others?")
11. Concept learning in computer vision: deep neural networks and big data
Training data (ImageNet):
• 1.2 million images
• ~1000 images per category
Architecture:
• dozens of layers
• millions of parameters
input → layers of feature maps → output: "daisy"
12. A testbed domain for one-shot learning
We would like to study one-shot learning in a domain with…
1) Natural, high-dimensional concepts.
2) A reasonable chance of building models that can see most of the structure that people see.
3) Insights that generalize across domains.
15. Human-level concept learning
The speed of learning ("one-shot learning") and the richness of representation: parsing, generating new examples, generating new concepts ("where are the others?").
20. Original characters and human drawings
(Figure: each original character shown alongside human drawings, with stroke order 1-5 for each parse and the five best programs with their probability scores.)
27. Bayesian Program Learning (BPL)
Type level: primitives (1D curvelets, 2D patches, 3D geons, actions, sounds, etc.) → sub-parts → parts → object template, with relations (attached along, attached at start, ...)
Token level: exemplars → raw data
Inference: θ is the latent program and I the raw binary image; a prior on parts, relations, etc. plus a renderer define the generative model, and Bayes' rule gives
P(θ | I) = P(I | θ) P(θ) / P(I)
Key ingredients for learning good programs: compositionality, causality, learning-to-learn
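To make the inference step concrete, a toy Python sketch that applies the Bayes' rule above to a handful of hypothetical candidate parses; the prior and likelihood numbers are placeholders standing in for BPL's learned prior over parts/relations and its stroke-renderer likelihood, not the actual model.

# Toy posterior over candidate latent programs (parses) for one image.
# All numbers are illustrative only.
candidates = {
    "two strokes, attached at start": {"prior": 0.50, "likelihood": 0.02},
    "two strokes, attached along":    {"prior": 0.30, "likelihood": 0.05},
    "one stroke, complex spline":     {"prior": 0.20, "likelihood": 0.01},
}

evidence = sum(c["prior"] * c["likelihood"] for c in candidates.values())  # P(I)

posterior = {name: c["prior"] * c["likelihood"] / evidence                 # Bayes' rule
             for name, c in candidates.items()}

for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P(theta = {name!r} | I) = {p:.3f}")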
34. Principle 1: Compositionality
Building complex representations from simpler parts/primitives.
Parts: wheels, handlebars, posts, seats, motors
Relations: motor powers wheels; handlebars on post; wheels below platform; supports
One-shot learning: Segway
(e.g., Winston, 1975; Fodor, 1975; Marr & Nishihara, 1978; Biederman, 1987)
38. Principle 2: Causality
Representing hypothetical real-world processes that produce perceptual observations.
(analysis-by-synthesis; intuitive theories; concepts as causal explanations)
Same causal process, different examples.
Tree example: "Is it growing too close to my house?" "How will it grow if I trim it?"
Machine caption generation: "an airplane is parked on the tarmac at an airport"; "a group of people standing on top of a beach"; "…riding a horse on a road"
(e.g., computer science: Revow et al., 1996; Hinton & Nair, 2006; cognitive psychology and cognitive neuroscience: Freyd, 1983; Longcamp et al., 2003; James & Gauthier, 2006, 2009)
39. Principle 3: Learning-to-learn
Experience with previous concepts helps for learning new concepts.
(e.g., Harlow, 1949; Schyns, Goldstone, & Thibaut, 1998; Smith et al., 2002)
Statistics learned from background characters: stroke primitives; number of strokes per character (frequency histogram); stroke start positions by stroke order; global transformations; relations between strokes: independent (34%), attached at start (5%), attached at end (11%), attached along (50%).
40. Bayesian Program Learning (BPL)
Generative model spanning the type level (primitives, sub-parts, parts, object template with relations such as "attached along" and "attached at start") and the token level (exemplars, raw data).
Code: https://github.com/brendenlake/BPL
49. Novel, large-scale, reverse-engineering paradigm
Behavioral data (Omniglot) → computational model and behavioral experiment (drawing new examples) → simulated behavior vs. human behavior → visual Turing test (behavioral experiment), repeated for each alternative model.
Standard evaluation paradigm: computational model and behavioral experiment → compare behavior.
50. Experimental design
• Participants (judges) on Amazon Mechanical Turk (N = 147)
• Each judge saw behavior from only one algorithm
• Instructions: a computer program simulates how people draw a new example. Can you tell humans from machines?
• Pre-experiment comprehension tests
• 49 trials, with accuracy displayed after each block of 10
51. Visual Turing Test: generating new examples
Identification (ID) level: % of judges who correctly identified machine vs. human (error bars ± 1 SEM); 50% is indistinguishable.
Bayesian Program Learning models compared: BPL, BPL lesion (no compositionality), BPL lesion (no learning-to-learn).
53. One-shot classification performance
Error rate (%) for people and models, after all models are pre-trained on 30 alphabets of characters.
Bayesian Program Learning models: BPL, BPL lesion (no compositionality), BPL lesion (no learning-to-learn).
Deep neural networks (no causality): Deep Siamese Convnet (Koch et al., 2015), Hierarchical Deep, Deep Convnet.
55. More large-scale behavioral experiments to evaluate the BPL model
"Human or Machine?" judgments for: generating new examples (dynamic); generating new concepts (from type, at the alphabet level); generating new concepts (unconstrained, at the alphabet level).
56. Visual Turing Tests
Identification (ID) level (% of judges who correctly ID machine vs. human) across tasks: generating new exemplars, generating new exemplars (dynamic), generating new concepts (from type), generating new concepts (unconstrained); 50% is indistinguishable.
Bayesian Program Learning models: BPL, BPL lesion (no compositionality), BPL lesion (no learning-to-learn).
http://cims.nyu.edu/~brenden/supplemental/turingtests/turingtests.html
https://github.com/brendenlake/visual-turing-tests
57. Interim conclusions: Case study 1
• Simple visual concepts with real-world complexity.
• A computational model that embodies three principles (compositionality, causality, and learning-to-learn) supports rich concept learning from very few examples.
• In large-scale, multi-layered behavioral evaluations, the model's creative generalizations are difficult to distinguish from human behavior.
• Current and future directions include understanding developmental and neural mechanisms.
59. Outline
Concept learning
case study 1: handwritten characters
case study 2: recursive visual concepts
Question asking: active learning with rich questions
e.g., "Are any ships 3 tiles long?" "Are the blue ship and red ship parallel?"
With Steve Piantadosi
60. If the mind can infer compositional, causal programs from their outputs, what are the limits?
63. What is another example of the same species?
Causal knowledge influences perception and extrapolation.
Angle: 35 degrees
Start symbol: F+FG
F ➔ C0FF-[C1-F+F]+[C2+F-F]G
G ➔ C0FF+[C1+F]+[C3-F]
More similar according to the L-system program vs. more similar according to a deep neural network.
64. "A surface was infected with a new type of alien crystal. The crystal has been growing for some time."
Before infection / After infection
"What do you think the crystal will look like if you let it grow longer?" (choices A, B, C)
68. L-system (Lindenmayer, 1968): a compositional language for expressing causal processes
Angle: 120 degrees
Start symbol: F
F ➔ F-G+F+G-F
G ➔ GG
Legend: "+" right turn; "-" left turn; "F" go straight; "G" go straight
Iteration 1: F-G+F+G-F
Iteration 2: F-G+F+G-F-GG+F-G+F+G-F+GG-F-G+F+G-F
(Each concept can be viewed as an image, as dynamics, or as a symbolic string.)
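A small Python sketch of this expansion and its turtle-style interpretation, with printing standing in for actual drawing; it follows the rules and legend above.

def expand(axiom, rules, n_iter):
    """Rewrite every symbol in parallel n_iter times (Lindenmayer expansion)."""
    s = axiom
    for _ in range(n_iter):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# The fractal concept from the slide.
rules = {"F": "F-G+F+G-F", "G": "GG"}
string = expand("F", rules, n_iter=2)
print(string)   # F-G+F+G-F-GG+F-G+F+G-F+GG-F-G+F+G-F

# Interpret the string as turtle commands (legend: F/G go straight, +/- turn by 120 degrees).
heading = 0
for ch in string:
    if ch in "FG":
        print(f"draw forward at heading {heading}")
    elif ch == "+":
        heading = (heading + 120) % 360   # right turn
    elif ch == "-":
        heading = (heading - 120) % 360   # left turn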
70. Experiment 1: Classification
• No feedback
• Six choices
• Distractors generated by replacing a rule (with a new rule from the grammar)
• Participants recruited on Amazon Mechanical Turk in the USA (N = 30)
• 24 different fractal concepts
Two conditions: "latent" (Before infection / After infection) and "stepwise" (Before infection / Step 1 / Step 2).
Visual Recursion Task (VRT) [Maurício Martins and colleagues]
75. Bayesian program learning (for recursive visual concepts)
Model: a meta-grammar M (context free) generates an L-system L; together with a depth d, an image renderer produces the image I.
Example L-system: Axiom = F; Angle = 120; F ➔ F-G+F+G-F; G ➔ GG; depth = 2.
Meta-grammar:
Start ➔ XYZ
X ➔ F | G
Z ➔ F | G | ''
Y ➔ F | G | YY | -Y+ | +Y- | ''
Angle ➔ 60 | 90 | 120
Axiom ➔ F
Inference (MCMC algorithm): P(L, d | I) ∝ P(I | L, d) P(L) P(d)
Note: the model has a key advantage: it is given exactly the right programming language. If people can infer programs like these, it is because their "language of thought" is general enough to represent these causal descriptions, and many more…
78. Why is the model better than people? A failure of search?
Human performance vs. chance, compared with a neural net and program induction (with limited MCMC).
Limited search (MCMC) predicts which concepts are easier to learn: r = 0.57 (easy vs. hard concepts).
Rational process models (Griffiths, Vul, Hamrick, Lieder, Goodman, etc.)
79. The "look for a smaller copy" heuristic
(Compare Iteration 2 with Iteration 3 for two example concepts.)
87. Experiment 2: Generation
• Participants recruited on Amazon Mechanical Turk in the USA (N = 30)
• 13 different fractal concepts (subset of the previous experiment)
• No feedback
Two conditions: "latent" (Before infection / After infection) and "stepwise" (Before infection / Step 1 / Step 2 / Step 3).
90. Results: Experiment 2 (Generation)
Measures: precisely right exemplar; individual decisions (clicks).
Baselines: random; deep neural network; always use "all" button; always use "none" button.
Significance: ** p < 0.001, * p < 0.05.
91. What do you think the crystal will look like if you let it grow longer?
(The number above each image indicates response frequency.)
95. Interim conclusions: Case study 2
• Explored a very difficult concept-learning task.
• A computational model that infers causal processes as compositions of primitives.
• People generalized in ways consistent with this model (and inconsistent with alternative models), despite the model's substantial advantages.
• Generation was aided by seeing a sequence of examples, rather than just one, a pattern the model does not fully explain.
98. Outline
Concept learning
case study 1: handwritten characters
case study 2: fractal concepts
Question asking: active learning with rich questions
e.g., "Are any ships 3 tiles long?" "Are the blue ship and red ship parallel?"
With Anselm Rothe and Todd Gureckis
Rothe, Lake, & Gureckis (2016). Proceedings of the 38th Annual Conference of the Cognitive Science Society. (More content in preparation.)
106. Active learning for people and machines
Rich, human questions: "How does it make sound?" "What is the difference between the second and the third?" "Which features are especially important?"
Simple, machine questions: "What is the category label of this object?" (asked again and again)
108. A testbed domain for question asking
We need a task that frees people to ask rich questions, yet is still amenable to formal (ideal observer) modeling.
(Battleship task: Gureckis & Markant, 2009; Markant & Gureckis, 2012, 2014)
109. Experiment 1: Free-form question asking
Ground truth: 3 ships (blue, purple, red); 3 possible sizes (2-4 tiles); ~1.6 million possible configurations of the hidden 6x6 gameboard (columns A-F, rows 1-6).
Goal: identify the hidden gameboard.
Phase 1: Sampling. Random samples turn the hidden gameboard into a partially revealed gameboard.
Phase 2: Question asking, e.g., "Is the red ship horizontal?"
Constraints: one-word answers; no combinations of questions.
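As a sanity check on the "~1.6 million possible configurations" figure, a brute-force Python sketch that enumerates placements of the three ships on the 6x6 grid; it assumes ships occupy contiguous horizontal or vertical runs of 2-4 tiles and may touch but not overlap.

from itertools import product

SIZE = 6  # 6x6 grid, columns A-F and rows 1-6

def placements():
    """All cell sets a single ship of length 2-4 can occupy."""
    out = []
    for length in (2, 3, 4):
        for r, c in product(range(SIZE), repeat=2):
            if c + length <= SIZE:                      # horizontal placement
                out.append(frozenset((r, c + i) for i in range(length)))
            if r + length <= SIZE:                      # vertical placement
                out.append(frozenset((r + i, c) for i in range(length)))
    return out

cells = placements()            # 144 possible placements per ship
count = 0
for blue in cells:
    for red in cells:
        if blue & red:
            continue
        occupied = blue | red
        for purple in cells:
            if not (occupied & purple):
                count += 1      # ships are distinguishable (blue/red/purple)

print(count)                    # on the order of the 1.6 million configurations quoted above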
112. Results: generated questions
18 different game scenarios; for each context (a partially revealed gameboard), participants produced questions such as:
At what location is the top left part of the purple ship?
What is the location of one purple tile?
Is the blue ship horizontal?
Is the red ship 2 tiles long?
Is the purple ship horizontal?
Is the red ship horizontal?
113. Extracting semantics from free-form questions
"How many squares long is the blue ship?" / "How long is the blue ship?" / "How many tiles is the blue ship?" / … → shipsize(blue)
"Is the blue ship horizontal?" / "Does the blue one go from left to right?" / … → horizontal(blue)
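The slides do not say how this mapping was carried out (it was plausibly hand-coded); purely as an illustration of the idea, a hypothetical pattern-based normalizer in Python:

import re

# Hypothetical keyword rules mapping paraphrases to canonical predicates.
RULES = [
    (re.compile(r"how (long|many (tiles|squares))", re.I), "shipsize({color})"),
    (re.compile(r"horizontal|left to right", re.I),        "horizontal({color})"),
]
COLORS = ("blue", "red", "purple")

def canonicalize(question):
    color = next((c for c in COLORS if c in question.lower()), "?")
    for pattern, template in RULES:
        if pattern.search(question):
            return template.format(color=color)
    return None  # unmatched wording would need manual coding

print(canonicalize("How many squares long is the blue ship?"))  # shipsize(blue)
print(canonicalize("Does the blue one go from left to right?")) # horizontal(blue)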
114. Questions generated by participants, organized by query type (N = frequency of each template):
N Location/standard queries
24 What color is at [row][column]?
24 Is there a ship at [row][column]?
31 Is there a [color incl water] tile at [row][column]?
Region queries
4 Is there any ship in row [row]?
9 Is there any part of the [color] ship in row [row]?
5 How many tiles in row [row] are occupied by ships?
1 Are there any ships in the bottom half of the grid?
10 Is there any ship in column [column]?
10 Is there any part of the [color] ship in column [column]?
3 Are all parts of the [color] ship in column [column]?
2 How many tiles in column [column] are occupied by ships?
1 Is any part of the [color] ship in the left half of the grid?
Ship size queries
185 How many tiles is the [color] ship?
71 Is the [color] ship [size] tiles long?
8 Is the [color] ship [size] or more tiles long?
5 How many ships are [size] tiles long?
8 Are any ships [size] tiles long?
2 Are all ships [size] tiles long?
2 Are all ships the same size?
2 Do the [color1] ship and the [color2] ship have the same size?
3 Is the [color1] ship longer than the [color2] ship?
3 How many tiles are occupied by ships?
Ship orientation queries
94 Is the [color] ship horizontal?
7 How many ships are horizontal?
3 Are there more horizontal ships than vertical ships?
1 Are all ships horizontal?
4 Are all ships vertical?
7 Are the [color1] ship and the [color2] ship parallel?
Adjacency queries
12 Do the [color1] ship and the [color2] ship touch?
6 Are any of the ships touching?
9 Does the [color] ship touch any other ship?
2 Does the [color] ship touch both other ships?
Demonstration queries
14 What is the location of one [color] tile?
28 At what location is the top left part of the [color] ship?
5 At what location is the bottom right part of the [color] ship?
118. Experiment 2: Evaluating questions for quality
Context: a partially revealed 6x6 gameboard (e.g., Trial 4).
Participants ranked a list of question options from best to worst, e.g.:
At what location is the top left part of the purple ship?
What is the location of one purple tile?
Is the blue ship horizontal?
Is the red ship 2 tiles long?
Is the purple ship horizontal?
Is the red ship horizontal?
120. How do people think of a question to ask?
What principles and representations can explain the productivity and creativity of question asking?
124. Question asking as program synthesis
Game primitives:
"Color at tile A1?" → (color A1)
"Size of the blue ship?" → (size Blue)
"Orientation of the blue ship?" → (orient Blue)
Primitive operators: (+ X X), (= X X), ...
Novel questions via compositionality:
"Are the blue ship and the red ship parallel?" → (= (orient Blue) (orient Red))
"What is the total size of all the ships?" → (+ (+ (size Blue) (size Red)) (size Purple))
Key ingredients: compositionality, learning-to-learn.
125. Compositionality in question structure
How many ships are three tiles long?
  (+ (map (lambda x (= (size x) 3)) (set Blue Red Purple)))
Are any ships 3 tiles long?
  (> (+ (map (lambda x (= (size x) 3)) (set Blue Red Purple))) 0)
Are all ships three tiles long?
  (= (+ (map (lambda x (= (size x) 3)) (set Blue Red Purple))) 3)
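A sketch that evaluates those three compositions in Python against a hypothetical hidden board, with `size` as a plain dictionary lookup; it mirrors the map/sum/compare structure of the programs above rather than implementing the study's actual interpreter.

# Hypothetical hidden board: ship sizes only, which is all these questions need.
board = {"Blue": 3, "Red": 2, "Purple": 4}

def size(ship):
    return board[ship]

ships = ("Blue", "Red", "Purple")

# (+ (map (lambda x (= (size x) 3)) (set Blue Red Purple)))
n_three_long = sum(size(x) == 3 for x in ships)      # "How many ships are three tiles long?"

# (> ... 0) and (= ... 3) wrap the same sub-program:
any_three_long = n_three_long > 0                    # "Are any ships 3 tiles long?"
all_three_long = n_three_long == len(ships)          # "Are all ships three tiles long?"

print(n_three_long, any_three_long, all_three_long)  # 1 True False for this board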
126. Questions as programs (all questions)
GROUP: QUESTION | FUNCTION | EXPRESSION
location:
What color is at A1? | location | (color A1)
Is there a ship at A1? | locationA | (not (= (color A1) Water))
Is there a blue tile at A1? | locationD | (= (color A1) Blue)
segmentation:
Is there any ship in row 1? | row | (> (+ (map (λ x (and (= (row x) 1) (not (= (color x) Water)))) (set A1 ... F6))) 0)
Is there any part of the blue ship in row 1? | rowD | (> (+ (map (λ x (and (= (row x) 1) (= (color x) Blue))) (set A1 ... F6))) 0)
Are all parts of the blue ship in row 1? | rowDL | (> (+ (map (λ x (and (= (row x) 1) (= (color x) Blue))) (set A1 ... F6))) 1)
How many tiles in row 1 are occupied by ships? | rowNA | (+ (map (λ x (and (= (row x) 1) (not (= (color x) Water)))) (set A1 ... F6)))
Are there any ships in the bottom half of the grid? | rowX2 | ...
Is there any ship in column 1? | col | (> (+ (map (λ x (and (= (col x) 1) (not (= (color x) Water)))) (set A1 ... F6))) 0)
Is there any part of the blue ship in column 1? | colD | (> (+ (map (λ x (and (= (col x) 1) (= (color x) Blue))) (set A1 ... F6))) 0)
Are all parts of the blue ship in column 1? | colDL | (> (+ (map (λ x (and (= (col x) 1) (= (color x) Blue))) (set A1 ... F6))) 1)
How many tiles in column 1 are occupied by ships? | colNA | (+ (map (λ x (and (= (col x) 1) (not (= (color x) Water)))) (set A1 ... F6)))
Is any part of the blue ship in the left half of the grid? | colX1 | ...
ship size:
How many tiles is the blue ship? | shipsize | (size Blue)
Is the blue ship 3 tiles long? | shipsizeD | (= (size Blue) 3)
Is the blue ship 3 or more tiles long? | shipsizeM | (or (= (size Blue) 3) (> (size Blue) 3))
How many ships are 3 tiles long? | shipsizeN | (+ (map (λ x (= (size x) 3)) (set Blue Red Purple)))
Are any ships 3 tiles long? | shipsizeDA | (> (+ (map (λ x (= (size x) 3)) (set Blue Red Purple))) 0)
Are all ships 3 tiles long? | shipsizeDL | (= (+ (map (λ x (= (size x) 3)) (set Blue Red Purple))) 3)
Are all ships the same size? | shipsizeL | (= (map (λ x (size x)) (set Blue Red Purple)))
Do the blue ship and the red ship have the same size? | shipsizeX1 | (= (size Blue) (size Red))
Is the blue ship longer than the red ship? | shipsizeX2 | (> (size Blue) (size Red))
How many tiles are occupied by ships? | totalshipsize | (+ (map (λ x (size x)) (set Blue Red Purple)))
orientation:
Is the blue ship horizontal? | horizontal | (= (orient Blue) H)
How many ships are horizontal? | horizontalN | (+ (map (λ x (= (orient x) H)) (set Blue Red Purple)))
Are there more horizontal ships than vertical ships? | horizontalM | (> (+ (map (λ x (= (orient x) H)) (set Blue Red Purple))) 1)
Are all ships horizontal? | horizontalL | (= (+ (map (λ x (= (orient x) H)) (set Blue Red Purple))) 3)
Are all ships vertical? | verticalL | (= (+ (map (λ x (= (orient x) H)) (set Blue Red Purple))) 0)
Are the blue ship and the red ship parallel? | parallel | (= (orient Blue) (orient Red))
touching:
Do the blue ship and the red ship touch? | touching | (touch Blue Red)
Are any of the ships touching? | touchingA | (or (touch Blue Red) (or (touch Blue Purple) (touch Red Purple)))
Does the blue ship touch any other ship? | touchingXA | (or (touch Blue Red) (touch Blue Purple))
Does the blue ship touch both other ships? | touchingX1 | (and (touch Blue Red) (touch Blue Purple))
demonstration:
What is the location of one blue tile? | demonstration | (draw (select (set A1 ... F6) Blue))*
At what location is the top left part of the blue ship? | topleft | (topleft Blue)
At what location is the bottom right part of the blue ship? | bottomright | (bottomright Blue)
131. Preliminary results for generating questions
Example for an ideal observer maximizing expected information gain (EIG):
(+ (+ (* 100 (size Red)) (* 10 (size Blue))) (size Purple))
Example generated question:
(topleft (map (lambda x (topleft (colortiles x))) (set Blue Red Purple)))
Learning a generative model of questions. Goal: predict human questions in novel scenarios.
Within the questions-as-programs framework, a log-linear model defines a distribution over questions, where the probability of a question is a function of its features f1(·), ..., fK(·). The features include the expected information gain (EIG) of a question in the current context, f1(·), as well as features that encode program length, answer type, and various grammatical operators. The energy of a question x is
E(x) = θ1 f1(x) + θ2 f2(x) + · · · + θK fK(x),    (1)
where θi is the weight assigned to feature fi, and the probability of a question is determined by its energy,
P(x; θ) = exp(−E(x)) / Σ_{x′ ∈ X} exp(−E(x′)),    (2)
so that high-energy questions have a lower probability.
x: question; f(·): features (EIG, length, etc.); θ: trainable parameters.
The model can be evaluated by whether it generates genuinely novel, human-like questions (high-probability questions that were not in the training set); judging the quality and creativity of these questions probes the productivity of the model and mirrors the human ability to ask questions. Future work will extend the framework beyond the Battleship task to other games and goal-directed tasks, for example "Hangman", where the player asks about the presence of letters (e.g., "Are there any 'A's?").
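A minimal Python sketch of the log-linear question model described above: each candidate question gets features (EIG, program length), an energy E(x) = θ·f(x), and a softmax probability in which, following the text, higher-energy questions are less likely. The feature values and weights here are placeholders, not fitted parameters.

import math

# Candidate question programs with placeholder features: (EIG in bits, program length).
questions = {
    "(size Blue)":                    (1.5, 2),
    "(= (orient Blue) (orient Red))": (0.9, 5),
    "(color A1)":                     (0.3, 2),
}
theta = (-2.0, 0.1)   # illustrative weights: high EIG lowers energy, length raises it

def energy(features):
    return sum(w * f for w, f in zip(theta, features))   # E(x) = theta . f(x)

# P(x; theta) = exp(-E(x)) / sum over x' of exp(-E(x'))
z = sum(math.exp(-energy(f)) for f in questions.values())
for q, f in questions.items():
    print(f"{q:32s} P = {math.exp(-energy(f)) / z:.2f}")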
132. compositionality, causality, and learning-to-learn for
building more human-like learning algorithms
133. Future directions: causal, compositional, and embodied concepts
Learning new handwritten characters; learning new gestures; learning new dance moves; learning new spoken words ("Ban Ki-moon", "Kofi Annan").
134. Developmental origins of one-shot learning
With Eliza Kosoy and Josh Tenenbaum
One-shot classification ("Which is another example?") and one-shot generation ("Draw another example"), e.g., Child 1 and Child 2. Are the two abilities linked?
135. Neural mechanisms of one-shot learning
With Shannon Tubridy, Jason Fuller, & Todd Gureckis
Can we decode letter identity from pre-motor cortex, especially for novel letters?
Overlapping representations for reading and writing letters in pre-motor cortex (e.g., Longcamp et al., 2003).
Stimuli: N A B K
136. Question asking in simple goal-directed dialogs
With Anselm Rothe and Todd Gureckis
Concierge: What type of food are you thinking?
Guest: I feel like Italian food.
Concierge: How large is your party?
Guest: Four people.
Concierge: <insert question here>
Candidate questions: "Are you willing to travel between 20 and 30 minutes for a four-star place?" "Should the average entree price be more or less than $20?" "Do you prefer a four-star Italian restaurant or a three-star French one?"
138. Learning to play new video games
How can people learn to play a new game so quickly? What are the underlying cognitive principles?
Compositionality, causality, learning-to-learn; but also intuitive physics and intuitive psychology.
(Lake et al., in press, Behavioral and Brain Sciences)
139. Conclusions
How can people learn such rich concepts from only one or a few examples?
• Bayesian Program Learning answers this question for a range of simple visual concepts.
• It embodies three principles (compositionality, causality, and learning-to-learn) that are likely to be important for rapid learning of rich concepts in many domains.
How can people synthesize novel questions when faced with uncertainty?
• Questions can be represented as programs and synthesized by exploiting compositionality and learning-to-learn.
140. Thank you
Collaborators: Josh Tenenbaum, Russ Salakhutdinov, Steve Piantadosi, Todd Gureckis, Anselm Rothe
Funding: Moore-Sloan Data Science Environment at NYU, NSF Graduate Research Fellowship, the Center for Minds, Brains and Machines (CBMM) funded by NSF STC award CCF-1231216, and ARO MURI contract W911NF-08-1-0242