Fcv poster parikh
Relative Attributes
Devi Parikh (TTIC) and Kristen Grauman (UT Austin)

1. Main Idea

Motivation: Categorical (binary) attributes are restrictive and can be unnatural.
[Figure: "Natural?" vs. "Not natural?" and "Smiling?" vs. "Not smiling?" example images]

Proposed idea: Relative attributes
- Describe images or categories relatively, e.g. "dogs are furrier than giraffes"
- Learn a ranking function for each attribute
- Richer communication between humans and machines, e.g. "find less congested downtown Chicago scene than [reference image]"
[Figure: learnt relative attributes, binary vs. relative; labels "classical recognition problem" and "binary ~ relative supervision"]

Enables new applications:
- Novel zero-shot learning from attribute comparisons
- Precise automatically generated textual descriptions of images

2. Learning Relative Attributes

For each attribute $a_m$, supervision is given as two sets of image pairs:
  $O_m = \{(i, j)\}$: ordered pairs, image $i$ shows $a_m$ more strongly than image $j$ ($i \succ j$)
  $S_m = \{(i, j)\}$: similar pairs, images $i$ and $j$ show $a_m$ to a similar degree ($i \sim j$)
[Figure: example supervision pairs for "Young" and "Smiling"]

Learn a scoring function $r_m(x_i) = w_m^T x_i$ that best satisfies the constraints:
  $\forall (i, j) \in O_m: \; w_m^T x_i > w_m^T x_j$
  $\forall (i, j) \in S_m: \; w_m^T x_i = w_m^T x_j$

Max-margin learning-to-rank formulation (objective adapted from [Joachims, 2002]):
  $\min_{w_m} \; \tfrac{1}{2}\|w_m\|_2^2 + C\left(\sum \xi_{ij}^2 + \sum \gamma_{ij}^2\right)$
  s.t. $w_m^T(x_i - x_j) \ge 1 - \xi_{ij}, \; \forall (i, j) \in O_m$;
       $|w_m^T(x_i - x_j)| \le \gamma_{ij}, \; \forall (i, j) \in S_m$;
       $\xi_{ij} \ge 0; \; \gamma_{ij} \ge 0$
[Figure: learnt relative attributes, e.g. images ordered by "Density"]

3. Ranking Function vs. Binary Classifier Score

How do the learned ranking functions ($w_m$) differ from binary classifier outputs ($w_b$)?

% correctly ordered pairs:

                    Classifier   Ranker
  Outdoor scenes        80%        89%
  Celebrity faces       67%        82%

4. Relative Zero-shot Learning

Training: images from S seen categories and descriptions of U unseen categories.
Testing: categorize an image into N (= S + U) categories.
- An unseen category need not use all attributes.
- It need not relate to all seen categories.
Infer the image category using max-likelihood.
[Figure: seen and unseen categories in the relative attributes space]
Relative attributes jointly carve out space for an unseen category.

5. Describing Images Relatively

Auto-generate a textual description of an image:
  Binary description: "not dense"
  Relative description: "more dense than [image], less dense than [image]", e.g. "more dense than Highways, less dense than Forests"
[Figure: "Not dense" and "Dense" example images]

6. Datasets

Outdoor Scene Recognition (OSR): 2688 images, 8 categories: coast (C), forest (F), highway (H), inside-city (I), mountain (M), open-country (O), street (S) and tall-building (T); gist features.

Public Figure Face (PubFig): 800 images, 8 categories: Alex Rodriguez (A), Clive Owen (C), Hugh Laurie (H), Jared Leto (J), Miley Cyrus (M), Scarlett Johansson (S), Viggo Mortensen (V) and Zac Efron (Z); gist and color features.

Binary (1 = has attribute) and relative annotations per category:

  OSR                T I S H C O M F   Relative ordering
  natural            0 0 0 0 1 1 1 1   T≺I∼S≺H≺C∼O∼M∼F
  open               0 0 0 1 1 1 1 0   T∼F≺I∼S≺M≺H∼C∼O
  perspective        1 1 1 1 0 0 0 0   O≺C≺M∼F≺H≺I≺S≺T
  large-objects      1 1 1 0 0 0 0 0   F≺O∼M≺I∼S≺H∼C≺T
  diagonal-plane     1 1 1 1 0 0 0 0   F≺O∼M≺C≺I∼S≺H≺T
  close-depth        1 1 1 1 0 0 0 1   C≺M≺O≺T∼I∼S∼H∼F

  PubFig             A C H J M S V Z   Relative ordering
  Masculine-looking  1 1 1 1 0 0 1 1   S≺M≺Z≺V≺J≺A≺H≺C
  White              0 1 1 1 1 1 1 1   A≺C≺H≺Z≺J≺S≺M≺V
  Young              0 0 0 0 1 1 0 1   V≺H≺C≺J≺A≺S≺Z≺M
  Smiling            1 1 1 0 1 1 0 1   J≺V≺H≺A∼C≺S∼Z≺M
  Chubby             1 0 0 0 0 0 0 0   V≺J≺H≺C≺Z≺M≺S≺A
  Visible-forehead   1 1 1 0 1 1 1 0   J≺Z≺M≺S≺A∼C∼H∼V
  Bushy-eyebrows     0 1 0 1 0 0 0 0   M≺S≺Z≺V≺H≺A≺C≺J
  Narrow-eyes        0 1 1 0 0 0 1 1   M≺J≺S≺A≺H≺C≺V≺Z
  Pointy-nose        0 0 1 0 0 0 0 1   A≺C≺J∼M∼V≺S≺Z≺H
  Big-lips           1 0 0 0 1 1 0 0   H≺J≺V≺Z≺C≺M≺A≺S
  Round-face         1 0 0 0 1 1 0 0   H≺V≺J≺C≺Z≺A≺S≺M

Relative supervision can give a unique ordering on all classes.

Baselines:
- Direct Attribute Prediction (DAP) [Lampert et al. 2009], using binary attributes: $c^*(x) = \arg\max_c \prod_m \hat{P}(a_m^c \mid x)$
- SRA: a classifier instead of a ranker

7. Image Description Results

Human subject experiment: "Which image is ___?", where the blank holds a description such as "more chubby than ___, less chubby than ___" (likewise for smiling and VisFHead, i.e. visible forehead).
[Plots, OSR and PubFig: % correct image in top choices vs. # top choices, Binary (baseline) vs. Relative; panels "Amount of description" and "Quality of description"]

Example descriptions:

  OSR, image 1
    Binary: not natural, not open, perspective
    Relative: more natural than tallbuilding; less natural than forest; more open than tallbuilding; less open than coast; more perspective than tallbuilding
  OSR, image 2
    Binary: not natural, not open, perspective
    Relative: more natural than insidecity; less natural than highway; more open than street; less open than coast; more perspective than highway; less perspective than insidecity
  OSR, image 3
    Binary: natural, open, perspective
    Relative: more natural than tallbuilding; less natural than mountain; more open than mountain; less perspective than opencountry
  PubFig, image 1
    Binary: White, not Smiling, VisibleForehead
    Relative: more White than AlexRodriguez; more Smiling than JaredLeto; less Smiling than ZacEfron; more VisibleForehead than JaredLeto; less VisibleForehead than MileyCyrus
  PubFig, image 2
    Binary: White, not Smiling, not VisibleForehead
    Relative: more White than AlexRodriguez; less White than MileyCyrus; less Smiling than HughLaurie; more VisibleForehead than ZacEfron; less VisibleForehead than MileyCyrus
  PubFig, image 3
    Binary: not Young, BushyEyebrows, RoundFace
    Relative: more Young than CliveOwen; less Young than ScarlettJohansson; more BushyEyebrows than ZacEfron; less BushyEyebrows than AlexRodriguez; more RoundFace than CliveOwen; less RoundFace than ZacEfron

8. Zero-shot Learning Results

[Plots, OSR and PubFig: Accuracy of DAP, SRA and Proposed vs. # unseen categories; amount of labeled data to learn attributes (# labeled pairs); looseness of constraints; # attributes used to describe unseen categories]

An attribute is more discriminative when used relatively.
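The max-margin ranking formulation in section 2 can be sketched in a few lines. This is a minimal sketch, not the authors' solver. It assumes NumPy, a linear scorer $r_m(x) = w_m^T x$, and plain gradient descent on the equivalent unconstrained objective. In that objective each slack is replaced by its optimal value: $\xi_{ij} = \max(0, 1 - w_m^T(x_i - x_j))$ for ordered pairs and $\gamma_{ij} = |w_m^T(x_i - x_j)|$ for similar pairs. The function name `train_relative_attribute` and its `lr`/`n_iters` parameters are hypothetical.

```python
import numpy as np


def train_relative_attribute(X, ordered_pairs, similar_pairs, C=1.0, lr=0.01, n_iters=2000):
    """Learn w so that r(x) = w @ x respects the pairwise supervision.

    Minimizes (assumed unconstrained form of the squared-slack objective):
        0.5 * ||w||^2
        + C * sum_{(i,j) in O} max(0, 1 - w @ (x_i - x_j))^2
        + C * sum_{(i,j) in S} (w @ (x_i - x_j))^2
    ordered_pairs: (i, j) with image i showing the attribute more than image j.
    similar_pairs: (i, j) with images i and j showing it to a similar degree.
    """
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    # Difference vectors x_i - x_j; reshape keeps shape (0, d) when a set is empty.
    D_o = np.array([X[i] - X[j] for i, j in ordered_pairs], dtype=float).reshape(-1, d)
    D_s = np.array([X[i] - X[j] for i, j in similar_pairs], dtype=float).reshape(-1, d)
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = w.copy()  # gradient of 0.5 * ||w||^2
        # Squared hinge on ordered pairs: d/dw C*max(0, 1 - w.d)^2 = -2C*max(0, 1 - w.d)*d
        hinge = np.maximum(0.0, 1.0 - D_o @ w)
        grad -= 2.0 * C * (hinge[:, None] * D_o).sum(axis=0)
        # Squared penalty on similar pairs: d/dw C*(w.d)^2 = 2C*(w.d)*d
        grad += 2.0 * C * ((D_s @ w)[:, None] * D_s).sum(axis=0)
        w -= lr * grad
    return w


# Toy usage: the first feature grows with the attribute; labels say 3 > 2 > 1 > 0.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
w = train_relative_attribute(X, [(1, 0), (2, 1), (3, 2)], [])
print(X @ w)  # scores increase from image 0 to image 3
```

Because both penalties are squared, the objective is smooth and convex, so a small fixed step size suffices on toy data. A dedicated RankSVM or QP solver would be the practical choice at scale.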
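The table in section 3 compares classifier and ranker scores by the percentage of correctly ordered pairs. A minimal sketch of that metric follows. It assumes a pair counts as correct only when the scores order it strictly, so ties count as errors; that is our assumption, not stated in the poster. The function name and the toy scores are hypothetical and are not data from the poster.

```python
import numpy as np


def fraction_correctly_ordered(scores, ordered_pairs):
    """Fraction of labeled pairs (i, j), image i stronger than image j, with scores[i] > scores[j]."""
    scores = np.asarray(scores, dtype=float)
    if len(ordered_pairs) == 0:
        raise ValueError("need at least one ordered pair")
    correct = sum(1 for i, j in ordered_pairs if scores[i] > scores[j])
    return correct / len(ordered_pairs)


# Toy example (hypothetical numbers): four images of strictly increasing attribute strength.
pairs = [(i, j) for i in range(4) for j in range(4) if i > j]
saturated = [0.0, 0.0, 1.0, 1.0]  # classifier-like score, flat on each side of the decision boundary
graded = [0.1, 0.5, 1.2, 2.0]     # ranker-like score
print(fraction_correctly_ordered(saturated, pairs))  # 0.6666666666666666
print(fraction_correctly_ordered(graded, pairs))     # 1.0
```

The toy case illustrates why a score trained only to separate two classes can lose ordering information within each class.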
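Section 4 states only that seen categories come from images, unseen categories come from relative descriptions, and the category is inferred by max-likelihood. The sketch below fills in the rest under labeled assumptions:
- Each category is modeled as a Gaussian in the space of ranking scores $r_m(x)$.
- An unseen category described as "more $a_m$ than A, less $a_m$ than B" gets a mean halfway between A's and B's means.
- A one-sided description is offset by the average gap between sorted seen-category means.
- Each undescribed attribute falls back to the mean of the seen means.
- The unseen covariance is the average seen covariance.

All function names (`fit_seen`, `place_unseen`, `classify`) and the `description` format are hypothetical.

```python
import numpy as np


def fit_seen(R, labels, ridge=1e-6):
    """Gaussian (mean, covariance) per seen category; R is (n, M) ranking scores."""
    R, labels = np.asarray(R, dtype=float), np.asarray(labels)
    models = {}
    for c in np.unique(labels):
        Rc = R[labels == c]
        # Small ridge keeps the covariance invertible (assumption).
        cov = np.atleast_2d(np.cov(Rc, rowvar=False)) + ridge * np.eye(R.shape[1])
        models[str(c)] = (Rc.mean(axis=0), cov)
    return models


def place_unseen(models, description):
    """description: {m: (lower, upper)} meaning 'more a_m than lower, less a_m than upper'.

    Either side may be None. Attributes not in the description keep the
    mean of the seen means (hypothetical default).
    """
    means = np.array([mu for mu, _ in models.values()])          # (S, M)
    gaps = np.diff(np.sort(means, axis=0), axis=0).mean(axis=0)   # assumed average spacing
    mu_u = means.mean(axis=0)
    for m, (lower, upper) in description.items():
        if lower is not None and upper is not None:
            mu_u[m] = 0.5 * (models[lower][0][m] + models[upper][0][m])
        elif lower is not None:
            mu_u[m] = models[lower][0][m] + gaps[m]
        elif upper is not None:
            mu_u[m] = models[upper][0][m] - gaps[m]
    cov_u = np.mean([cov for _, cov in models.values()], axis=0)
    return mu_u, cov_u


def log_gaussian(x, mu, cov):
    """Log density of a multivariate normal at x."""
    diff = np.asarray(x, dtype=float) - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + len(diff) * np.log(2 * np.pi))


def classify(x, models):
    """Max-likelihood category over seen and unseen models."""
    return max(models, key=lambda c: log_gaussian(x, *models[c]))


# Toy usage: two seen categories in a 2-attribute space, one unseen category U
# described as "more a_0 than A, less a_0 than B" and "less a_1 than A".
R = np.array([[-0.1, 0.0], [0.1, 0.0], [0.0, -0.1], [0.0, 0.1],
              [3.9, 4.0], [4.1, 4.0], [4.0, 3.9], [4.0, 4.1]])
labels = np.array(["A"] * 4 + ["B"] * 4)
models = fit_seen(R, labels)
models["U"] = place_unseen(models, {0: ("A", "B"), 1: (None, "A")})
print(models["U"][0], classify([2.0, -4.0], models))
```

`place_unseen` must be called before the unseen model is added, so the gaps and averages use seen categories only.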
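Section 5 contrasts a binary description ("not dense") with a relative one ("more dense than Highways, less dense than Forests"). A minimal sketch of how such strings could be generated from one attribute's ranking score follows. It assumes a fixed threshold for the binary case. For the relative case it assumes the image is compared with a set of reference scores, for example category means, picking the nearest reference below and above. The threshold, reference scores and function names are hypothetical.

```python
def describe_binary(score, attribute, threshold=0.0):
    """Binary description: the attribute name, or 'not <name>', by thresholding a score."""
    return attribute if score > threshold else f"not {attribute}"


def describe_relatively(score, reference_scores, attribute):
    """Relative description against the nearest references below and above the score.

    reference_scores: {name: attribute score}, e.g. category means (assumption).
    """
    below = [(s, name) for name, s in reference_scores.items() if s < score]
    above = [(s, name) for name, s in reference_scores.items() if s > score]
    parts = []
    if below:
        parts.append(f"more {attribute} than {max(below)[1]}")  # closest reference below
    if above:
        parts.append(f"less {attribute} than {min(above)[1]}")  # closest reference above
    return ", ".join(parts)


refs = {"Highways": 0.2, "Forests": 0.9, "Coasts": -0.5}  # hypothetical reference scores
print(describe_binary(0.5, "dense", threshold=0.6))  # not dense
print(describe_relatively(0.5, refs, "dense"))       # more dense than Highways, less dense than Forests
```

The relative form pins the image between two references, which is what makes it more precise than a single yes/no label.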