Distributional approaches are based on a simple hypothesis: the meaning of a word can be inferred from its usage. Applying that idea to the vector space model makes it possible to build a WordSpace in which words are represented as mathematical points in a geometric space, with similar words lying close to one another. The definition of "word usage" depends on the definition of context used to build the space, which can be the whole document, the sentence in which the word occurs, a fixed window of words, or a specific syntactic context. In its original formulation, however, WordSpace can take into account only one definition of context at a time. We propose an approach based on vector permutation and Random Indexing to encode several syntactic contexts in a single WordSpace. Moreover, we propose some operations in this space and report the results of an evaluation performed on the GEMS 2011 Shared Evaluation data.
Encoding syntactic dependencies by vector permutation
1. Encoding syntactic dependencies by vector permutation
Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro
Department of Computer Science, University of Bari "Aldo Moro" (Italy)
GEMS 2011: GEometrical Models of Natural Language Semantics
Edinburgh, Scotland - July 31st, 2011
2. Motivation
• meaning is its use
• the meaning of a word is determined by the set of textual contexts in which it appears
• one definition of context at a time
3. Building Blocks
• Random Indexing
• Dependency Parser
• Vector Permutation
4. Random Indexing
• assign a context vector to each context element (e.g. document, passage, term, …)
• term vector is the sum of the context vectors in which the term occurs
– sometimes the context vector could be boosted by a score (e.g. term frequency, PMI, …)
5. Context Vector
(0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, -1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1)
• sparse
• high dimensional
• ternary {-1, 0, +1}
• small number of randomly distributed non-zero elements
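To make this concrete, here is a minimal Python sketch (not from the slides; the dimensionality and the number of non-zero seeds are illustrative choices) of how such sparse ternary random vectors can be generated with NumPy:

```python
import numpy as np

def random_index_vector(dim=4000, seeds=10, rng=np.random.default_rng()):
    """Build a sparse ternary random vector: mostly zeros, with a few
    randomly placed +1/-1 entries (`dim` and `seeds` are illustrative)."""
    v = np.zeros(dim)
    positions = rng.choice(dim, size=seeds, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=seeds)
    return v

# one random context vector per vocabulary term
vocab = ["john", "eat", "red", "apple"]
R = {term: random_index_vector() for term in vocab}
```

Because such vectors are nearly orthogonal with high probability, each term gets a quasi-unique fingerprint.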
6. Random Indexing (formal)
B_{n,k} = A_{n,m} R_{m,k},  with k << m
B preserves the distance between points (Johnson-Lindenstrauss lemma):
d_r = c · d
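As a quick sanity check of the distance-preservation claim, the following sketch (toy data; all dimensions are illustrative, not the system's settings) projects random points through a sparse ternary matrix R and compares pairwise distances before and after:

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, k = 20, 10000, 300            # n points, original dim m, reduced dim k << m

A = rng.normal(size=(n, m))                                  # original space
R = rng.choice([-1.0, 0.0, 1.0], size=(m, k), p=[0.01, 0.98, 0.01])
B = A @ R                                                    # B_{n,k} = A_{n,m} R_{m,k}

# the ratio d_r / d is roughly the same constant c for every pair of points
for i, j in [(0, 1), (2, 7), (5, 19)]:
    d = np.linalg.norm(A[i] - A[j])
    d_r = np.linalg.norm(B[i] - B[j])
    print(f"pair ({i},{j}): d_r/d = {d_r / d:.3f}")
```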
7. Dependency parser
John eats a red apple.
subject: eats → John
object: eats → apple
modifier: apple → red
(head → dependent)
8. Vector permutation
• using permutation of elements in the random vector to encode several contexts
– right shift of n elements to encode dependents (permutation)
– left shift of n elements to encode heads (inverse permutation)
• choose a different n for each kind of dependency
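In code, the shift Πn and its inverse can be modelled with np.roll; a minimal sketch (the shift amounts are the ones used in the slides' later example, everything else is illustrative):

```python
import numpy as np

def perm(v, n):       # Π_n: right shift by n elements (encodes dependents)
    return np.roll(v, n)

def inv_perm(v, n):   # Π_{-n}: left shift by n elements (encodes heads)
    return np.roll(v, -n)

v = np.array([1.0, 0, 0, 0, -1.0, 0, 0, 0, 0, 0])
assert np.array_equal(inv_perm(perm(v, 3), 3), v)  # the shifts are exact inverses

SHIFTS = {"mod": 3, "obj": 7}  # one distinct shift per dependency type
```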
9. Method
• assign a context vector to each term
• assign a shift function (Πn) to each kind of dependency
• each term is represented by a vector which is
– the sum of the permuted vectors of all the dependent terms
– the sum of the inverse permuted vectors of all the head terms
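A minimal sketch of the whole construction (function and variable names are mine, not from the paper; dependency triples are assumed to arrive from a parser in (head, relation, dependent) form):

```python
from collections import defaultdict
import numpy as np

def build_term_vectors(dependencies, R, shifts):
    """Sum, for each term, the permuted random vectors of its dependents
    and the inverse-permuted random vectors of its heads."""
    dim = len(next(iter(R.values())))
    B = defaultdict(lambda: np.zeros(dim))
    for head, rel, dep in dependencies:
        n = shifts[rel]
        B[head] += np.roll(R[dep], n)     # Π_n(dependent)
        B[dep] += np.roll(R[head], -n)    # Π_{-n}(head)
    return dict(B)
```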
10. Example
John eats a red apple
John → (0, 0, 0, 0, 0, 0, 1, 0, -1, 0)
eat → (1, 0, 0, 0, -1, 0, 0, 0, 0, 0)
red → (0, 0, 0, 1, 0, 0, 0, -1, 0, 0)
apple → (1, 0, 0, 0, 0, 0, 0, -1, 0, 0)
mod → Π3; obj → Π7
(apple) = Π3(red) + Π-7(eat) = …
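The slide's computation can be replayed directly, using its 10-dimensional vectors and the shifts mod → Π3 and obj → Π7:

```python
import numpy as np

red = np.array([0, 0, 0, 1, 0, 0, 0, -1, 0, 0], dtype=float)
eat = np.array([1, 0, 0, 0, -1, 0, 0, 0, 0, 0], dtype=float)

# "apple" governs "red" via mod and depends on "eat" via obj:
apple = np.roll(red, 3) + np.roll(eat, -7)   # Π3(red) + Π-7(eat)
print(apple)
```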
12. Output
R: vector space of random context vectors
B: vector space of terms
13. Query 1/4
• similarity between terms
– cosine similarity between term vectors in B
– terms are similar if they occur in similar syntactic contexts
14. Query 2/4
Words similar to "provide":
offer 0.855
supply 0.819
deliver 0.801
give 0.787
contain 0.784
require 0.782
present 0.778
15. Query 3/4
• similarity between terms exploiting dependencies: what are the objects of the word "provide"?
1. get the term vector for "provide" in B
2. compute the similarity with all permuted vectors in R, using the permutation assigned to the "obj" relation
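A sketch of this two-step query (the helper names are mine; B and R stand for the term and random spaces built above):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def objects_of(verb, B, R, shifts, topn=5):
    """Step 1: take the verb's term vector in B. Step 2: rank every term t
    by the similarity between that vector and Π_obj(R[t])."""
    target = B[verb]
    n = shifts["obj"]
    scored = [(t, cosine(target, np.roll(rv, n))) for t, rv in R.items()]
    return sorted(scored, key=lambda s: -s[1])[:topn]
```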
16. Query 4/4
What are the objects of the word "provide"?
information 0.344
food 0.208
support 0.143
energy 0.143
job 0.142
17. Compositional semantics 1/2
• words are represented in isolation
• representing a complex structure (phrase or sentence) is a challenging task
– IR, QA, IE, Text Entailment, …
• how to combine words
– tensor product of words
– Clark and Pulman suggest taking into account symbolic features (syntactic dependencies)
18. Compositional semantics 2/2
man reads magazine (Clark and Pulman):
man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
19. Similarity between structures
man reads magazine vs. woman browses newspaper
man ⊗ subj ⊗ read ⊗ obj ⊗ magazine
woman ⊗ subj ⊗ browse ⊗ obj ⊗ newspaper
20. …a bit of math
(w1 ⊗ w2) · (w3 ⊗ w4) = (w1 · w3) × (w2 · w4)
(man · woman) × (read · browse) × (magazine · newspaper)
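The identity is easy to verify numerically, and it is what makes the similarity between structures computable without ever materializing the tensor products (the vector dimension below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w1, w2, w3, w4 = (rng.normal(size=50) for _ in range(4))

lhs = np.outer(w1, w2).ravel() @ np.outer(w3, w4).ravel()  # (w1 ⊗ w2) · (w3 ⊗ w4)
rhs = (w1 @ w3) * (w2 @ w4)                                # (w1 · w3) × (w2 · w4)
assert np.isclose(lhs, rhs)
# hence sim(man reads magazine, woman browses newspaper)
#   = (man·woman) × (read·browse) × (magazine·newspaper)
```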
21. System setup
• Implemented in JAVA
• Two corpora
– TASA: 800K sentences and 9M dependencies
– a portion of ukWaC: 7M sentences and 127M dependencies
– 40,000 most frequent words
• Dependency parser
– MINIPAR
22. Evaluation
• GEMS 2011 Shared Task for compositional semantics
– lists of two pairs of word combinations
• rated by humans
• 5,833 ratings
• encoded dependencies: subj, obj, mod, nn
– GOAL: compare the system performance against human scores
• Spearman correlation
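The comparison against human judgments boils down to a rank correlation; a sketch with SciPy (the score arrays are hypothetical placeholders, not the task data):

```python
from scipy.stats import spearmanr

human_ratings = [6.2, 1.5, 4.8, 3.0, 5.9]       # hypothetical human scores per pair
system_scores = [0.81, 0.12, 0.55, 0.40, 0.77]  # hypothetical system similarities

rho, pvalue = spearmanr(human_ratings, system_scores)
print(f"Spearman rho = {rho:.3f} (p = {pvalue:.3f})")
```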
25. Conclusion and Future Work
• Conclusion
– encode syntactic dependencies using vector permutations and Random Indexing
– early attempt at semantic composition
• Future Work
– deeper evaluation (in vivo)
– more formal study of semantic composition
– tackle the scalability problem
– try to encode other kinds of context