Compositional distributional models of meaning (CDMs) aim to unify the two prominent semantic paradigms in natural language: the type-logical compositional approach of formal semantics, and the quantitative perspective of vector space models of meaning. This presentation gives an overview of state-of-the-art research in the field. We review three generic classes of CDMs: vector mixtures, tensor-based models, and deep-learning models.
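As a minimal sketch of the simplest of the three classes, vector mixtures compose word vectors element-wise; the additive and multiplicative models below use toy vectors invented purely for illustration:

```python
import numpy as np

# Toy distributional vectors (invented for illustration).
red = np.array([0.8, 0.1, 0.3])
car = np.array([0.2, 0.9, 0.5])

# Vector-mixture composition: both operations are commutative,
# so word order is lost -- a known limitation of this model class.
additive = red + car            # additive model
multiplicative = red * car      # element-wise (Hadamard) product

print(additive)        # [1.0  1.0  0.8]
print(multiplicative)  # [0.16 0.09 0.15]
```

Because the operations are symmetric, "red car" and "car red" receive the same vector, which is one motivation for the tensor-based class discussed next.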
Tensor-based models of natural language semantics provide a conceptually motivated procedure to compute the meaning of a sentence, given its grammatical structure and a vectorial representation of the meaning of its parts. The main characteristic of these models is that words of a relational nature, such as adjectives and verbs, become (multi-)linear maps acting on vectors representing words of atomic types, e.g. nouns and noun phrases. On the practical side, the tensor-based framework has proved useful in a number of NLP tasks. On the theoretical side, its rigorous mathematical foundations provide a test-bed for studying compositional aspects of language at a level deeper than most practically-oriented approaches would allow; for example, mathematical structures such as Frobenius algebras and bialgebras have been used to allow the explication of functional words such as relative pronouns, to model linguistic aspects such as coordination and intonation, and to provide accounts of quantification in distributional models. Furthermore, the deep structural similarity of the framework to concepts that explain the behaviour of quantum-mechanical systems has enabled a unique perspective on language-related problems, such as lexical ambiguity and entailment, by lifting the model to the realm of density operators and completely positive maps via Selinger's CPM construction. This talk aims at providing a comprehensive introduction to this emerging field by presenting the mathematical foundations, discussing important extensions and recent work, and (time permitting) touching on implementation issues and practical applications.
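The central idea, relational words as (multi-)linear maps, can be sketched with random toy tensors; the dimensions and the choice of sentence space below are illustrative assumptions, not part of any specific model:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy noun-space dimension (assumption for illustration)

# Atomic types live in vector spaces; relational words are (multi-)linear maps.
noun_subj = rng.random(d)          # e.g. "dogs"
noun_obj = rng.random(d)           # e.g. "cats"
adjective = rng.random((d, d))     # adjective: a linear map N -> N
verb = rng.random((d, d, d))       # transitive verb: a bilinear map N x N -> S

# Adjective-noun composition: matrix-vector product.
adj_noun = adjective @ noun_obj

# Subject-verb-object: contract the verb tensor with both arguments.
# Here the sentence space S is taken to have the same toy dimension d.
sentence = np.einsum('sij,i,j->s', verb, noun_subj, noun_obj)
print(sentence.shape)  # (4,)
```

Unlike vector mixtures, this composition is not symmetric: swapping subject and object changes the resulting sentence vector.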
Conceptual Spaces for Cognitive Architectures: A Lingua Franca for Different ...Antonio Lieto
We claim that Conceptual Spaces offer a lingua franca that makes it possible to unify and generalize many aspects of the symbolic, sub-symbolic and diagrammatic approaches (by overcoming some of their typical problems) and to integrate them on a common ground. In doing so we extend and detail some of the arguments explored by Gardenfors [23] in defending the need for a conceptual, intermediate representation level between the symbolic and the sub-symbolic one. Additionally, we argue that Conceptual Spaces could offer a unifying framework for interpreting many kinds of diagrammatic and analogical representations. As a consequence, their adoption could also favor the integration of diagrammatic representation and reasoning in Cognitive Architectures.
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...ijaia
Chinese discourse coherence modeling remains a challenging task in the field of Natural Language Processing. Existing approaches mostly focus on feature engineering, adopting sophisticated features to capture logical, syntactic, or semantic relationships across sentences within a text. In this paper, we present an entity-driven recursive deep model for Chinese discourse coherence evaluation, based on a current English discourse coherence neural network model. Specifically, to overcome that model's shortcoming in identifying entity (noun) overlap across sentences, our combined model incorporates entity information into the recursive neural network framework. Evaluation results on both sentence ordering and machine translation coherence rating tasks show the effectiveness of the proposed model, which significantly outperforms an existing strong baseline.
Extending the knowledge level of cognitive architectures with Conceptual Spac...Antonio Lieto
Extending the knowledge level of cognitive architectures with Conceptual Spaces (+ a case study with Dual-PECCS: a hybrid knowledge representation system for common sense reasoning). Talk given at Stockholm, September 2016.
An Approach to Automated Learning of Conceptual Graphs from TextFulvio Rotella
Many document collections are private and accessible only by selected people. Especially in business realities, such collections need to be managed, and the use of an external taxonomic or ontological resource would be very useful. Unfortunately, very often domain-specific resources are not available, and the development of techniques that do not rely on external resources becomes essential.
Automated learning of conceptual graphs from restricted collections needs to be robust with respect to missing or partial knowledge, which does not allow extracting a full conceptual graph and only provides sparse fragments thereof. This work proposes a way to deal with these problems by applying relational clustering and generalization methods. While clustering collects similar concepts, generalization provides additional nodes that can bridge separate pieces of the graph while expressing it at a higher level of abstraction. In this process, considering relational information allows a broader perspective in the similarity assessment for clustering, and ensures more flexible and understandable descriptions of the generalized concepts. The final conceptual graph can be used for better analyzing and understanding the collection, and for performing some kinds of reasoning on it.
Discovering Novel Information with sentence Level clustering From Multi-docu...irjes
The specific objective is to discover novel information from a set of documents initially retrieved in response to some query. Clustering text at the sentence level, and its effective use and updating, is still an open research issue, especially in the domain of text mining. Most existing systems assign each pattern to a single cluster; here, instead, patterns can belong to all clusters with different degrees of membership. Given the sentences of those documents, we would expect at least one of the clusters to be closely related to the concepts described by the query terms. This paper presents a novel fuzzy clustering algorithm that operates on relational input data (i.e., data in the form of a square matrix of pairwise similarities between data objects).
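The relational, graded-membership idea can be sketched as follows; the similarity values and the medoid-based normalization are an illustrative simplification invented here, not the paper's algorithm:

```python
import numpy as np

# Toy pairwise similarity matrix between 4 sentences (invented values).
# Relational input: no feature vectors, only similarities between objects.
S = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])

medoids = [0, 2]  # two hypothetical cluster prototypes

# Graded membership: similarity to each medoid, normalized per sentence,
# so every sentence belongs to every cluster with some degree.
sim_to_medoids = S[:, medoids]
membership = sim_to_medoids / sim_to_medoids.sum(axis=1, keepdims=True)
print(membership.round(2))
```

Each row sums to 1, so a sentence is never forced into a single cluster, which is the key contrast with hard clustering drawn in the abstract.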
This is an introduction to Topic Modeling, covering tf-idf, LSA, pLSA, LDA, EM, and some other related material. I know there are definitely some mistakes, and you can correct them with your wisdom. Thank you~
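Two of the listed building blocks, tf-idf weighting and LSA via truncated SVD, can be sketched on a tiny term-document matrix (counts invented for illustration; real pipelines use large sparse matrices):

```python
import numpy as np

# Tiny term-document count matrix (terms x docs), invented for illustration.
counts = np.array([
    [2, 0, 1],   # "topic"
    [1, 1, 0],   # "model"
    [0, 3, 1],   # "word"
])

# tf-idf: term frequency scaled by inverse document frequency.
tf = counts / counts.sum(axis=0, keepdims=True)
df = (counts > 0).sum(axis=1)                 # docs containing each term
idf = np.log(counts.shape[1] / df)
tfidf = tf * idf[:, None]

# LSA: truncated SVD of the tf-idf matrix yields latent "topic" dimensions.
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)
k = 2
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T      # documents in k-dim latent space
print(doc_topics.shape)  # (3, 2)
```

pLSA and LDA replace this linear-algebraic decomposition with probabilistic generative models, typically fit with EM or variational/Gibbs methods.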
Analogy is one of the most studied representatives of a family of non-classical forms of reasoning working across different domains, usually taken to play a crucial role in creative thought and problem-solving. In the first part of the talk, I will briefly introduce general principles of computational analogy models (relying on a generalization-based approach to analogy-making). We will then have a closer look at Heuristic-Driven Theory Projection (HDTP) as an example of a theoretical framework and implemented system: HDTP computes analogical relations and inferences for domains which are represented using many-sorted first-order logic languages, applying a restricted form of higher-order anti-unification to find shared structural elements common to both domains. The presentation of the framework will be followed by a few reflections on the "cognitive plausibility" of the approach, motivated by theoretical complexity and tractability considerations.
In the second part of the talk I will discuss an application of HDTP to modeling essential parts of concept blending processes, a current "hot topic" in Cognitive Science. Here, I will sketch an analogy-inspired formal account of concept blending, developed in the European FP7-funded Concept Invention Theory (COINVENT) project, combining HDTP with mechanisms from Case-Based Reasoning.
A Study on Compositional Semantics of Words in Distributional SpacesPierpaolo Basile
This paper proposes two approaches to compositional semantics in distributional semantic spaces. Both approaches conceive the semantics of complex structures, such as phrases or sentences, as being other than the sum of their terms; syntax is the glue used to compose words. The former kind of approach encodes information about syntactic dependencies directly into distributional spaces, while the latter exploits compositional operators reflecting the syntactic role of words. We present a preliminary evaluation performed on the GEMS 2011 "Compositional Semantics" dataset, with the aim of understanding the effects of these approaches when applied to simple word pairs of the kind Noun-Noun, Adjective-Noun and Verb-Noun. Experimental results corroborate our conjecture that exploiting syntax can lead to improved distributional models and compositional operators, and suggest new openings for future use in real-application scenarios.
Illustration of the chain of bridging contexts for the word 'silly'; example taken from Hollmann, William B. 2009. “Semantic Change.” In Culpeper, et al. (eds.) English Language: Description, Variation and Context. Basingstoke: Palgrave
Schema-agnostic queries over large-schema databases: a distributional semanti...Andre Freitas
The evolution of data environments towards growth in the size, complexity, dynamicity and decentralisation (SCoDD) of schemas drastically impacts contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications demand more complete data, produced by independent data sources, under different semantic assumptions and contexts of use. Most Database Management Systems (DBMSs) today target a closed communication scenario, where the symbolic schema of the database is known a priori by the database user, who is able to interpret it in an unambiguous way. The context in which the data is consumed and produced is well defined and is typically the same context in which the data was created. In contrast, data management under SCoDD conditions targets an open communication scenario where the symbolic system of the database is unknown to the user and multiple interpretation contexts are possible. In this case the database can be created under a different context from that of the database user. The emergence of this new data environment demands revisiting the semantic assumptions behind databases and designing data access mechanisms which can support semantically heterogeneous (open communication) data environments.
This work aims at filling this gap by proposing a complementary semantic model for databases, based on distributional semantic models. Distributional semantics provides a complementary perspective to the formal perspective of database semantics, which supports semantic approximation as a first-class database operation. Differently from models which describe uncertain and incomplete data or probabilistic databases, distributional-relational models focus on the construction of conceptual approximation approaches for databases, supported by a comprehensive semantic model automatically built from large-scale unstructured data external to the database, which serves as a semantic/commonsense knowledge base. The semantic model can be used to support schema-agnostic queries, i.e. abstracting the data consumer from the specific conceptualization behind the data.
The proposed distributional-relational semantic model is supported by a distributional structured vector space model, named τ-Space, which represents structured data under a distributional semantic model representation and, in coordination with a query planning approach, supports a schema-agnostic query mechanism for large-schema databases. The query mechanism is materialized in the Treo query engine and is evaluated using schema-agnostic natural language queries.
The evaluation of the query mechanism confirms that distributional semantics provides a high-recall, medium-high precision, and low-maintainability solution to cope with the abstraction and conceptual-level differences in schema-agnostic queries over large-schema/schema-less open-domain datasets.
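The core move of a schema-agnostic query mechanism can be sketched as ranking schema elements by distributional similarity to a query term. The toy embeddings below stand in for a model built from large external corpora; this is an illustrative simplification, not the τ-Space/Treo implementation:

```python
import numpy as np

# Toy distributional embeddings for schema elements and a query term
# (vectors invented for illustration; a real system derives them from
# large-scale unstructured data external to the database).
schema = {
    "spouse":      np.array([0.9, 0.1, 0.2]),
    "birth_place": np.array([0.1, 0.8, 0.3]),
    "population":  np.array([0.2, 0.3, 0.9]),
}
query_term = np.array([0.85, 0.15, 0.25])  # e.g. the user wrote "wife"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantic approximation as a first-class operation: rank schema
# elements instead of requiring an exact symbolic match.
ranked = sorted(schema, key=lambda name: cosine(schema[name], query_term),
                reverse=True)
print(ranked[0])  # spouse
```

The user never needs to know that the database calls the attribute "spouse"; the distributional model bridges the vocabulary gap.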
This lecture provides students with an introduction to natural language processing, with a specific focus on the basics of two applications: vector semantics and text classification.
(Lecture at the QUARTZ PhD Winter School (http://www.quartz-itn.eu/training/winter-school/) in Padua, Italy, on February 12, 2018)
A Neural Probabilistic Language Model.pptx
Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
A goal of statistical language modeling is to learn the joint probability function of sequences of
words in a language. This is intrinsically difficult because of the curse of dimensionality: a word
sequence on which the model will be tested is likely to be different from all the word sequences seen
during training. Traditional but very successful approaches based on n-grams obtain generalization
by concatenating very short overlapping sequences seen in the training set. We propose to fight the
curse of dimensionality by learning a distributed representation for words which allows each
training sentence to inform the model about an exponential number of semantically neighboring
sentences. The model learns simultaneously (1) a distributed representation for each word along
with (2) the probability function for word sequences, expressed in terms of these representations.
Generalization is obtained because a sequence of words that has never been seen before gets high
probability if it is made of words that are similar (in the sense of having a nearby representation) to
words forming an already seen sentence. Training such large models (with millions of parameters)
within a reasonable time is itself a significant challenge. We report on experiments using neural
networks for the probability function, showing on two text corpora that the proposed approach
significantly improves on state-of-the-art n-gram models, and that the proposed approach allows to
take advantage of longer contexts.
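The model's forward pass can be sketched as follows; this is a simplified variant that omits the hidden tanh layer of the full Bengio et al. architecture, with toy dimensions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
V, m, n = 10, 5, 3   # vocab size, embedding dim, context length (toy values)

C = rng.normal(size=(V, m))          # shared word embedding matrix
W = rng.normal(size=(V, n * m))      # context-to-output weights
b = np.zeros(V)                      # output biases

def next_word_probs(context_ids):
    """Forward pass of a simplified neural probabilistic LM:
    look up and concatenate the context embeddings, map to
    vocabulary scores, and normalize with a softmax."""
    x = C[context_ids].reshape(-1)   # (n*m,) concatenated representation
    scores = W @ x + b
    e = np.exp(scores - scores.max())
    return e / e.sum()

p = next_word_probs([4, 7, 2])
print(p.sum())  # probabilities over the vocabulary sum to 1
```

Because the embedding matrix C is shared across positions, similar words end up with nearby representations, which is exactly what lets unseen word sequences receive high probability.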
Different Semantic Perspectives for Question Answering SystemsAndre Freitas
Question Answering systems define one of the most complex tasks in computational semantics. The intrinsic complexity of the QA task allows researchers of QA systems to investigate and explore different perspectives of semantics. However, this complexity also induces a bias towards a systems perspective, where researchers are alienated from a deeper reasoning on the semantic principles that are in place within the different components of the system. In this talk we will explore the semantic challenges, principles and perspectives behind the components of QA systems, aiming at providing a principled map and overview on the contribution of each component within the QA semantic interpretation goal.
Towards a Distributional Semantic Web StackAndre Freitas
The ability of distributional semantic models (DSMs) to discover similarities over large-scale heterogeneous and poorly structured data makes them a promising universal and low-effort framework to support semantic approximation and knowledge discovery. This position paper explores the role of distributional semantics in the Semantic Web vision, based on the state-of-the-art distributional-relational models, categorizing and generalizing existing approaches into a Distributional Semantic Web stack.
Metrics for Evaluating Quality of Embeddings for Ontological Concepts Saeedeh Shekarpour
Although there is an emerging trend towards generating embeddings for primarily unstructured data and, recently, for structured data, no systematic suite for measuring the quality of embeddings has been proposed yet.
This deficiency is felt even more acutely for embeddings generated from structured data, because there are no concrete evaluation metrics measuring the quality of the encoded structure and semantic patterns in the embedding space.
In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect.
Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings.
Furthermore, w.r.t. this framework, multiple experimental studies were run to compare the quality of the available embedding models.
Employing this framework in future research can reduce misjudgment and provide greater insight about quality comparisons of embeddings for ontological concepts.
Our sample data and code are available at https://github.com/alshargi/Concept2vec under the GNU General Public License v3.0.
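The categorization aspect can be sketched as a nearest-neighbour purity check; the embeddings, labels, and the specific metric below are invented for illustration and are not the paper's exact metrics:

```python
import numpy as np

# Toy concept embeddings with gold category labels (invented values).
embeddings = {
    "dog":  np.array([0.9, 0.1]), "cat":   np.array([0.8, 0.2]),
    "car":  np.array([0.1, 0.9]), "truck": np.array([0.2, 0.8]),
}
categories = {"dog": "animal", "cat": "animal",
              "car": "vehicle", "truck": "vehicle"}

def categorization_score(embeddings, categories):
    """Illustrative categorization metric: the fraction of concepts whose
    nearest neighbour (by cosine similarity) shares their gold category."""
    names = list(embeddings)
    correct = 0
    for a in names:
        va = embeddings[a]
        def cos(b):
            vb = embeddings[b]
            return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
        nearest = max((b for b in names if b != a), key=cos)
        correct += categories[nearest] == categories[a]
    return correct / len(names)

print(categorization_score(embeddings, categories))  # 1.0 on this toy data
```

A high score means the embedding space keeps concepts of the same ontological category close together, which is one of the three aspects the framework evaluates.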
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCValentina Presutti
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key to providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible not-so-distant future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches and human computing.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the highest-quality word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets of word-pair similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
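This kind of correlation analysis typically uses Spearman's rank correlation between human judgments and embedding cosines; the toy scores below are invented, and the rank correlation is computed by hand as Pearson correlation of rank vectors:

```python
import numpy as np

# Human similarity judgments for four word pairs (benchmark-style toy values)
human = np.array([9.0, 7.5, 3.0, 1.0])
# Cosine similarities from some embedding model (invented values)
model = np.array([0.85, 0.60, 0.40, 0.05])

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors
    (this simple ranking assumes no tied values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

print(spearman(human, model))  # 1.0: the model preserves the human ranking
```

Only the ordering matters, so an embedding method is rewarded for ranking pairs like humans do even if its raw cosine values live on a different scale.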
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
In the last decade, ontologies have played a key technology role for information sharing and agents interoperability in different application domains. In semantic web domain, ontologies are efficiently used toface the great challenge of representing the semantics of data, in order to bring the actual web to its full
power and hence, achieve its objective. However, using ontologies as common and shared vocabularies requires a certain degree of interoperability between them. To confront this requirement, mapping ontologies is a solution that is not to be avoided. In deed, ontology mapping build a meta layer that allows different applications and information systems to access and share their informations, of course, after resolving the different forms of syntactic, semantic and lexical mismatches. In the contribution presented in this paper, we have integrated the semantic aspect based on an external lexical resource, wordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character features the
main difference of our contribution with regards to the most of the existing semi-automatic algorithms of ontology mapping, such as Chimaera, Prompt, Onion, Glue, etc. To better enhance the performances of our algorithm, the mapping discovery stage is based on the combination of two sub-modules. The former
analysis the concept’s names and the later analysis their properties. Each one of these two sub-modules is
it self based on the combination of lexical and semantic similarity measures.
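The two-sub-module combination can be sketched as follows; the synonym table stands in for WordNet, and the weighting scheme and helper names are hypothetical illustrations, not the paper's algorithm:

```python
from difflib import SequenceMatcher

def lexical_sim(a, b):
    """String-level similarity between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical synonym sets standing in for the WordNet-based
# semantic measure (a real system would query WordNet itself).
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"person", "human"})}

def semantic_sim(a, b):
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0

def concept_mapping_score(name_a, name_b, props_a, props_b, alpha=0.5):
    """Combine the name sub-module and the property sub-module, each
    mixing lexical and semantic similarity (illustrative weighting)."""
    name_score = max(lexical_sim(name_a, name_b), semantic_sim(name_a, name_b))
    if props_a and props_b:
        # Best match for each property of A among the properties of B.
        prop_score = sum(
            max(max(lexical_sim(p, q), semantic_sim(p, q)) for q in props_b)
            for p in props_a
        ) / len(props_a)
    else:
        prop_score = 0.0
    return alpha * name_score + (1 - alpha) * prop_score

score = concept_mapping_score("car", "automobile",
                              ["color", "speed"], ["colour", "speed"])
print(score)
```

Here "car"/"automobile" match semantically despite sharing few characters, while "color"/"colour" match lexically: the two measures cover each other's blind spots.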
Our project is about guessing the correct missing word in a given sentence. To guess the missing word we have two main methods: one is statistical language modeling, while the other is neural language models. Statistical language modeling depends on the frequency of the relations between words, and here we use a Markov chain. Neural language models use artificial neural networks and deep learning; here we use BERT, the state of the art in language modeling, provided by Google.
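The Markov-chain side of the project can be sketched with a bigram model; the tiny corpus and scoring rule below are invented for illustration:

```python
from collections import Counter, defaultdict

# Tiny training corpus (invented for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Bigram counts: the frequency of the relation between adjacent words.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def guess_missing(left, right):
    """Pick the word w maximizing count(left, w) * count(w, right):
    a minimal Markov-chain treatment of the fill-in-the-blank task."""
    candidates = bigrams[left]
    return max(candidates, key=lambda w: candidates[w] * bigrams[w][right])

print(guess_missing("sat", "the"))  # on
```

A masked language model like BERT generalizes this idea, conditioning on the whole sentence rather than only the immediately adjacent words.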
Enriching Intelligent Textbooks with Interactivity: When Smart Content Alloca...Politecnico di Milano
One of the main directions of increasing the educational value of a digital textbook is its enrichment with interactive content. Such content can come from outside the textbooks - from multiple existing repositories of educational resources. However, finding the right place for such external resources is not always a trivial task. There exist multiple sources of potential problems: from mismatching metadata to mutually contradicting prerequisite-outcome structures of underlying resources, from differences in granularity and coverage to ontological conflicts. In this paper, we make an attempt to categorize these problems and give examples from our recent experiment on automated assignment of smart interactive learning content to the chapters of an intelligent textbook in a programming domain.
The spread and abundance of electronic documents requires automatic techniques for extracting useful information from the text they contain. The availability of conceptual taxonomies can be of great help, but manually building them is a complex and costly task. Building on previous work, we propose a technique to automatically extract conceptual graphs from text and reason with them. Since automated learning of taxonomies needs to be robust with respect to missing or partial knowledge and flexible with respect to noise, this work proposes a way to deal with these problems. The case of poor data/sparse concepts is tackled by finding generalizations among disjoint pieces of knowledge. Noise is handled by introducing soft relationships among concepts rather than hard ones, and by applying a probabilistic inferential setting. In particular, we propose to reason on the extracted graph using different kinds of relationships among concepts, where each arc/relationship is associated with a number representing its likelihood among all possible worlds, and to face the problem of sparse knowledge by using generalizations among distant concepts as bridges between disjoint portions of knowledge.
Similar to An introduction to compositional models in distributional semantics (20)
In this talk we will summarise some of the detectable trends on AI beyond deep learning. We will focus on the current transition from deep learning to deep semantics, describing the enabling infrastructures, challenges and opportunities in the construction of the next generation AI systems. The talk will focus on Natural Language Processing (NLP) as an AI sub-domain and will link to the research at the AI Systems Lab at the University of Manchester.
Building AI Applications using Knowledge GraphsAndre Freitas
Goals of this Tutorial:
Provide a broad view of the multiple perspectives underlying knowledge graphs.
Show knowledge graphs as a foundation for building AI systems.
Method:
Focus on the contemporary and emerging perspectives.
Sampling exemplar approaches and infrastructures on each of these emerging perspectives (not an exhaustive survey).
Effective Semantics for Engineering NLP SystemsAndre Freitas
Provide a synthesis of the emerging representation trends behind NLP systems.
Shift in perspective:
Effective engineering (task driven, scalable) instead of sound formalism.
Best-effort representation.
Knowledge Graphs (Frege revisited)
Information Extraction & Text Classification
Distributional Semantic Models
Knowledge Graphs & Distributional Semantics
(Distributional-Relational Models)
Applications of DRMs
KG Completion
Semantic Parsing
Natural Language Inference
This paper discusses the “Fine-Grained
Sentiment Analysis on Financial Microblogs
and News” task as part of
SemEval-2017, specifically under the
“Detecting sentiment, humour, and truth”
theme. This task contains two tracks, where
the first one concerns Microblog messages
and the second one covers News Statements
and Headlines. The main goal behind both
tracks was to predict the sentiment score for
each of the mentioned companies/stocks.
The sentiment scores for each text instance
adopted floating point values in the range
of -1 (very negative/bearish) to 1 (very
positive/bullish), with 0 designating neutral
sentiment. This task attracted a total of 32
participants, with 25 participating in Track
1 and 29 in Track 2.
Semantic Relation Classification: Task Formalisation and RefinementAndre Freitas
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique on the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms which are found in a domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded,
allows to capture a wide range of relations and could thus be used as a foundation for automatic classification of semantic relations.
Categorization of Semantic Roles for Dictionary DefinitionsAndre Freitas
Understanding the semantic relationships between terms is a fundamental task in natural language
processing applications. While structured resources that can express those relationships in
a formal way, such as ontologies, are still scarce, a large number of linguistic resources gathering
dictionary definitions is becoming available, but understanding the semantic structure of natural
language definitions is fundamental to make them useful in semantic interpretation tasks. Based
on an analysis of a subset of WordNet’s glosses, we propose a set of semantic roles that compose
the semantic structure of a dictionary definition, and show how they are related to the definition’s
syntactic configuration, identifying patterns that can be used in the development of information
extraction frameworks and semantic models.
Word Tagging with Foundational Ontology ClassesAndre Freitas
Semantic annotation is fundamental to deal with large-scale
lexical information, mapping the information to an enumerable set of
categories over which rules and algorithms can be applied, and foundational
ontology classes can be used as a formal set of categories for
such tasks. A previous alignment between WordNet noun synsets and
DOLCE provided a starting point for ontology-based annotation, but in
NLP tasks verbs are also of substantial importance. This work presents
an extension to the WordNet-DOLCE noun mapping, aligning verbs according
to their links to nouns denoting perdurants, transferring to the
verb the DOLCE class assigned to the noun that best represents that
verb’s occurrence. To evaluate the usefulness of this resource, we implemented
a foundational ontology-based semantic annotation framework,
that assigns a high-level foundational category to each word or phrase
in a text, and compared it to a similar annotation tool, obtaining an
increase of 9.05% in accuracy.
Schema-Agnostic Queries (SAQ-2015): Semantic Web ChallengeAndre Freitas
The Challenge in a Nutshell
To create a query mechanism that semantically matches schema-agnostic user queries to knowledge base elements
The Goal
To support easy querying over complex databases with large schemata, relieving users from the need to understand the formal representation of the data
Relevance
The increase in the size and in the semantic heterogeneity of database schemas are bringing new requirements for users querying and searching structured data. At this scale it can become unfeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more central as the scale and complexity of the data grows. Addressing this gap is a fundamental part of the Semantic Web vision.
Schema-agnostic query mechanisms aim at allowing users to be abstracted from the representation of the data, supporting the automatic matching between queries and databases. This challenge aims at emphasizing the role of schema-agnosticism as a key requirement for contemporary database management, by providing a test collection for evaluating flexible query and search systems over structured data in terms of their level of schema-agnosticism (i.e. their ability to map a query issued with the user terminology and structure, mapping it to the dataset vocabulary). The challenge is instantiated in the context of Semantic Web datasets.
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...Andre Freitas
The growing size, heterogeneity and complexity of databases demand the creation of strategies to facilitate users and systems to consume data. Ideally, query mechanisms should be schema-agnostic, i.e. they should be able to match user queries in their own vocabulary and syntax to the data, abstracting data consumers from the representation of the data. This work provides an informationtheoretical framework to evaluate the semantic complexity involved in the query-database communication, under a schema-agnostic query scenario. Different entropy measures are introduced to quantify the semantic phenomena involved in the user-database communication, including structural complexity, ambiguity, synonymy and vagueness. The entropy measures are validated using natural language queries over Semantic Web databases. The analysis of the semantic complexity is used to improve the understanding of the core semantic dimensions present at the query-data matching process, allowing the improvement of the design of schema-agnostic query mechanisms and defining measures which can be used to assess the semantic uncertainty or difficulty behind a schema-agnostic querying task.
A Semantic Web Platform for Automating the Interpretation of Finite Element ...Andre Freitas
Finite Element (FE) models provide a rich framework to simulate dynamic biological systems, with applications ranging from hearing to cardiovascular research. With the growing complexity and sophistication of FE bio-simulation models (e.g. multi-scale and multi-domain models), the effort associated with the creation, analysis and reuse of
a FE model can grow unmanageable. This work investigates the role of semantic technologies to improve the automation, interpretation and reproducibility of FE simulations. In particular, the paper focuses on
the definition of a reference semantic architecture for FE bio-simulations and on the discussion of strategies to bridge the gap between numerical-level
and conceptual-level representations. The discussion is grounded on the SIFEM platform, a semantic infrastructure for FE simulations for cochlear mechanics.
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary StudyAndre Freitas
The growing size, heterogeneity and complexity of databases
demand the creation of strategies to facilitate users and systems to consume
data. Ideally, query mechanisms should be schema-agnostic or
vocabulary-independent, i.e. they should be able to match user queries
in their own vocabulary and syntax to the data, abstracting data consumers
from the representation of the data. Despite being a central requirement across natural language interfaces and entity search, there is a lack on the conceptual analysis of schema-agnosticism and on the associated semantic differences between queries and databases. This work aims at providing an initial conceptualization for schema-agnostic queries aiming at providing a fine-grained classification which can support the scoping, evaluation and development of semantic matching approaches for schema-agnostic queries.
Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...Andre Freitas
The increase in the size, heterogeneity and complexity of contemporary Big Data environments brings major challenges for the consumption of structured and semi–structured data. Addressing these challenges requires a convergence of approaches from different communities including databases, natural language processing, and information retrieval. Research on Natural Language Interfaces (NLI) and Question Answering systems has played a prominent role in stimulating a multidisciplinary approach to the problem that has moved the field from a futuristic vision to a concrete industry-level technological trend.
In this talk we distill the key principles of state-of-the-art approaches for data consumption using NLI. Particular attention is paid to the maturity and effectiveness of each approach together with discussion on future trends and active research questions.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Honest Reviews of Tim Han LMA Course Program.pptxtimhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
4. Semantics for a Complex World
www.insight-centre.org
• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
• If these idealizations are removed, it is not clear at all that modern semantics can give a full account of all but the simplest sentences.
(Sahlgren, 2013)
5. Goal behind Compositional Distributional Models
• Principled and effective semantic models for coping with real-world semantic conditions.
• Focus on semantic approximation.
• Applications:
– Semantic search.
– Approximate semantic inference.
– Paraphrase detection.
– Semantic anomaly detection.
– ...
6. Paraphrase Detection
• I find it rather odd that people are already trying to tie the Commission's hands in relation to the proposal for a directive, while at the same time calling on it to present a Green Paper on the current situation with regard to optional and supplementary health insurance schemes.
=?
• I find it a little strange to now obliging the Commission to a motion for a resolution and to ask him at the same time to draw up a Green Paper on the current state of voluntary insurance and supplementary sickness insurance.
7. Solving the Problem: The Data-driven Way
• Distributional
– Use vast corpora to extract the meaning of content words.
– Provide a principled representation of distributional meaning.
• Compositional
– These representations should be objects that compose together to form more complex meanings.
– Content words should be able to combine with grammatical roles, in ways that account for the importance of structure in sentence meaning.
9. Distributional Semantics
• “Words occurring in similar (linguistic) contexts are semantically similar.”
• A practical way to automatically harvest word “meanings” on a large scale.
• Meaning = linguistic context.
• This context can then be used as a surrogate for the word's semantic representation.
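As a minimal sketch of this idea (the toy corpus and window size are invented for illustration; real models use vast corpora and weighting schemes such as PPMI), word vectors can be harvested from co-occurrence counts and compared with cosine similarity:

```python
import numpy as np

# Toy corpus: "dogs" and "cats" appear in similar linguistic contexts.
corpus = "dogs chase cats cats chase mice dogs eat food cats eat fish".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a symmetric window of 2 words.
counts = np.zeros((len(vocab), len(vocab)))
window = 2
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            counts[index[word], index[corpus[j]]] += 1

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Words with similar contexts get similar vectors.
sim = cosine(counts[index["dogs"]], counts[index["cats"]])
```

On this tiny corpus, `sim` comes out high because "dogs" and "cats" share the contexts "chase" and "eat", illustrating the distributional hypothesis in miniature.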
17. Compositionality Principles
• Words that act as functions, transforming the distributional profile of other words (e.g., verbs, adjectives, ...).
• Words in which the meaning is directly determined by their distributional behaviour (e.g., nouns).
18. Compositionality Principles
• Take the syntactic structure to constitute the backbone guiding the assembly of the semantic representations of phrases.
• A correspondence between syntactic categories and distributional objects.
21. Additive Model
• Limitations of the additive model:
– The input vectors contribute to the composed expression in the same way.
– Linguistic intuition would suggest that the composition operation is asymmetric (the head of the phrase should have greater weight).
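One established answer to this asymmetry is Mitchell and Lapata's weighted additive model, where scalars let the head contribute more. A minimal sketch (the vectors and weights here are toy values, not real corpus data):

```python
import numpy as np

def additive(u, v):
    # Plain additive model: symmetric, both words weigh equally.
    return u + v

def weighted_additive(u, v, alpha=0.3, beta=0.7):
    # Weighted additive model: beta > alpha gives the head (v) more weight.
    return alpha * u + beta * v

adjective = np.array([0.2, 0.9, 0.1, 0.4])  # e.g. modifier vector
noun      = np.array([0.8, 0.1, 0.7, 0.3])  # e.g. head vector

phrase = weighted_additive(adjective, noun)
```

Unlike the plain sum, swapping the arguments now changes the result, so the head's dominance is baked into the operation.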
24. Criticism of Mixture Models
• Some words have an intrinsic functional behaviour:
“lice on dogs”, “lice and dogs”
• Lack of recursion.
• To address these limitations, function-based models were introduced.
26. Distributional Functions
• Composition as function application.
• Nouns are still represented as vectors.
• Adjectives, verbs, determiners, prepositions, conjunctions and so forth are all modelled by distributional functions.
(ON(dogs))(lice)
AND(lice, dogs)
27. Distributional Functions as Linear Transformations
• Distributional functions are linear transformations on semantic vector/tensor spaces.
• Matrix: first-order, one-argument distributional functions.
• Used to represent adjectives and intransitive verbs.
28. Example: Adjective + Noun
• Adjective = a function from nouns to nouns.
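Concretely, an adjective becomes a matrix and composition becomes matrix-vector multiplication. A minimal sketch with made-up 4-dimensional values (real systems induce the matrix from corpus data):

```python
import numpy as np

rng = np.random.default_rng(0)

# The adjective "old" as a linear map from the noun space to itself.
OLD = rng.random((4, 4))   # toy 4x4 adjective matrix
dog = rng.random(4)        # toy noun vector

# Composition = function application: OLD applied to the noun vector.
old_dog = OLD @ dog
```

Because the map is linear, it interacts predictably with the vector space structure, e.g. scaling the input scales the output.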
29. Measuring similarity of tensors
• Two matrices (or tensors) are similar when they have a similar weight distribution, i.e., they perform similar input-to-output component mappings.
• DECREPIT and OLD might both dampen the “runs” component of a noun.
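One common way to compare weight distributions is cosine similarity over the flattened matrices. The 2×2 matrices below are invented toy values chosen so that DECREPIT and OLD perform similar component mappings while RED does not:

```python
import numpy as np

def matrix_similarity(A, B):
    # Cosine similarity between the flattened weight distributions.
    a, b = A.ravel(), B.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

OLD      = np.array([[1.0, 0.1], [0.2, 0.9]])  # toy adjective matrices
DECREPIT = np.array([[0.9, 0.2], [0.1, 0.8]])
RED      = np.array([[0.1, 1.0], [0.9, 0.3]])

sim_close = matrix_similarity(OLD, DECREPIT)  # similar mappings
sim_far   = matrix_similarity(OLD, RED)       # dissimilar mappings
```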
30. Inducing distributional functions from corpus data
– Distributional functions are induced from input-to-output transformation examples, using regression techniques commonly used in machine learning.
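As a sketch of this induction step, an adjective matrix can be recovered by least-squares regression from (noun vector, phrase vector) training pairs. The data below is synthetic, generated from a hidden "gold" matrix so the recovery can be checked:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5

A_true = rng.random((d, d))       # hidden "gold" distributional function
nouns = rng.random((20, d))       # 20 training noun vectors (rows)
phrases = nouns @ A_true.T        # observed adjective-noun phrase vectors

# Multivariate linear regression: find A minimising ||nouns @ A.T - phrases||.
X, *_ = np.linalg.lstsq(nouns, phrases, rcond=None)
A_learned = X.T
```

With enough clean examples the regression recovers the function exactly; with real corpus vectors the fit is only approximate, which is the point of using a least-squares objective.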
32. Socher, 2012
• Recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences.
• State-of-the-art performance on three different experiments, including sentiment analysis and cause-effect semantic relations.
33. Main Challenges
• Challenge I: Lack of sufficient examples of their inputs and outputs.
– Possible solution: extend the training sets by exploiting similarities between linguistic expressions to ‘share’ training examples across distributional functions.
• Challenge II: Computational power and space.
– Grefenstette et al., 2013.
– If nouns live in 300-dimensional spaces, a transitive verb is a (300 × 300) × 300 tensor; that is, it contains 27 million components.
– A relative pronoun is a (300 × 300) × (300 × 300) tensor, containing 8.1 billion components.
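The component counts quoted above follow directly from the tensor shapes:

```python
d = 300  # dimensionality of the noun space

# A transitive verb maps two noun arguments to a sentence vector:
# a (300 x 300) x 300 tensor.
transitive_verb = d * d * d          # 27,000,000 components

# A relative pronoun is a (300 x 300) x (300 x 300) tensor.
relative_pronoun = d * d * d * d     # 8,100,000,000 components
```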
34. Categorial Grammar
• Provides the syntax-semantics interface.
• Tight connection between syntax and semantics.
• Motivated by the principle of compositionality.
• View that syntactic constituents should generally combine as functions, or according to a function-argument relationship.
39. Other Compositional Models
• Coecke et al. (2010): category theory and Lambek calculus.
• Grefenstette et al. (2013): Simulating Logical Calculi with Tensors.
• Novacek et al., ISWC (2011); Freitas et al., ICSC (2011): Semantic Web & distributional semantics.
40. Conclusion
• Distributional semantics brings a promising approach for building computational models that work in the real world.
• Semantic approximation as a built-in construct.
• Compositionality is still an open problem, but classical (formal) works have been leveraged and adapted to DSMs.
• Exciting time to be around!
Editor's Notes
The effect of syntactic constituency on composition is partially addressed by Mitchell and Lapata’s weighted additive model, where the vectors are multiplied by different scalar values before summing.
F is the matrix encoding function f as a linear transformation, a is the vector denoting the argument a, and b is the vector output of the composition process.
Table 3 contains a 2×2 matrix with the same labels for rows and columns (this is not necessary: it happens here because adjectives, as we have already stated, map nouns onto the same nominal space), where the first cell, for example, weights the mapping from and onto the runs-labelled components of the input and output vectors.
In the ML models, all words and larger constituents live in the same space, so everything is directly comparable with everything else.
Phrase structure grammars (as opposed to dependency grammars) are equivalent in generative capacity to context-free grammars. They are based on function application rules. Only a small number of (mostly language-independent) rules are employed, and all other syntactic phenomena derive from the lexical entries of specific words. First assign interpretation types to all the basic categories, then associate all the derived categories with appropriate function types.
“cat” plays the double role of being the subject of the main clause and the object of the relative clause.