An Introduction to Compositional
Models in Distributional Semantics
Supervisor: Edward Curry
Frege in Space: A Program for Compositional Distributional Semantics
Baroni et al. (2012)
• Comprehensive (107-page) introduction to and overview of compositional distributional semantics.

Semantics for a Complex World
• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
• If these idealizations are removed, it is not at all clear that modern semantics can give a full account of all but the simplest cases.
Goal behind Compositional Distributional Models
• Principled and effective semantic models for coping with real-world semantic conditions.
• Focus on semantic approximation:
– Approximate semantic inference.
– Semantic anomaly detection.
• I find it rather odd that people are already trying to tie
the Commission's hands in relation to the proposal for
a directive, while at the same time calling on it to present a
Green Paper on the current situation with regard to
optional and supplementary health insurance schemes.
• I find it a little strange to now obliging the Commission
to a motion for a resolution and to ask him at the same
time to draw up a Green Paper on the current state of
voluntary insurance and supplementary sickness insurance.
Solving the Problem: The Data-driven Way
– Use vast corpora to extract the meaning of content words.
– Provide a principled representation of distributional meaning.
– These representations should be objects that compose together to form more complex meanings.
– Content words should be able to combine according to their grammatical roles, in ways that account for the importance of structure in sentence meaning.
• “Words occurring in similar (linguistic)
contexts are semantically similar.”
• Practical way to automatically harvest word "meanings" on a large scale.
• meaning = linguistic context.
• The linguistic context can then be used as a surrogate of a word's meaning.
Vector Space Model
• Each word is represented as a vector whose components are a function of co-occurrence counts (e.g., the number of times the word occurs in context c1).
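The count-based vector space just described can be sketched in a few lines; the miniature corpus and the sentence-level context window here are illustrative assumptions, not data from the talk:

```python
# A minimal sketch of a count-based vector space model.
# The tiny corpus and the sentence-as-context choice are toy assumptions.
from collections import Counter
import math

corpus = [
    "dogs bark loudly",
    "cats chase mice",
    "dogs chase cats",
    "old dogs bark",
]

# Build co-occurrence counts: for each word, count the other words
# appearing in the same sentence (its "linguistic context").
vectors = {}
for sentence in corpus:
    words = sentence.split()
    for w in words:
        ctx = vectors.setdefault(w, Counter())
        for c in words:
            if c != w:
                ctx[c] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda x: math.sqrt(sum(n * n for n in x.values()))
    return dot / (norm(u) * norm(v))

# Words occurring in similar contexts get similar vectors.
print(cosine(vectors["dogs"], vectors["cats"]))
```

Real systems use vast corpora, weighting schemes, and dimensionality reduction, but the geometry is the same: similarity of meaning is approximated by similarity of context vectors.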
• Can we extend DS to account for the meaning of phrases and sentences?
• The meaning of a complex expression is a function of the meaning of its constituent parts and of the rules used to combine them.
Words that act as functions, transforming the distributional profile of other words (e.g., verbs, adjectives, …).
Words in which the distributional profile itself represents the meaning, i.e., vectors (e.g., nouns).
• Take the syntactic structure to constitute the backbone guiding the assembly of the semantic representations.
• A correspondence between syntactic categories and semantic types.
• Mitchell and Lapata (2010)
• Proposed two broad classes of composition models: additive and multiplicative.
• Limitations with the additive model:
– The input vectors contribute to the composed
expression in the same way.
– Linguistic intuition would suggest that the
composition operation is asymmetric (head of the
phrase should have greater weight).
• Multiplicative models perform quite well in
the task of predicting human similarity
judgments about adjective-noun, noun-noun,
verb-noun and noun-verb phrases.
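The two classes of composition can be sketched as follows; the 3-dimensional vectors are made-up stand-ins for corpus-derived ones:

```python
# Sketch of Mitchell & Lapata's two composition classes,
# using toy 3-dimensional vectors (real models are corpus-derived).
import numpy as np

old = np.array([2.0, 0.5, 1.0])   # hypothetical vector for "old"
dog = np.array([1.0, 3.0, 0.5])   # hypothetical vector for "dog"

additive = old + dog              # p = u + v
multiplicative = old * dog        # p = u * v (component-wise)

# The additive model is symmetric: head and modifier contribute equally,
# which is the limitation noted above.
assert np.allclose(old + dog, dog + old)
print(additive, multiplicative)
```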
Criticism of Mixture Models
• Some words have an intrinsic functional nature that mixture models ignore: they cannot distinguish "lice on dogs" from "lice and dogs".
• Lack of recursion.
• To address these limitations function-based
models were introduced.
• Composition as function application.
• Nouns are still represented as vectors.
• Adjectives, verbs, determiners, prepositions, conjunctions and so forth are all modelled by functions.
Distributional functions as linear transformations
• Distributional functions are linear transformations on semantic vector/tensor spaces.
• Matrix: first-order, one-argument distributional functions.
• Used to represent adjectives and intransitive verbs.
Example: Adjective + Noun
• Adjective = a function from nouns to nouns, represented as a matrix.
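A minimal sketch of this function application, with a made-up adjective matrix chosen so that it dampens one component of the noun vector:

```python
# Sketch of composition as function application: the adjective is a
# (hypothetical) matrix mapping a noun vector to the modified-noun vector.
import numpy as np

DOG = np.array([1.0, 2.0, 3.0])      # toy noun vector
OLD = np.array([[1.0, 0.0, 0.0],     # toy adjective matrix; the last row
                [0.0, 1.0, 0.0],     # dampens the third component of the
                [0.0, 0.0, 0.2]])    # noun (think: a "runs" dimension)

old_dog = OLD @ DOG                  # composition = matrix-vector product
print(old_dog)                       # third component is scaled down
```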
Measuring similarity of tensors
• Two matrices (or tensors) are similar when they have a similar weight distribution, i.e., when their components, viewed as one long vector, have high cosine similarity.
• DECREPIT, OLD might dampen the “runs”
component of a noun.
Inducing distributional functions from corpus data
– Distributional functions are induced from input-output vector pairs, using regression techniques commonly used in machine learning.
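The regression idea can be sketched as follows; dimensions and vectors are random stand-ins, and the least-squares setup is the standard formulation of this style of training (learning an adjective matrix from observed noun and adjective-noun phrase vectors), not code from the talk:

```python
# Sketch: learn an adjective's matrix from (noun, adjective-noun phrase)
# vector pairs via least squares. All vectors are random toy stand-ins
# for corpus-derived ones.
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # toy dimensionality
A_true = rng.normal(size=(d, d))        # "true" adjective function (unknown in practice)

nouns = rng.normal(size=(20, d))        # noun vectors (inputs)
phrases = nouns @ A_true.T              # observed phrase vectors (outputs)

# Solve min_A || nouns @ A.T - phrases ||^2 for the adjective matrix.
X, *_ = np.linalg.lstsq(nouns, phrases, rcond=None)
A_hat = X.T

print(np.allclose(A_hat, A_true))       # recovered in this noiseless toy setup
```

With real corpus data the phrase vectors are noisy, so the recovered matrix is only an approximation, typically fit with regularized regression.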
• Recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences.
• State-of-the-art performance on three different experiments, including sentiment analysis and cause-effect semantic relations.
• Challenge I: Lack of sufficient examples of their inputs and outputs.
– Possible Solution: Extend the training sets, exploiting similarities between linguistic expressions to 'share' training examples across distributional functions.
• Challenge II: Computational power and space.
– Grefenstette et al., 2013.
– Nouns live in 300-dimensional spaces; a transitive verb is a (300 × 300) × 300 tensor, that is, it contains 27 million components.
– Relative pronoun: a (300 × 300) × (300 × 300) tensor, containing 8.1 billion components.
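The quoted component counts can be checked directly:

```python
# Component counts for the tensor types above, with 300-dimensional nouns.
noun = 300                        # components in a noun vector
transitive_verb = 300 ** 3        # (300 x 300) x 300 tensor
relative_pronoun = 300 ** 4       # (300 x 300) x (300 x 300) tensor

print(transitive_verb)            # 27 million
print(relative_pronoun)           # 8.1 billion
```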
Provides the syntax-semantics interface.
Tight connection between syntax and semantics.
Motivated by the principle of compositionality.
View that syntactic constituents should generally combine as functions, or according to a function-argument relationship.
The string is
((the (bad boy)) (made (that mess)))
BARK × dogs
(CHASE × cats) × dogs
(CHASE × cats)
3rd-order tensor
for an English fragment
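The nested application (CHASE × cats) × dogs can be sketched with tensor contraction; the vectors and the 3rd-order tensor here are random toy stand-ins:

```python
# Sketch of tensor-based function application for "dogs chase cats":
# the transitive verb CHASE is a 3rd-order tensor, applied first to the
# object and then to the subject, mirroring (CHASE x cats) x dogs.
import numpy as np

rng = np.random.default_rng(1)
d = 3                                  # toy dimensionality
dogs = rng.normal(size=d)              # subject vector
cats = rng.normal(size=d)              # object vector
CHASE = rng.normal(size=(d, d, d))     # 3rd-order verb tensor

# CHASE x cats: contracting the object slot yields a matrix,
# i.e. a one-argument function (like an intransitive verb).
chase_cats = np.einsum("ijk,k->ij", CHASE, cats)

# (CHASE x cats) x dogs: applying that matrix to the subject yields
# the sentence vector.
sentence = chase_cats @ dogs
print(sentence.shape)
```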
Other Compositional Models
• Coecke et al. (2010): Category theory and pregroup grammars.
• Grefenstette et al. (2013): Simulating Logical Calculi with Tensors.
• Novacek et al., ISWC (2011); Freitas et al., ICSC (2011): Semantic Web & distributional semantics.
• Distributional semantics brings a promising
approach for building computational models
that work in the real world.
• Semantic approximation as a built-in feature.
• Compositionality is still an open problem but
classical (formal) works have been leveraged
and adapted to DSMs.
• Exciting time to be around!