Introduction to Distributional Semantics

André Freitas
Insight Centre for Data Analytics
Insight Workshop on Distributional Semantics
Galway, 2014
Based on the Great ESSLLI Tutorial from Evert & Lenci

Outline
 Contemporary Semantics
 Distributional Semantics
 Compositional-Distributional Semantics
 Take-away message

Shift in the Semantics Landscape
Corroboration: Praxis | Scientific / Formal | Philosophical
Semantics as a complex phenomenon

Semantics for a Complex World
• Most semantic models have dealt with particular types of
constructions, and have been carried out under very simplifying
assumptions, in true lab conditions.
• If these idealizations are removed it is not clear at all that modern
semantics can give a full account of all but the simplest
models/statements.
Sahlgren, 2013
Formal World Real World
Baroni et al., 2012

What is Distributional
Semantics?

Meaning
 Word meaning is usually represented in terms of some formal,
symbolic structure, either external or internal to the word
 External structure
- Associations between different concepts
 Internal structure
- Feature (property, attribute) lists
 The semantic properties of a word are derived from the formal
structure of its representation
- e.g. Inference algorithm, etc.
Semantics = Meaning representation model (data) +
inference model

Formal Representation of Meaning
 Modelling fine-grained lexical inferences

Formal Representation of Meaning
(Problems)
 Different meanings
- bat (animal), bat (artefact)
 Meaning variation in context
- clever politician, clever tycoon
 Meaning evolution
 Ambiguity, vagueness, inconsistency
Word meaning acquisition
Lack of flexibility
Scalability

Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend
to be semantically similar”
 He filled the wampimuk with the substance, passed it
around and we all drank some
 We found a little, hairy wampimuk sleeping behind the
tree

Weak and Strong DH (Lenci, 2008)
 Weak DH:
- Word meaning is reflected in linguistic distributions
- By inspecting a sufficiently large number of distributional
contexts we may have a useful surrogate representation of
meaning.
 Strong DH:
- A cognitive hypothesis about the form and origin of semantic
representations

Contextual Representation
 Abstract structure that accumulates encounters with the words
in various (linguistic) contexts.
 For our purposes …
- Context is equated with linguistic context

Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the
leash since he barked.”

contexts = nouns and verbs in the same sentence
target = dog
bark : 2
park : 1
leash : 1
owner : 1
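
The counting step for the dog example can be sketched as follows. This is a minimal illustration, not the method used in the slides: the lemma lists are written by hand (no tagger or lemmatiser is run), and `context_counts` is a hypothetical helper name.

```python
from collections import Counter

# Hand-lemmatised nouns and verbs for the two example sentences.
# Assumption: this annotation is done manually for illustration only.
sentences = [
    ["dog", "bark", "park"],                   # The dog barked in the park.
    ["owner", "dog", "put", "leash", "bark"],  # The owner of the dog put him on the leash since he barked.
]

def context_counts(target, sentences):
    """Count the nouns and verbs that share a sentence with `target`."""
    counts = Counter()
    for lemmas in sentences:
        if target in lemmas:
            counts.update(w for w in lemmas if w != target)
    return counts

dog = context_counts("dog", sentences)
print(dog)
```

With this hand annotation the verb "put" also receives a count of 1, since it is a verb in the second sentence; the listing above shows only bark, park, leash and owner.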

distributional matrix = targets × contexts (rows: targets, columns: contexts)
Vector Space Model (VSM)
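
A targets × contexts matrix for the two example sentences might be assembled as in the sketch below. It assumes NumPy, hand-lemmatised input, and that every noun or verb serves both as a target and as a context; none of these choices is prescribed by the slides.

```python
import numpy as np

# Hand-lemmatised nouns and verbs per sentence (an assumption for illustration).
sentences = [
    ["dog", "bark", "park"],
    ["owner", "dog", "put", "leash", "bark"],
]

# The same vocabulary indexes the rows (targets) and the columns (contexts).
vocab = sorted({w for lemmas in sentences for w in lemmas})
index = {w: i for i, w in enumerate(vocab)}

M = np.zeros((len(vocab), len(vocab)), dtype=int)
for lemmas in sentences:
    for target in lemmas:
        for context in lemmas:
            if target != context:
                M[index[target], index[context]] += 1

# Each row of M is the distributional vector of one target word.
dog_vector = M[index["dog"]]
print(vocab)
print(dog_vector)
```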

Semantic Similarity & Relatedness
[Figure: vector space diagram with labels θ, car, dog, cat, bark, run, leash]
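
The angle θ between two word vectors is commonly turned into a similarity score through its cosine. The sketch below assumes cosine as the measure and uses made-up toy counts over three dimensions (bark, run, leash); the numbers are illustrative only and do not come from any corpus.

```python
import math

def cosine(u, v):
    """cos(theta) = (u . v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Invented toy counts over the dimensions (bark, run, leash).
vectors = {
    "dog": [5, 3, 4],
    "cat": [1, 3, 2],
    "car": [0, 4, 0],
}

print(cosine(vectors["dog"], vectors["cat"]))
print(cosine(vectors["dog"], vectors["car"]))
```

Cosine ignores vector length, so a frequent word and a rare word with proportional context counts are still treated as similar.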

Semantic Similarity & Relatedness
 Semantic similarity - two words sharing a high number of
salient features (attributes)
- synonymy (car/automobile)
- hyperonymy (car/vehicle)
- co-hyponymy (car/van/truck)
 Semantic relatedness (Budanitsky & Hirst 2006) - two words
semantically associated without being necessarily similar
- function (car/drive)
- meronymy (car/tyre)
- location (car/road)
- attribute (car/fast)

 Computational models that build contextual semantic representations
from corpus data
 Semantic context is represented by a vector
 Vectors are obtained through the statistical analysis of the linguistic
contexts of a word
 Salience of contexts (cf. context weighting scheme)
 Semantic similarity/relatedness as the core operation over the model

DSMs as Commonsense Reasoning
Commonsense is here
[Figure: vector space diagram with labels θ, car, dog, cat, bark, run, leash]

DSMs as Commonsense Reasoning
[Figure: vector space diagram with labels θ, car, dog, cat, bark, run, leash]
...
vs.
Semantic best-effort

Demonstration (EasyESA)
http://treo.deri.ie/easyesa/

Applications
- Semantic search
- Question answering
- Approximate semantic inference
- Word sense disambiguation
- Paraphrase detection
- Textual entailment
- Semantic anomaly detection
...

Alternative Names for DSMs
 Corpus-based semantics
 Statistical semantics
 Geometrical models of meaning
 Vector semantics
 Word (semantic) space models

Building a DSM
 Pre-process a corpus (target, context)
 Count the target-context co-occurrences
 Weight the contexts (optional)
 Build the distributional matrix
 Reduce the matrix dimensions (optional)
 Parameters
- Corpus
- Context type
- Weighting scheme
- Similarity measure
- Number of dimensions
 A parameter configuration determines the DSM (e.g. LSA, ESA, …)
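
The optional dimensionality reduction step can be sketched with a truncated singular value decomposition, the kind of reduction usually associated with LSA-style models. The example assumes NumPy; `reduce_dimensions` is a hypothetical helper and the small matrix holds invented counts.

```python
import numpy as np

def reduce_dimensions(M, k):
    """Project the rows of M onto the top-k singular directions (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k]

# Invented 3 targets x 4 contexts count matrix, for illustration only.
M = np.array([
    [2.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 3.0, 0.0, 1.0],
])

reduced = reduce_dimensions(M, k=2)
print(reduced.shape)
```

The number of retained dimensions `k` is the "number of dimensions" parameter listed above; a suitable value depends on the corpus and the task.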

Parameters
 Corpus pre-processing
- Stemming/lemmatization
- POS tagging
- Syntactic Dependencies
 Context
- Document
- Paragraph
- Passage
- Word windows
- Words
- Linguistic features
- Linguistic patterns
- Verbs: nouns as contexts
- Verbs: adverbs as contexts
- etc.
- Size
- Shape
Context Engineering
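
For the word-window context type, the size parameter can be illustrated as below. The sketch assumes a symmetric, unweighted window over whitespace-separated tokens; `window_contexts` is a hypothetical helper, and other window shapes (for example, weighting by distance) are not covered.

```python
from collections import Counter

def window_contexts(tokens, target, size=2):
    """Count tokens within `size` positions to the left or right of each `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - size)
        hi = min(len(tokens), i + size + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts

tokens = "the owner of the dog put him on the leash since he barked".split()

narrow = window_contexts(tokens, "dog", size=1)
wide = window_contexts(tokens, "dog", size=3)
print(narrow)
print(wide)
```

A narrow window picks up only the immediate neighbours of the target, while a wider one starts to include words such as "owner"; neither window here reaches "leash".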

Context Weighting
 Smoothing frequency differences: from raw counts to log-frequency.
 Association measures (Evert 2005) give more weight to contexts
that are more significantly associated with a target word.
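
Both weighting ideas can be sketched in a few lines. The choices here are assumptions: log(1 + count) stands in for log-frequency, and positive pointwise mutual information (PPMI) is used as one widely used association measure; the slides do not single out either. The count matrix is invented.

```python
import numpy as np

def log_frequency(M):
    """Dampen large counts: log(1 + count)."""
    return np.log1p(np.asarray(M, dtype=float))

def ppmi(M):
    """Positive PMI: max(0, log2(observed / expected)) per target-context cell."""
    M = np.asarray(M, dtype=float)
    total = M.sum()
    row = M.sum(axis=1, keepdims=True)   # target marginals
    col = M.sum(axis=0, keepdims=True)   # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        expected = row * col / total     # counts expected under independence
        pmi = np.log2(M / expected)
    pmi[~np.isfinite(pmi)] = 0.0         # zero counts and empty rows/columns
    return np.maximum(pmi, 0.0)

# Invented counts: 2 targets x 3 contexts.
counts = np.array([
    [10, 0, 90],
    [0, 8, 92],
])

weighted = ppmi(counts)
print(weighted)
```

In this toy matrix the third context co-occurs heavily with both targets, so its PPMI weight is low despite the high raw counts, while the contexts specific to one target receive higher weights.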

Context Weighting Measures
Kiela & Clark, 2014

Similarity Measures
Kiela & Clark, 2014

What is the best parameter configuration?
 The best parameter configuration depends on the task.
 Systematic exploration of the parameters

DSM Instances
 Latent Semantic Analysis (Landauer & Dumais 1996)
 Hyperspace Analogue to Language (Lund & Burgess 1996)
 Infomap NLP (Widdows 2004)
 Random Indexing (Karlgren & Sahlgren 2001)
 Dependency Vectors (Padó & Lapata 2007)
 Explicit Semantic Analysis (Gabrilovich & Markovitch, 2008)
 Distributional Memory (Baroni & Lenci 2009)

Paraphrase Detection
I find it rather odd that people are already trying to tie the
Commission's hands in relation to the proposal for a
directive, while at the same calling on it to present a Green
Paper on the current situation with regard to optional and
supplementary health insurance schemes.
I find it a little strange to now obliging the Commission to
a motion for a resolution and to ask him at the same time
to draw up a Green Paper on the current state of voluntary
insurance and supplementary sickness insurance.
=?

Compositional Semantics
 Can we extend DS to account for the meaning of phrases
and sentences?
 Compositionality: The meaning of a complex expression
is a function of the meaning of its constituent parts.

 Words whose meaning is directly determined by their
distributional behaviour (e.g., nouns).
 Words that act as functions transforming the distributional
profile of other words (e.g., verbs, adjectives, …).

Mixture Function
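
Assuming the title refers to the vector mixture models from the compositional DSM literature, where a phrase vector is obtained by adding or multiplying the word vectors component by component, a minimal sketch looks like this. The function names, the weights `alpha` and `beta`, and the toy vectors are all illustrative assumptions.

```python
import numpy as np

def additive(u, v, alpha=1.0, beta=1.0):
    """Weighted additive mixture: alpha * u + beta * v."""
    return alpha * np.asarray(u, dtype=float) + beta * np.asarray(v, dtype=float)

def multiplicative(u, v):
    """Component-wise multiplicative mixture: u * v."""
    return np.asarray(u, dtype=float) * np.asarray(v, dtype=float)

# Invented toy vectors for two words.
clever = [1.0, 2.0, 0.0]
politician = [3.0, 4.0, 5.0]

print(additive(clever, politician))
print(multiplicative(clever, politician))
```

Both mixtures are symmetric in their arguments when the weights are equal, so word order is lost; this is one motivation for the function-based composition discussed next.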

 Take the syntactic structure to constitute the backbone
guiding the assembly of the semantic representations of
phrases.
(CHASE × cats) × dogs
CHASE: 3rd order tensor; cats, dogs: vectors
Baroni et al., 2012
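
The expression (CHASE × cats) × dogs can be read as two successive tensor contractions. The sketch below assumes NumPy, a small dimensionality, random placeholder values instead of learned representations, and an axis layout (output, subject, object) chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                             # toy dimensionality (assumption)

# Random placeholders standing in for learned representations.
CHASE = rng.random((d, d, d))     # 3rd order tensor; assumed axes: (output, subject, object)
cats = rng.random(d)              # object vector
dogs = rng.random(d)              # subject vector

# Contracting the tensor with the object vector leaves a matrix ...
chase_cats = np.einsum("ijk,k->ij", CHASE, cats)
# ... and applying that matrix to the subject vector leaves a vector.
sentence = chase_cats @ dogs

print(chase_cats.shape, sentence.shape)
```

Each contraction removes one order from the tensor, so the result for the whole sentence lives in the same kind of space as the noun vectors.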

Formal Model
 Distributional Semantics & Category Theory

Take-away message
 Low acquisition effort
 Simple way to build a commonsense KB
 Semantic approximation as a built-in construct
 Semantic best-effort
 Simple to use
 DSMs are evolving fast (compositional and formal grounding)
 Distributional semantics offers a promising approach for
building semantic models that work in the real world

Great Introductory References
 Evert & Lenci ESSLLI Tutorial on Distributional
Semantics, 2009. (many slides were taken or adapted
from this great tutorial).
 Turney & Pantel, From Frequency to Meaning: Vector
Space Models of Semantics, 2010.
 Baroni et al., Frege in Space: A Program for
Compositional Distributional Semantics, 2012.
 Kiela & Clark: A Systematic Study of Semantic Vector
Space Model Parameters, 2014.
