Introduction to Distributional Semantics
Transcript

  • 1. Introduction to Distributional Semantics André Freitas Insight Centre for Data Analytics Insight Workshop on Distributional Semantics Galway, 2014 Based on the Great ESSLLI Tutorial from Evert & Lenci
  • 2. Outline  Contemporary Semantics  Distributional Semantics  Compositional-Distributional Semantics  Take-away message
  • 3. Contemporary Semantics
  • 4. Shift in the Semantics Landscape [diagram: Philosophical → Scientific/Formal → Praxis → Corroboration] Semantics as a complex phenomenon
  • 5. Semantics for a Complex World • Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions. • If these idealizations are removed, it is not at all clear that modern semantics can give a full account of all but the simplest models/statements. (Sahlgren, 2013; Baroni et al., 2012) [figure: Formal World vs. Real World]
  • 6. What is Distributional Semantics?
  • 7. Meaning  Word meaning is usually represented in terms of some formal, symbolic structure, either external or internal to the word  External structure - Associations between different concepts  Internal structure - Feature (property, attribute) lists  The semantic properties of a word are derived from the formal structure of its representation - e.g. Inference algorithm, etc. Semantics = Meaning representation model (data) + inference model
  • 8. Formal Representation of Meaning  Modelling fine-grained lexical inferences
  • 9. Formal Representation of Meaning (Problems)  Different meanings - bat (animal), bat (artefact)  Meaning variation in context - clever politician, clever tycoon  Meaning evolution  Ambiguity, vagueness, inconsistency Word meaning acquisition Lack of flexibility Scalability
  • 10. Distributional Hypothesis “Words occurring in similar (linguistic) contexts tend to be semantically similar”  He filled the wampimuk with the substance, passed it around and we all drank some  We found a little, hairy wampimuk sleeping behind the tree
  • 11. Weak and Strong DH (Lenci, 2008)  Weak DH: - Word meaning is reflected in linguistic distributions - By inspecting a sufficiently large number of distributional contexts we may have a useful surrogate representation of meaning.  Strong DH: - A cognitive hypothesis about the form and origin of semantic representations
  • 12. Contextual Representation  Abstract structure that accumulates encounters with the words in various (linguistic) contexts.  For our purposes … - Context is equated with linguistic context
  • 13. Distributional Semantic Models (DSMs) “The dog barked in the park. The owner of the dog put him on the leash since he barked.”
  • 14. Distributional Semantic Models (DSMs) “The dog barked in the park. The owner of the dog put him on the leash since he barked.” contexts = nouns and verbs in the same sentence
  • 15. Distributional Semantic Models (DSMs) “The dog barked in the park. The owner of the dog put him on the leash since he barked.” bark dog park leash contexts = nouns and verbs in the same sentence bark : 2 park : 1 leash : 1 owner : 1
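The counting step on this slide can be sketched in a few lines of Python. This is a toy version with hand-lemmatised sentences standing in for the slide's example; the function name `context_counts` is my own:

```python
from collections import Counter

# Hand-lemmatised toy version of the slide's example; context = other
# nouns/verbs in the same sentence ("put" is also counted here, which
# the slide's tally omits).
sentences = [
    ["dog", "bark", "park"],
    ["owner", "dog", "put", "leash", "bark"],
]

def context_counts(target, sentences):
    """Count how often each context word co-occurs with `target` in a sentence."""
    counts = Counter()
    for sent in sentences:
        if target in sent:
            counts.update(w for w in sent if w != target)
    return counts

counts = context_counts("dog", sentences)
# counts: bark: 2, park: 1, leash: 1, owner: 1, put: 1
```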
  • 16. Distributional Semantic Models (DSMs) distributional matrix = targets x contexts contexts targets Vector Space Model (VSM)
  • 17. Semantic Similarity & Relatedness θ car dog cat bark run leash
  • 18. Semantic Similarity & Relatedness  Semantic similarity - two words sharing a high number of salient features (attributes) - synonymy (car/automobile) - hypernymy (car/vehicle) - co-hyponymy (car/van/truck)  Semantic relatedness (Budanitsky & Hirst 2006) - two words semantically associated without necessarily being similar - function (car/drive) - meronymy (car/tyre) - location (car/road) - attribute (car/fast)
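Similarity in a DSM is typically measured with the cosine of the angle θ between two context vectors, as in the diagram above. A minimal sketch, with invented counts over the contexts (bark, run, leash):

```python
import math

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented counts over the contexts (bark, run, leash) from the diagram.
dog = [10, 4, 6]
cat = [6, 5, 2]
car = [0, 8, 0]

cosine(dog, cat)  # high: dog and cat share most contexts
cosine(dog, car)  # low: car barely shares contexts with dog
```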
  • 19. Distributional Semantic Models (DSMs)  Computational models that build contextual semantic representations from corpus data  Semantic context is represented by a vector  Vectors are obtained through the statistical analysis of the linguistic contexts of a word  Salience of contexts (cf. context weighting scheme)  Semantic similarity/relatedness as the core operation over the model
  • 20. DSMs as Commonsense Reasoning Commonsense is here θ car dog cat bark run leash
  • 21. DSMs as Commonsense Reasoning
  • 22. DSMs as Commonsense Reasoning θ car dog cat bark run leash ... vs. Semantic best-effort
  • 23. Demonstration (EasyESA) http://treo.deri.ie/easyesa/
  • 24. Applications  Applications - Semantic search - Question answering - Approximate semantic inference - Word sense disambiguation - Paraphrase detection - Text entailment - Semantic anomaly detection ...
  • 25. Alternative Names for DSMs  Corpus-based semantics  Statistical semantics  Geometrical models of meaning  Vector semantics  Word (semantic) space models
  • 26. Definition of DSMs
  • 27. Building a DSM  Pre-process a corpus (target, context)  Count the target-context co-occurrences  Weight the contexts (optional)  Build the distributional matrix  Reduce the matrix dimensions (optional)  Parameters - Corpus - Context type - Weighting scheme - Similarity measure - Number of dimensions  A parameter configuration determines the DSM: (LSA, ESA, …)
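The steps above can be sketched end-to-end. This toy pipeline (my own simplification) uses symmetric word-window contexts and raw counts, and skips the optional weighting and dimensionality-reduction steps:

```python
from collections import Counter, defaultdict

def build_dsm(sentences, window=2):
    """Minimal DSM: symmetric word-window contexts, raw co-occurrence counts."""
    matrix = defaultdict(Counter)  # target -> context -> count
    for sent in sentences:
        for i, target in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[target][sent[j]] += 1
    return matrix

m = build_dsm([["the", "dog", "barked", "in", "the", "park"]])
# m["dog"] counts "the", "barked", "in" within a 2-word window
```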
  • 28. Parameters  Corpus pre-processing - Stemming/lemmatization - POS tagging - Syntactic dependencies  Context - Document - Paragraph - Passage - Word windows - Words - Linguistic features - Linguistic patterns - Verbs : contexts nouns - Verbs : contexts adverbs - etc. - Size - Shape Context Engineering
  • 29. Effect of Parameters
  • 30. Context Weighting  Smoothing frequency differences: From raw counts to log- frequency.  Association measures (Evert 2005): are used to give more weight to contexts that are more significantly associated with a target word
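Positive pointwise mutual information (PPMI) is one common such association measure. A sketch over a `{target: {context: count}}` table, with toy counts and my own helper name:

```python
import math
from collections import Counter

def ppmi(matrix):
    """Weight a {target: {context: count}} co-occurrence table with positive
    pointwise mutual information; negative associations are clipped to zero."""
    total = sum(n for row in matrix.values() for n in row.values())
    target_totals = {t: sum(row.values()) for t, row in matrix.items()}
    context_totals = Counter()
    for row in matrix.values():
        context_totals.update(row)
    weighted = {}
    for t, row in matrix.items():
        weighted[t] = {}
        for c, n in row.items():
            pmi = math.log2((n * total) / (target_totals[t] * context_totals[c]))
            weighted[t][c] = max(0.0, pmi)
    return weighted

weighted = ppmi({"dog": {"bark": 2, "park": 1}, "car": {"park": 1}})
# "park" is no more frequent with "dog" than chance predicts, so its
# weight is clipped to 0; "bark" keeps a positive weight.
```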
  • 31. Context Weighting Measures Kiela & Clark, 2014
  • 32. Similarity Measures Kiela & Clark, 2014
  • 33. What is the best parameter configuration?  The best parameter configuration depends on the task.  Systematic exploration of the parameters
  • 34. DSM Instances  Latent Semantic Analysis (Landauer & Dumais 1997)  Hyperspace Analogue to Language (Lund & Burgess 1996)  Infomap NLP (Widdows 2004)  Random Indexing (Karlgren & Sahlgren 2001)  Dependency Vectors (Padó & Lapata 2007)  Explicit Semantic Analysis (Gabrilovich & Markovitch, 2008)  Distributional Memory (Baroni & Lenci 2009)
  • 35. Compositional Semantics
  • 36. Paraphrase Detection I find it rather odd that people are already trying to tie the Commission's hands in relation to the proposal for a directive, while at the same calling on it to present a Green Paper on the current situation with regard to optional and supplementary health insurance schemes. I find it a little strange to now obliging the Commission to a motion for a resolution and to ask him at the same time to draw up a Green Paper on the current state of voluntary insurance and supplementary sickness insurance. =?
  • 37. Compositional Semantics  Can we extend DS to account for the meaning of phrases and sentences?  Compositionality: The meaning of a complex expression is a function of the meaning of its constituent parts.
  • 38. Compositional Semantics Words in which the meaning is directly determined by their distributional behaviour (e.g., nouns). Words that act as functions transforming the distributional profile of other words (e.g., verbs, adjectives, …).
  • 39. Compositional Semantics Mixture Function
  • 40. Compositional Semantics  Take the syntactic structure to constitute the backbone guiding the assembly of the semantic representations of phrases. (CHASE × cats) × dogs. 3rd order tensor vector vector (CHASE × cats) Baroni et al., 2012
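The mixture vs. function distinction from the preceding slides can be sketched with toy vectors: mixture models compose by element-wise addition or multiplication, while the function view treats e.g. an adjective as a matrix applied to a noun vector. All numbers below are invented for illustration:

```python
# Mixture composition: element-wise addition or multiplication of word vectors.
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def mult(u, v):
    return [a * b for a, b in zip(u, v)]

# Function composition: a function word (e.g. an adjective) as a linear map
# applied to its argument's vector.
def apply_matrix(M, v):
    return [sum(m_ij * x for m_ij, x in zip(row, v)) for row in M]

dog = [1.0, 2.0, 0.5]          # invented noun vector
old = [[1.2, 0.0, 0.0],        # invented adjective matrix
       [0.0, 0.8, 0.0],
       [0.0, 0.0, 1.0]]

old_dog_fun = apply_matrix(old, dog)  # "old dog" under the function view
```

A transitive verb would analogously be a 3rd-order tensor, applied first to the object and then to the subject, as in the (CHASE × cats) × dogs example above.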
  • 41. Formal Model  Distributional Semantics & Category Theory
  • 42. Take-away message  Low acquisition effort  Simple way to build a commonsense KB  Semantic approximation as a built-in construct  Semantic best-effort  Simple to use  DSMs are evolving fast (compositional and formal grounding)  Distributional semantics brings a promising approach for building semantic models that work in the real world
  • 43. Great Introductory References  Evert & Lenci, ESSLLI Tutorial on Distributional Semantics, 2009 (many slides were taken or adapted from this great tutorial).  Turney & Pantel, From Frequency to Meaning: Vector Space Models of Semantics, 2010.  Baroni et al., Frege in Space: A Program for Compositional Distributional Semantics, 2012.  Kiela & Clark, A Systematic Study of Semantic Vector Space Model Parameters, 2014.