
Statistically Significant Detection of Linguistic Change

Slides for the talk "Statistically Significant Detection of Linguistic Change" presented at WWW 2015, Florence, Italy

Published in: Science

Statistically Significant Detection of Linguistic Change

  1. Statistically Significant Detection of Linguistic Change. Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, Steven Skiena
  2. Linguistic Change: Language evolves over time. Words can acquire or change meaning. (Figure labels: 1900, 2000, "Battle Of Otterbourne")
  3. Tracking Changes in Word Meaning
  4. Why Detecting Linguistic Change Matters: Focus less on keywords; focus on the intended meaning of words; semantic search; Google's Hummingbird algorithm; Powerset (bought by Microsoft). Tracking and detecting linguistic change is key to Semantic Web and search applications. (Figure labels: 2011, 2012)
  5. Talk in a nutshell: Effectively capture word semantics over time using word embeddings. Detect when and whether a change is significant. Results on Twitter, Google Book Ngrams, etc. Project code available from: http://www.vivekkulkarni.net
  6. Results: A quick preview

     | Source | Word | Estimated change point | Past usage | Current usage |
     |---|---|---|---|---|
     | Google Book Ngrams | tape | 1970 | red-tape, tape from her mouth | A copy of the tape |
     | Google Book Ngrams | gay | 1985 | Happy and gay | Gay and lesbians |
     | Google Book Ngrams | sex | 1965 | Of the fair sex | To have sex with |
     | Google Book Ngrams | plastic | 1950 | Of plastic possibilities | Put in a plastic |
     | Twitter | candy | April 2013 | Candy sweets | Candy Crush (the game) |
     | Twitter | snap | Dec 2012 | Snap a picture | Snap chat |
     | Twitter | mystery | Dec 2012 | Mystery books | Mystery Manor (a game) |
  7. Detecting Linguistic Change: How we did it. Tracking and detecting linguistic change in a word's usage is really the problem of (a) constructing a time series capturing the word's usage and (b) analyzing the time series for statistically significant changes (change point detection).
  8. Talk outline: Different methods to model word evolution as a time series (frequency, syntactic, distributional); a method to establish statistical significance of changes; results on several datasets of online content such as Twitter.
  9. Outline: Different methods to model word evolution as a time series (frequency, syntactic, distributional); a method to establish statistical significance of changes; results on several datasets of online content such as Twitter.
  10. Using Frequency: Frequency-based approaches to capturing word usage are widely used, e.g. Google Trends and Google NGrams [Jean-Baptiste Michel et al., 2011]. Given a word $w$, we construct the time series as $T_t(w) = \frac{\#(w \in C_t)}{|C_t|}$, where $C_t$ is the corpus at time $t$. (Figure: time series for "gay")
  11. When Frequency Fails: "Sandy" changed meaning but "Hurricane" did not. (Figure panels: Sandy, Hurricane)
  12. A Second Approach: Syntactic Method. Part-of-speech changes are indicative of linguistic shift. Google Syntactic Ngram Viewer [Jason Mann et al. 2014, Goldberg et al. 2013]. Each word is tagged with its part of speech (POS), e.g. "happy and sad" is tagged ADJ CC ADJ. Construct a time series by tracking changes in the POS distribution: $Q_t = \Pr_{X \sim \mathrm{POSTags}}(X \mid w, C_t)$ and $T_t(w) = \mathrm{JSDivergence}(Q_0, Q_t)$.
  13. Syntactic Method: An Example. (Figure: "apple")
  14. Syntactic Method: When it fails. Example: "sex". Captures only syntactic change.
  15. Distributional Method: Overview. (1) Learn word representations for each time point; (2) align word embeddings; (3) construct time series.
  16. Distributional Method: Overview. (1) Learn word embeddings for each time point; (2) align word embeddings; (3) construct time series.
  17. Learning word representations (embeddings)

      |        | flower | rose | daisy | bird | canary | robin |
      |---|---|---|---|---|---|---|
      | flower | 0 | 10 | 20 | 1 | 1 | 1 |
      | rose | 10 | 0 | 15 | 0 | 0 | 0 |
      | daisy | 20 | 15 | 0 | 1 | 2 | 3 |
      | bird | 1 | 0 | 1 | 0 | 20 | 40 |
      | canary | 1 | 0 | 2 | 20 | 0 | 10 |
      | robin | 1 | 0 | 3 | 40 | 10 | 0 |

      [Rumelhart+, 2003]

      • Learning a representation is learning a mapping $\phi: \mathcal{V} \to \mathbb{R}^{d}$
      • Captures syntactic and semantic aspects of word usage
      • Advantages: very effective on NLP tasks; scalable and online methods
  18. Skipgram Model: Learning Word Embeddings. Predict the surrounding words of every word. Word representations can be learned by back-propagating errors. (Figure labels: context word, current word, vector for $w_I$, objective)
  19. Using Word Embeddings To Detect Linguistic Change: Key Idea. Train word embeddings for each time point; we use the Skipgram model [Mikolov 2013] to train word embeddings. Track the displacement of a word over time in this latent space. (Figure: timeline 1900, 1920, 1950, 1980, 1990, 2000; axis label: Distance)
  20. But ... a roadblock: We cannot compare word embeddings from different time points because they lie in different vector spaces. We need to align word embeddings from different vector spaces!
  21. Distributional Method: Overview. (1) Learn word representations for each time point; (2) align word embeddings; (3) construct time series.
  22. Aligning Word Embeddings: Assumptions. Local structure is preserved, as most words did not change over time. Local structure between vector spaces is equivalent under a linear transformation.
  23. Aligning Word Embeddings: Main Idea. Learn a linear transformation W that attempts to preserve local structure: use piecewise linear regression using only the k nearest neighbors.
  24. Distributional Method: Overview. (1) Learn word representations for each time point; (2) align word embeddings; (3) construct time series.
  25. Distributional Method: Constructing Time Series. Align embeddings to a joint space using a piecewise linear regression model. We can now induce a distance measure over the embeddings. Construct the time series for $w$ as $T_t(w) = \mathrm{CosineDistance}(v_0, v_t)$. Recall: the cosine distance between two vectors is 0 if they are equal; higher values indicate greater distance.
  26. Example: Distributional Time Series. (Figure: "gay")
  27. Outline: Different methods to model word evolution as a time series (frequency, syntactic, distributional); a method to establish statistical significance of changes; results on several datasets of online content such as Twitter.
  28. Change Point Detection in Time Series: Track changes in mean, variance, or perhaps both. Track a test statistic at each time point, e.g. the difference in mean between the left and right end of the time series, or the cumulative sum (CUSUM). Use some notion of significance to establish whether the test statistic at time point t indicates a significant shift and hence a change point. Label the most significant shift as the change point.
  29. Change Point Detection Based on Mean Shift. Model described in [Taylor].
  30. Outline: Different methods to model word evolution as a time series (frequency, syntactic, distributional); a method to establish statistical significance of changes; results on several datasets of online content such as Twitter.
  31. Popular words detected by word embeddings (Google Book NGrams)

      | Word | p-value | Estimated change point | Past usage | Current usage |
      |---|---|---|---|---|
      | tape | < 0.0001 | 1970 | red-tape, tape from her mouth | A copy of the tape |
      | gay | 0.0001 | 1985 | Happy and gay | Gay and lesbians |
      | sex | 0.0002 | 1965 | Of the fair sex | To have sex with |
      | checking | 0.0002 | 1970 | Then checking himself | Checking him out |
      | peck | 0.0004 | 1935 | Brewed a peck | A peck on the cheek |
      | plastic | 0.0005 | 1950 | Of plastic possibilities | Put in a plastic |
      | diet | 0.0104 | 1970 | Diet of bread and butter | To go on a diet |
      | honey | 0.02 | 1930 | Land of milk and honey | Oh honey! |
  32. Popular words detected by POS (Google Book NGrams)

      | Word | Estimated change point | Reason |
      |---|---|---|
      | apple | 1984 | Noun to proper noun |
      | windows | 1992 | Noun to proper noun |
      | bush | 1989 | Noun to proper noun |
      | click | 1952 | Noun to verb |
      | handle | 1951 | Noun to verb |
      | sink | 1972 | Verb to noun |
  33. Popular words detected by word embeddings (Twitter)

      | Word | Estimated change point | Past usage | Current usage |
      |---|---|---|---|
      | candy | April 2013 | Candy sweets | Candy Crush (the game) |
      | rally | March 2013 | Political rally | Rally of soldiers (The Immortalis Game) |
      | snap | December 2012 | Snap a picture | Snap chat |
      | mystery | December 2012 | Mystery books | Mystery Manor (the game) |
      | shades | June 2012 | Color shades, shaded glasses | 50 shades of grey |
  34. Summary: Looked at the problem of detecting and tracking linguistic change. Presented a meta approach that detects and tracks such changes by constructing a time series. Demonstrated how to use word embeddings to detect linguistic change. Covered change point detection and estimation. Results on Google Ngrams, Twitter, and Amazon Movie Reviews. Project code available from: http://www.vivekkulkarni.net
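
The three time series constructions described on slides 10, 12, and 25 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' project code: the function names, the input formats (token lists, `(token, tag)` pairs, one already-aligned vector per time point), and the choice of base-2 logarithms for the Jensen-Shannon divergence are all assumptions made for the example.

```python
import math
from collections import Counter


def frequency_series(word, corpora):
    """Relative frequency of `word` at each time point.

    `corpora` is assumed to be a list of token lists, one per time point.
    """
    return [
        (sum(1 for tok in tokens if tok == word) / len(tokens)) if tokens else 0.0
        for tokens in corpora
    ]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 wherever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pos_distribution(word, tagged_tokens, tagset):
    """Empirical distribution over `tagset` for occurrences of `word`."""
    counts = Counter(tag for tok, tag in tagged_tokens if tok == word)
    total = sum(counts[tag] for tag in tagset)
    if total == 0:
        raise ValueError(f"{word!r} does not occur with any tag in tagset")
    return [counts[tag] / total for tag in tagset]


def syntactic_series(word, tagged_corpora, tagset):
    """JS divergence between the POS distribution at time 0 and at time t."""
    q0 = pos_distribution(word, tagged_corpora[0], tagset)
    return [
        js_divergence(q0, pos_distribution(word, tagged, tagset))
        for tagged in tagged_corpora
    ]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for vectors pointing in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def distributional_series(aligned_vectors):
    """Cosine distance of a word's vector at each time point from time 0.

    Assumes the vectors were already aligned to a joint space.
    """
    v0 = aligned_vectors[0]
    return [cosine_distance(v0, vt) for vt in aligned_vectors]
```

All three functions return one value per time point, so their output can be passed unchanged to a change point detector. The alignment step from slide 23 is deliberately left out; `distributional_series` only makes sense once the embeddings share a vector space.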
