Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georgia Institute of Technology at MLconf ATL 2017

Making Natural Language Processing Robust to Sociolinguistic Variation:
Natural language processing on social media text has the potential to aggregate facts and opinions from millions of people all over the world. However, language in social media is highly variable, making it more difficult to analyze than conventional news texts. Fortunately, this variation is not random; it is often linked to social properties of the author. I will describe two machine learning methods for exploiting social network structures to make natural language processing more robust to socially-linked variation. The key idea behind both methods is linguistic homophily: the tendency of socially linked individuals to use language in similar ways. This idea is captured using embeddings of node positions in social networks. By integrating node embeddings into neural networks for language analysis, we obtained customized language processing systems for individual writers, even for individuals for whom we have no labeled data. The first application shows how to apply this idea to the problem of tweet-level sentiment analysis. The second application targets the problem of linking spans of text to known entities in a knowledge base.



  1. 1. Making natural language processing robust to sociolinguistic variation Jacob Eisenstein @jacobeisenstein Georgia Institute of Technology September 9, 2017
  2. 2. Machine reading From text to structured representations.
  3. 3. Annotate and train Machine reading From text to structured representations.
  4. 4. Annotate and train Machine reading From text to structured representations. New domains of digitized texts offer opportunities as well as challenges.
  5. 5. Language data then and now Then: news text, small set of authors, professionally edited, fixed style
  6. 6. Language data then and now Then: news text, small set of authors, professionally edited, fixed style Now: open domain, everyone is an author, unedited, many styles
  7. 7. Social media has forced NLP to confront the challenge of missing social context (Eisenstein, 2013): (Gimpel et al., 2011) (Ritter et al., 2011) (Foster et al., 2011)
  8. 8. Social media has forced NLP to confront the challenge of missing social context (Eisenstein, 2013): tacit assumptions about audience knowledge language variation across social groups (Gimpel et al., 2011) (Ritter et al., 2011) (Foster et al., 2011)
  9. 9. Social media has forced NLP to confront the challenge of missing social context (Eisenstein, 2013): tacit assumptions about audience knowledge language variation across social groups (Gimpel et al., 2011) (Ritter et al., 2011) (Foster et al., 2011)
  10. 10. Finding tacit context in the social network Social media texts lack context, because it is implicit between the writer and the reader. Homophily: socially connected individuals tend to share traits.
  11. 11. Assortativity of entity references
  12. 12. We project embeddings for entities, words, and authors into a shared semantic space. “Dirk Novitsky” “the warriors” Inner products in this space indicate compatibility.
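To make slide 12 concrete, here is a minimal sketch of scoring candidate entities by inner products in a shared embedding space. The entity names, random vectors, and the simple additive scoring rule are illustrative stand-ins, not the trained embeddings or exact model from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy embeddings in a shared semantic space (in the real system these are learned jointly).
entities = {
    "Dirk_Nowitzki": rng.normal(size=dim),
    "Golden_State_Warriors": rng.normal(size=dim),
}
mention = rng.normal(size=dim)   # embedding of the noisy mention, e.g. "Dirk Novitsky"
author = rng.normal(size=dim)    # embedding of the tweet's author

def compatibility(entity_vec, mention_vec, author_vec):
    """Inner products in the shared space indicate compatibility."""
    return float(mention_vec @ entity_vec + author_vec @ entity_vec)

scores = {name: compatibility(vec, mention, author) for name, vec in entities.items()}
print(scores, "->", max(scores, key=scores.get))
```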
  13. 13. Socially-Infused Entity Linking
  14. 14. Socially-Infused Entity Linking [Model diagram: tweet, entity assignments, author.]
  15. 15. Socially-Infused Entity Linking [Model diagram: tweet, entity assignments, author.] ‣ g1 is employed to model surface features.
  16. 16. Socially-Infused Entity Linking [Model diagram: tweet, entity assignments, author.] ‣ g1 is employed to model surface features. ‣ g2 is used to capture two assumptions: ‣ Entity homophily ‣ Semantically related mentions tend to refer to similar entities.
  17. 17. Socially-Infused Entity Linking: $g_2(x, y_t, u, t; \Theta_2) = v_u^{(u)\top} W^{(u,e)} v_{y_t}^{(e)} + v_t^{(m)\top} W^{(m,e)} v_{y_t}^{(e)}$, where $v_u^{(u)}$ is the author embedding, $v_t^{(m)}$ is the mention embedding, and $v_{y_t}^{(e)}$ is the entity embedding.
  18. 18. Socially-Infused Entity Linking, with loss-augmented training of the same scoring function $g_2(x, y_t, u, t; \Theta_2) = v_u^{(u)\top} W^{(u,e)} v_{y_t}^{(e)} + v_t^{(m)\top} W^{(m,e)} v_{y_t}^{(e)}$ (author, mention, and entity embeddings as on slide 17).
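A minimal numpy sketch of the bilinear score on slides 17-18, with made-up dimensions and random matrices standing in for the learned parameters $W^{(u,e)}$ and $W^{(m,e)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
d_author, d_mention, d_entity = 16, 16, 32

# Random stand-ins for the learned parameters Theta_2 = {W_ue, W_me}.
W_ue = rng.normal(size=(d_author, d_entity))    # author-entity interaction matrix
W_me = rng.normal(size=(d_mention, d_entity))   # mention-entity interaction matrix

def g2(v_author, v_mention, v_entity):
    """Author-entity bilinear term plus mention-entity bilinear term."""
    return float(v_author @ W_ue @ v_entity + v_mention @ W_me @ v_entity)

v_u = rng.normal(size=d_author)    # author embedding v^(u)_u
v_m = rng.normal(size=d_mention)   # mention embedding v^(m)_t
v_e = rng.normal(size=d_entity)    # candidate entity embedding v^(e)_{y_t}
print(g2(v_u, v_m, v_e))   # higher score = more compatible entity for this mention and author
```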
  19. 19. Learning
  20. 20. Learning ‣ Loss-augmented inference:
  21. 21. Learning ‣ Loss-augmented inference: Hamming loss
  22. 22. Learning ‣ Loss-augmented inference: Hamming loss ‣ Optimization: stochastic gradient descent
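Slides 20-22 train the linker with loss-augmented inference and stochastic gradient descent. The sketch below shows only the loss-augmented decoding step with a Hamming loss over per-mention entity assignments; the candidate scores are toy numbers, mentions are treated independently for simplicity, and the gradient update is only described in a comment.

```python
def hamming(y_pred, y_gold):
    """Number of mentions whose predicted entity differs from the gold entity."""
    return sum(p != g for p, g in zip(y_pred, y_gold))

def loss_augmented_decode(scores, y_gold):
    """For each mention, pick the candidate maximizing model score + Hamming loss term."""
    y_hat = []
    for t, cand_scores in enumerate(scores):
        augmented = {y: s + (y != y_gold[t]) for y, s in cand_scores.items()}
        y_hat.append(max(augmented, key=augmented.get))
    return y_hat

# Toy example: two mentions, each with candidate entities and current model scores.
scores = [{"Red_Sox": 1.0, "Nil": 0.2}, {"Red_(color)": 0.9, "Nil": 0.8}]
y_gold = ["Red_Sox", "Nil"]
y_hat = loss_augmented_decode(scores, y_gold)
print(y_hat, "Hamming loss:", hamming(y_hat, y_gold))
# An SGD step would then push the scorer's weights toward the features of y_gold
# and away from the features of the loss-augmented prediction y_hat.
```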
  23. 23. Inference ‣ Non-overlapping structure: in order to link ‘Red Sox’ to a real entity, ‘Red’ and ‘Sox’ should be linked to Nil.
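Slide 23's non-overlapping constraint can be illustrated with a tiny search over candidate spans. This is a brute-force sketch of the constraint only, with made-up span scores; it is not the talk's actual inference procedure, which would use structured inference rather than subset enumeration.

```python
from itertools import combinations

# Candidate mention spans as (start, end, score, label); overlapping spans cannot both be linked.
candidates = [
    (0, 1, 1.2, "Red"),
    (1, 2, 0.9, "Sox"),
    (0, 2, 2.8, "Red Sox"),
]

def overlaps(a, b):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def best_non_overlapping(spans):
    """Brute-force search over span subsets; fine for a handful of candidates."""
    best_score, best_subset = 0.0, ()
    for r in range(1, len(spans) + 1):
        for subset in combinations(spans, r):
            if any(overlaps(a, b) for a, b in combinations(subset, 2)):
                continue  # violates the non-overlapping constraint
            score = sum(s[2] for s in subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, [s[3] for s in best_subset]

print(best_non_overlapping(candidates))  # "Red Sox" wins, so "Red" and "Sox" go to Nil
```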
  24. 24. [Bar chart: entity linking F1 (64–78) on the NEEL and TACL datasets for Classifier, Struct, Struct+Social, and S-MART; annotated gains of +3.2 (NEEL) and +2.0 (TACL).] Structure prediction improves accuracy. Social context yields further improvements. S-MART is the prior state-of-the-art (Yang & Chang, 2015).
  25. 25. Social media has forced NLP to confront the challenge of missing social context (Eisenstein, 2013): tacit assumptions about audience knowledge language variation across social groups (Gimpel et al., 2011) (Ritter et al., 2011) (Foster et al., 2011)
  26. 26. Social media has forced NLP to confront the challenge of missing social context (Eisenstein, 2013): tacit assumptions about audience knowledge language variation across social groups (Gimpel et al., 2011) (Ritter et al., 2011) (Foster et al., 2011)
  27. 27. Language variation: a challenge for NLP “I would like to believe he’s sick rather than just mean and evil.”
  28. 28. Language variation: a challenge for NLP “I would like to believe he’s sick rather than just mean and evil.” “You could’ve been getting down to this sick beat.” (Yang & Eisenstein, 2017)
  29. 29. Personalization by ensemble Goal: personalized conditional likelihood, P(y | x, a), where a is the author. Problem: We have labeled examples for only a few authors.
  30. 30. Personalization by ensemble Goal: personalized conditional likelihood, $P(y \mid x, a)$, where $a$ is the author. Problem: We have labeled examples for only a few authors. Personalization ensemble: $P(y \mid x, a) = \sum_k P_k(y \mid x)\,\pi_a(k)$, where $P_k(y \mid x)$ is a basis model and $\pi_a(\cdot)$ are the ensemble weights for author $a$.
  31. 31. Homophily to the rescue? Sick! Sick! Sick!Sick! Labeled data Unlabeled data Are language styles assortative on the social network?
  32. 32. Evidence for linguistic homophily Pilot study: is classifier accuracy assortative on the Twitter social network? $\mathrm{assort}(G) = \frac{1}{\#|G|}\sum_{(i,j)\in G}\big[\delta(y_i = \hat{y}_i)\,\delta(y_j = \hat{y}_j) + \delta(y_i \neq \hat{y}_i)\,\delta(y_j \neq \hat{y}_j)\big]$
  33. 33. Evidence for linguistic homophily Pilot study: is classifier accuracy assortative on the Twitter social network? (Same assortativity statistic as slide 32.) [Plots: assortativity, roughly 0.700–0.735, versus rewiring epochs for the follow, mention, and retweet networks, comparing the original network against random rewiring.]
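The assortativity statistic on slides 32-33 is the fraction of edges whose endpoints are both classified correctly or both incorrectly. Below is a small sketch of that measure plus a rewiring baseline, on toy data rather than the study's Twitter networks; the pilot study's actual rewiring procedure may differ from the simple endpoint shuffle used here.

```python
import random

def assortativity(edges, correct):
    """Fraction of edges whose two endpoints agree in classifier correctness."""
    agree = sum(correct[i] == correct[j] for i, j in edges)
    return agree / len(edges)

def rewire(edges, rng):
    """Baseline: shuffle one endpoint of every edge, destroying the social structure."""
    heads = [i for i, _ in edges]
    tails = [j for _, j in edges]
    rng.shuffle(tails)
    return list(zip(heads, tails))

rng = random.Random(0)
correct = {0: True, 1: True, 2: False, 3: False, 4: True, 5: False}   # per-user correctness
edges = [(0, 1), (0, 4), (2, 3), (2, 5), (1, 4), (3, 5)]              # toy social graph

print("observed:", assortativity(edges, correct))
print("rewired: ", assortativity(rewire(edges, rng), correct))
```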
  34. 34. Network-driven personalization For each author, estimate a node embedding $e_a$ (Tang et al., 2015). Nodes that share neighbors get similar embeddings. $\pi_a = \mathrm{SoftMax}(f(e_a))$; $P(y \mid x, a) = \sum_{k=1}^{K} P_k(y \mid x)\,\pi_a(k)$.
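A minimal sketch of the recipe on slides 30 and 34 with toy numbers: a node embedding e_a is mapped to ensemble weights pi_a by a softmax, and the personalized prediction mixes K basis distributions. The linear map and the basis probabilities below are random placeholders for the trained components.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d = 3, 4                      # number of basis models, node-embedding dimension

A = rng.normal(size=(K, d))      # stand-in for the learned map f(e_a) = A @ e_a

def softmax(z):
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

def personalize(basis_probs, e_a):
    """P(y | x, a) = sum_k P_k(y | x) * pi_a(k), with pi_a = softmax(f(e_a))."""
    pi_a = softmax(A @ e_a)
    return pi_a @ basis_probs    # (K,) weights times (K, n_labels) distributions

# Toy inputs: each basis model's P(y | x) over two labels, and one author's node embedding.
basis_probs = np.array([[0.9, 0.1],
                        [0.5, 0.5],
                        [0.2, 0.8]])
e_a = rng.normal(size=d)
print(personalize(basis_probs, e_a))   # personalized P(y | x, a); sums to 1
```

Because the weights depend only on the author's position in the social network, the same recipe applies to authors with no labeled data, which is the point of network-driven personalization.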
  35. 35. Results [Bar chart: F1 improvement over the ConvNet baseline on Twitter sentiment analysis; Mixture of Experts +0.10, NLSE +1.90, Social Personalization +2.80.] Improvements over the ConvNet baseline: +2.8% on Twitter Sentiment Analysis; +2.7% on Ciao Product Reviews. NLSE is the prior state-of-the-art (Astudillo et al., 2015).
  36. 36. Variable sentiment words
         More positive                               More negative
      1  banging, loss, fever, broken, fucking       dear, like, god, yeah, wow
      2  chilling, cold, ill, sick, suck             satisfy, trust, wealth, strong, lmao
      3  ass, damn, piss, bitch, shit                talent, honestly, voting, win, clever
      4  insane, bawling, fever, weird, cry          lmao, super, lol, haha, hahaha
      5  ruin, silly, bad, boring, dreadful          lovatics, wish, beliebers, arianators, kendall
  37. 37. Summary Robustness is a key challenge for making NLP effective on social media data: Tacit assumptions about shared knowledge; language variation Social metadata gives NLP systems the flexibility to handle each author differently.
  38. 38. Summary Robustness is a key challenge for making NLP effective on social media data: Tacit assumptions about shared knowledge; language variation Social metadata gives NLP systems the flexibility to handle each author differently. The long tail of rare events is the other big challenge. Word embeddings for unseen words (Pinter et al., 2017) Lexicon-based supervision (Eisenstein, 2017) Applications to finding rare events in electronic health records (ongoing work with Jimeng Sun)
  39. 39. Acknowledgments Students and collaborators: Yi Yang (GT → Bloomberg) Ming-Wei Chang (Google Research) See https://gtnlp.wordpress.com/ for more! Funding: National Science Foundation, National Institutes of Health, Georgia Tech
  40. 40. References I
      Astudillo, R. F., Amir, S., Lin, W., Silva, M., & Trancoso, I. (2015). Learning word representations from scarce and noisy data with embedding sub-spaces. In Proceedings of the Association for Computational Linguistics (ACL), Beijing.
      Eisenstein, J. (2013). What to do about bad language on the internet. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), (pp. 359–369).
      Eisenstein, J. (2017). Unsupervised learning for lexicon-based classification. In Proceedings of the National Conference on Artificial Intelligence (AAAI), San Francisco.
      Foster, J., Cetinoglu, O., Wagner, J., Le Roux, J., Nivre, J., Hogan, D., & van Genabith, J. (2011). From news to comment: Resources and benchmarks for parsing the language of web 2.0. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), (pp. 893–901), Chiang Mai, Thailand. Asian Federation of Natural Language Processing.
      Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., & Smith, N. A. (2011). Part-of-speech tagging for Twitter: Annotation, features, and experiments. In Proceedings of the Association for Computational Linguistics (ACL), (pp. 42–47), Portland, OR.
      Pinter, Y., Guthrie, R., & Eisenstein, J. (2017). Mimicking word embeddings using subword RNNs. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP).
      Ritter, A., Clark, S., Mausam, & Etzioni, O. (2011). Named entity recognition in tweets: An experimental study. In Proceedings of EMNLP.
      Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., & Mei, Q. (2015). LINE: Large-scale information network embedding. In Proceedings of the Conference on World-Wide Web (WWW), (pp. 1067–1077).
      Yang, Y. & Chang, M.-W. (2015). S-MART: Novel tree-based structured learning algorithms applied to tweet entity linking. In Proceedings of the Association for Computational Linguistics (ACL), (pp. 504–513), Beijing.
      Yang, Y. & Eisenstein, J. (2017). Overcoming language variation in sentiment analysis with social attention. Transactions of the Association for Computational Linguistics (TACL), in press.
