Methodology

Results

Conclusion

Unsupervised Word Usage Similarity in Social
Media Texts
Spandana Gella and Paul Cook and Bo Han
Department of Computing and Information Systems
The University of Melbourne

Social media — Twitter

Huge volume of user-generated text
Applications: trend analysis, event detection, natural disaster
response co-ordination

Short, noisy text: challenging for traditional NLP
Little work to date on lexical semantics on Twitter
This paper: lexical semantic interpretation in tweets based on
word usage similarity

Word sense disambiguation (WSD)

Given a word in context, select the best-fitting “sense” from a
sense inventory
ne1 headin to blue boyz footy match this weekend?
#iamcarlton
game in which players or teams compete against each other

I think they are a perfect match! #cute #xoxo
something that resembles or harmonizes; “that tie makes a
good match with your jacket”
a pair of people who live together; “a married couple from
Chicago”

Issues with WSD

Choice of sense inventory
match: WordNet 9 senses, Macmillan 4 senses

No sense-tagged resources for social media data
Cannot capture novel usage patterns
Assumes a single sense per usage
Challenges over social media: short, noisy, non-standard
syntax
Solution? An alternative representation of meaning in context

Usage similarity (Usim)
The manual task of rating the similarity of a pair of usages of
a word (SPair) [Erk et al., 2009].
Similarity on a graded scale (1 – 5)
No more senses; independent of sense inventory
Novel usages: Rate similarity to established usages
SPair example (annotators’ judgement: 3.2)
Setting goals for myself this year, figured if it’s on paper I’ll
be more inspired to work harder.
This is very unsmart of me to get tipsy and then have to go
home and write a paper.
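
A graded judgement such as 3.2 suggests that ratings from several annotators were combined. The slides do not say how, so the sketch below assumes a simple mean; the function name and the five ratings are hypothetical.

```python
from statistics import mean

def gold_usim(ratings):
    """Average per-annotator ratings (1-5 scale) into one score for an SPair.

    Assumption: the gold score is a plain mean; the slides do not state this.
    """
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return round(mean(ratings), 1)

# Hypothetical ratings from five annotators for one SPair.
example_ratings = [3, 4, 3, 2, 4]
print(gold_usim(example_ratings))
```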

Gold standard — Usim-tweet annotation
10 nouns: bar, charge, execution, field, figure, function,
investigator, match, paper, post
55 pairs of tweets (SPairs) were annotated per lemma
(sampled from Twitter Streaming API)
Amazon Mechanical Turk annotation
Sample Annotation:

Background corpora
Original: Tweets from Streaming API containing target
word as noun
gotta 3 page paper due tomorrow haven start
#procrastination

Expanded: Original + document expansion based on
medium-frequency hashtags
research paper due tomorrow 2 page 3 #procrastination
insert figure equations lab report less time word #physicsmajor
#procrastination ...

RandExpanded: Original + extra tweets containing
the target word
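
A minimal sketch of hashtag-based document expansion, assuming each tweet is extended with other tweets that share one of its medium-frequency hashtags. The function name, the pool of candidate tweets and the `min_count`/`max_count` thresholds are hypothetical; the slides do not give the actual cut-offs.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")

def expand_documents(target_tweets, pool, min_count=2, max_count=50):
    """Append pool tweets that share a medium-frequency hashtag with each tweet.

    min_count and max_count are hypothetical bounds for "medium frequency".
    """
    # Index pool tweets by hashtag and count how many tweets use each tag.
    by_tag = defaultdict(list)
    for tweet in pool:
        for tag in set(HASHTAG.findall(tweet.lower())):
            by_tag[tag].append(tweet)
    freq = Counter({tag: len(tweets) for tag, tweets in by_tag.items()})

    expanded = []
    for tweet in target_tweets:
        parts = [tweet]
        for tag in sorted(set(HASHTAG.findall(tweet.lower()))):
            if min_count <= freq[tag] <= max_count:
                # Skip the tweet itself and anything already appended.
                parts.extend(t for t in by_tag[tag] if t not in parts)
        expanded.append(" ".join(parts))
    return expanded

pool = [
    "gotta 3 page paper due tomorrow haven start #procrastination",
    "insert figure equations lab report less time word #physicsmajor #procrastination",
    "good morning #coffee",
]
print(expand_documents([pool[0]], pool))
```

Rare hashtags add little context and very frequent ones add noise, which is the usual motivation for restricting expansion to a middle frequency band.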

Model Usim for Twitter

No large annotated training resources: unsupervised
No parser: bag-of-words
Methods?
Vector space model (VSM)
Topic models (LDA) [Lui et al., 2012]
Weighted Textual Matrix Factorization (WTMF)

1 model per target word
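
As an illustration of the vector space model option, the sketch below builds second-order co-occurrence vectors: each usage is represented by the sum of the first-order co-occurrence vectors of its context words, and two usages are compared by cosine similarity. The window size, whitespace tokenisation, cosine measure and toy corpus are assumptions, not details from the slides.

```python
import math
from collections import Counter, defaultdict

def first_order_vectors(corpus, window=2):
    """Map each word to counts of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def second_order_vector(usage, target, vectors):
    """Sum the first-order vectors of the context words of one usage."""
    total = Counter()
    for word in usage.lower().split():
        if word != target:
            total.update(vectors.get(word, {}))
    return total

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy background corpus for the target word "paper" (hypothetical data).
corpus = [
    "write a research paper tonight",
    "research paper due tomorrow",
    "print it on paper please",
]
vectors = first_order_vectors(corpus)
a = second_order_vector("research paper due", "paper", vectors)
b = second_order_vector("write a paper", "paper", vectors)
print(cosine(a, b))
```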

Methods

VSM (Baseline): Second order co-occurrence
LDA (Our approach): Represent documents as topic
distribution vectors
WTMF (Benchmark): Consider information from “missing”
words related to the latent vector profile
State-of-the-art on a similar task
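
For reference, the way WTMF uses "missing" words is usually written as a weighted factorization objective. The notation below is an assumption rather than something shown on the slides: X is the word-by-document matrix, P and Q hold the d-dimensional latent vectors of words and documents, w_m is a small weight given to missing words, and λ is a regularization coefficient.

```latex
\min_{P,Q} \sum_{i}\sum_{j} W_{ij}\,\bigl(P_{\cdot,i}^{\top} Q_{\cdot,j} - X_{ij}\bigr)^{2}
  + \lambda \lVert P \rVert_{2}^{2} + \lambda \lVert Q \rVert_{2}^{2},
\qquad
W_{ij} =
\begin{cases}
1   & \text{if } X_{ij} \neq 0,\\
w_m & \text{if } X_{ij} = 0.
\end{cases}
```

Because w_m is small but non-zero, words absent from a short text still weakly constrain its latent vector, which helps when each document has only a handful of observed words.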

Topic modeling — LDA
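
A minimal sketch of the topic-model idea: fit LDA on tokenised tweets for one target word, represent each tweet by its topic distribution, and compare two usages by the similarity of those distributions. The collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the cosine measure and the toy documents are all assumptions for illustration, not the authors' implementation.

```python
import math
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens in doc d with topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word w has topic k
    topic_total = [0] * num_topics                      # tokens with topic k
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k, wid = assignments[d][n], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed per-document topic distribution.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy tokenised tweets containing the target word "paper" (hypothetical data).
docs = [
    ["research", "paper", "due", "tomorrow"],
    ["write", "paper", "tonight", "due"],
    ["goals", "on", "paper", "inspired"],
]
theta = lda_gibbs(docs, num_topics=2)
print(cosine(theta[0], theta[1]))
```

In practice the model would be trained on a large background corpus and then used to infer topic distributions for the two tweets of an SPair; the toy example above skips that separation for brevity.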

Method overview

Results

Model      Original    Expanded     RandExpanded
Baseline   0.09        0.08         0.09
WTMF (d)   0.03 (8)    0.10 (20)    0.09 (5)
LDA (T)    0.20 (8)    0.29 (5)     0.18 (20)

Spearman’s rank correlation (ρ) for each method based on each
background corpus
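
Spearman's ρ compares the ranking of SPairs by model similarity with their ranking by gold-standard rating. A minimal sketch, computing it as the Pearson correlation of average ranks (which handles ties); the function names and the example scores are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical gold ratings and model similarities for five SPairs.
gold = [3.2, 1.0, 4.6, 2.4, 5.0]
predicted = [0.61, 0.05, 0.70, 0.58, 0.66]
print(round(spearman_rho(gold, predicted), 2))
```

Because only ranks matter, the model's similarity scores do not need to be on the same 1 to 5 scale as the annotations.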

Results on Expanded
Lemma          Best-T ρ   Best-T T   5-topics ρ
bar            0.35       50         0.1
charge         0.33       20         −0.08
execution      0.58       5          0.58
field          0.53       10         0.32
figure         0.24       250        0.14
function       0.40       10         0.27
investigator   0.50       5          0.50
match          0.45       5          0.45
paper          0.32       30         0.22
post           0.2        30         −0.01
Overall        0.29       5          0.29

ρ values that are significant (p < 0.05) are shown in bold

Results varying d and T

[Figure: Spearman correlation ρ for the Original, Expanded and RandExpanded corpora]
(a) WTMF: ρ versus dimensions (d)
(b) LDA: ρ versus topics (T)

Summary

Computationally modeled Usim over social media
LDA approach outperformed both the baseline and the benchmark
Hashtag-based document expansion improved the performance of
LDA and the benchmark
Gold-standard dataset Usim-tweet will be made available
Future work
Alternative document expansion (e.g., author-based)
Context representation: POS, positional word features, etc.
Non-parametric topic modelling (e.g., HDP)

Thanks

Bibliography

Erk, K., McCarthy, D., and Gaylord, N. (2009).
Investigations on word senses and word usages.
In Proceedings of the Joint Conference of the 47th Annual Meeting of the
Association for Computational Linguistics and the 4th International Joint
Conference on Natural Language Processing of the Asian Federation of
Natural Language Processing (ACL-IJCNLP 2009), pages 10–18,
Singapore.

Lui, M., Baldwin, T., and McCarthy, D. (2012).
Unsupervised estimation of word usage similarity.
In Proceedings of the Australasian Language Technology Workshop 2012
(ALTW 2012), pages 33–41, Dunedin, New Zealand.
