Mapping Tweets to Conference Talks: A
Goldmine for Semantics
Milan Stankovic, Hypios, Paris-Sorbonne, FR & Matthew Rowe, K...

  1. Mapping Tweets to Conference Talks: A Goldmine for Semantics
      Milan Stankovic, Hypios, Paris-Sorbonne, FR & Matthew Rowe, KMI, Open University, UK
  2. On Conference We Tweet
  3. Is there a Correspondence?
  4. Why? [Diagram: tweet -(is about)-> talk]
  5. Why? [Diagram: user -(made)-> tweet -(is about)-> talk; talk -(has topic)-> Topic 1, Topic 2, Topic 3]
  6. Why? [Diagram: user -(made)-> tweet -(is about)-> talk; talk -(has topic)-> Topic 1, Topic 2, Topic 3; open question: interest?]
  7. Why? [Diagram: two users each made a tweet, and both tweets are about the same talk; open question: were at the same talk?]
  8. Potential Benefits
      • Digital memory
      • Conference feedback
        – number of tweets for a talk
        – conversational aspects
        – sentiment analysis
      • User profiling and expert finding
      • Trending topics
  9. Rich Activity Twitter Event Data
      • We take Twitter archives from TwapperKeeper
      • We enrich Tweets with relevant DBPedia concepts using Zemanta
      • We rely on existing Linked Data about talks to perform the mappings.
  10. ESWC Dataset
      • Collected during the Extended Semantic Web Conference 2010
        – Any tweets tagged with “eswc”
      • 1082 tweets
      • 213 tweets enriched with concepts
  11. Aligning Tweets with Talks
      • Goal: Label tweets with talks
      • Method:
        – Induce a labelling function $f: X \to Y$ to perform alignment
        – Labelled data = events from Web of Data: $\{(x_i, y_i)\}_{i=1}^{L}$
        – Unlabelled data = tweets: $\{x_i\}_{i=1}^{U}$
  12. Aligning Tweets with Talks 1. Feature Extraction:
          @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
          @prefix foaf: <http://xmlns.com/foaf/0.1/> .
          @prefix swrc: <http://swrc.ontoware.org/ontology#> .
          @prefix swc: <http://data.semanticweb.org/ns/swc/ontology#> .
          @prefix dog: <http://data.semanticweb.org> .
          @prefix dc: <http://purl.org/dc/elements/1.1/> .

          <http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23>
              rdf:type swrc:InProceedings ;
              dc:subject "Knowledge Acquisition" ;
              dc:subject "Semantic Analysis" ;
              dc:subject "Social Web" ;
              dc:subject "Microblogs" ;
              dc:title "Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams" ;
              swrc:abstract "Although one might argue that little wisdom can be conveyed in messages of 140 ..." ;
              swrc:author <http://data.semanticweb.org/person/claudia-wagner> .

          <http://data.semanticweb.org/person/claudia-wagner>
              rdf:type foaf:Person ;
              foaf:name "Claudia Wagner" ;
              swrc:affiliation <http://data.semanticweb.org/organization/joanneum-research> ;
              foaf:based_near <http://dbpedia.org/resource/Austria> .
  13. Aligning Tweets with Talks 1. Feature Extraction: F1 - Immediate Resource Leaves
      (same RDF description of the paper as on slide 12; the features extracted from it are:)
      • Knowledge Acquisition
      • Semantic Analysis
      • Social Web
      • Microblogs
      • Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams
      • Although one might argue that little wisdom can be conveyed in messages of 140
      • http://data.semanticweb.org/person/claudia-wagner
  14. Aligning Tweets with Talks 1. Feature Extraction: F2 – 1-step Resource Leaves
      (same RDF description of the paper as on slide 12; the features extracted from it are:)
      • http://data.semanticweb.org/person/claudia-wagner
      • Claudia Wagner
      • http://data.semanticweb.org/organization/joanneum-research
      • http://dbpedia.org/resource/Austria
  15. Aligning Tweets with Talks 1. Feature Extraction: F3 – DBPedia Concepts
      (same RDF description of the paper as on slide 12; the features extracted from it are:)
      • http://dbpedia.org/resource/Twitter
      • http://dbpedia.org/resource/Social_Web
      • http://dbpedia.org/resource/Microblogs
  16. Aligning Tweets with Talks 2. Feature Vector Composition
      • Extracted features: Knowledge Acquisition; Semantic Analysis; Social Web; Microblogs; Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams; Although one might argue that little wisdom can be conveyed in messages of 140; http://data.semanticweb.org/person/claudia-wagner
      • Tokens: knowledge acquisition semantic analysis social web microblogs exploring wisdom tweets knowledge acquisition social awareness streams wisdom messages
      • Indexer output (term count): knowledge 2, acquisition 2, semantic 1, analysis 1, social 2, web 1, microblogs 1, exploring 1, wisdom 2, tweets 1, awareness 1, streams 1, messages 1
  17. Aligning Tweets with Talks 3. Inducing the Labelling Function
      – Both tweets and events are provided as feature vectors
      – Induce a labelling function $f: X \to Y$: choose the most likely event ($y$) given the tweet ($x$)
  18. Aligning Tweets with Talks 3. Inducing the Labelling Function: Proximity-based Clustering
      – Build a centroid vector $\mu_y$ for each event
        • From event feature vectors
      – Compare each tweet vector with each centroid
        • Choose the event ($y$) which is closest: $y = \arg\min_{y \in Y} d(x, \mu_y)$
      – Distance measures:
        • $manhat(x, \mu) = \sum_{i=1}^{n} |x_i - \mu_i|$
        • $eucl(x, \mu) = \sqrt{\sum_{i=1}^{n} (x_i - \mu_i)^2}$
  19. Aligning Tweets with Talks 3. Inducing the Labelling Function: Naive Bayes Classification
      – Assigns the most probable event label given the tweet features: $y = \arg\max_{y \in Y} P(y \mid x_1, x_2, \ldots, x_n)$
      – Using Bayes' Theorem, we write this as:
        $y = \arg\max_{y \in Y} \frac{P(x_1, x_2, \ldots, x_n \mid y)\, P(y)}{P(x_1, x_2, \ldots, x_n)}$
        $y = \arg\max_{y \in Y} P(x_1, x_2, \ldots, x_n \mid y)\, P(y)$
        $y = \arg\max_{y \in Y} P(y) \prod_{i} P(x_i \mid y)$
  20. Experiments
      • Dataset
        – Corpus of Tweets collected during ESWC 2010
      • Gold Standard Construction
        – Used 3 raters to label a portion of the tweet corpus
          • 200 tweets labelled
        – Measured inter-rater agreement
          • Using the Kappa statistic
        – Initial agreement was too low: 0.328
        – Utilised the Delphi method to improve agreement
        – Second round of labelling produced: 0.820
  21. Experiments
      • Evaluation Measures
        – Precision: proportion of event tweets correctly labelled
        – Recall: proportion of tweets successfully returned for an event
        – F-measure: harmonic mean of precision and recall
          • Placed emphasis on precision over recall
          • $f\text{-}measure = \frac{(1 + \beta^2) \times P \times R}{\beta^2 \times P + R}$, with $\beta \in \{0.2, 0.5, 1\}$
  22. Results
  23. Imagine…
  24. Imagine user profiling
      ESWC dataset, user Matthew Rowe
  25. Imagine conference feedback
      ESWC dataset: directly from Tweets; from mappings (Talks)
  26. We Challenge You
  27. We Challenge You!
      • Beat us in mappings!
      • We provide the human-generated gold standard mappings
      • Can you find a more precise way to do tweet-talk mappings?
      • Can you find other uses? Let us know!
  28. We Challenge You!
      • You can find the gold standard data here: http://research.hypios.com/?page_id=131
      • You can find all the data (and automated mappings) here: http://data.hypios.com/tweets/sparql
  29. We Challenge You!
      http://data.hypios.com/tweets/sparql
          SELECT ?tweet ?talk WHERE {
            ?tweet <http://linkedevents.org/ontology/illustrate> ?talk .
          }
  30. brought to you by milan.stankovic@hypios.com & M.C.Rowe@open.ac.uk
      November 2010, Shanghai, China
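
The indexing step on slide 16 (extracted features turned into term counts) can be sketched roughly as follows. This is a minimal illustration, not the authors' indexer: the function name `index_features` and the tiny `STOP_WORDS` set are assumptions made for the example, and the slides do not say which tokeniser or stop-word list was actually used.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "of", "from"}

def index_features(features):
    """Lowercase each feature string, split it into word tokens,
    drop stop words, and count how often each remaining term occurs."""
    counts = Counter()
    for text in features:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts

# Subject and title literals taken from the RDF description on slide 12.
features = [
    "Knowledge Acquisition",
    "Semantic Analysis",
    "Social Web",
    "Microblogs",
    "Exploring the Wisdom of the Tweets: Knowledge Acquisition "
    "from Social Awareness Streams",
]
vector = index_features(features)
```

The resulting `Counter` plays the role of a sparse feature vector: terms that never occur simply have count zero.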
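
The proximity-based labelling on slide 18 (one centroid per event, each tweet assigned to the closest centroid) might look like the sketch below. It assumes dense vectors of equal length over a shared vocabulary; the function names and the toy vectors are hypothetical and only illustrate the two distance measures named on the slide.

```python
import math

def manhattan(x, mu):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, mu))

def euclidean(x, mu):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def label_tweet(tweet_vector, centroids, distance=euclidean):
    """Return the event label whose centroid is closest to the tweet."""
    return min(centroids, key=lambda y: distance(tweet_vector, centroids[y]))

# Toy data: two events described by feature vectors over a 3-term vocabulary.
event_vectors = {
    "talk_a": [[2, 0, 1], [4, 0, 1]],
    "talk_b": [[0, 3, 0]],
}
centroids = {y: centroid(vs) for y, vs in event_vectors.items()}
tweet = [1, 0, 1]
label = label_tweet(tweet, centroids, distance=manhattan)
```

Passing the distance as a parameter makes it easy to compare the Manhattan and Euclidean variants on the same data.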
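
The Naive Bayes rule on slide 19 can be illustrated with a small multinomial classifier. The slides do not specify the event model or the smoothing, so the add-one (Laplace) smoothing, the log-space scoring, and all names and toy tokens below are assumptions made for the sake of a runnable sketch.

```python
import math
from collections import Counter, defaultdict

def train(labelled):
    """labelled: list of (tokens, event_label) pairs built from event features."""
    prior_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, y in labelled:
        prior_counts[y] += 1
        token_counts[y].update(tokens)
        vocabulary.update(tokens)
    return prior_counts, token_counts, vocabulary

def classify(tokens, model):
    """Return argmax_y P(y) * prod_i P(x_i | y), computed in log space."""
    prior_counts, token_counts, vocabulary = model
    total = sum(prior_counts.values())
    best_label, best_score = None, -math.inf
    for y in prior_counts:
        score = math.log(prior_counts[y] / total)
        # Add-one smoothing (an assumption): unseen terms get a small probability.
        denominator = sum(token_counts[y].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # ignore terms never seen in any event
                score += math.log((token_counts[y][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Toy labelled data standing in for event descriptions from the Web of Data.
model = train([
    (["semantic", "web", "microblogs", "twitter"], "talk_a"),
    (["ontology", "reasoning", "owl"], "talk_b"),
])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.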
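
The F-measure on slide 21 is straightforward to compute. The sketch below follows the slide's formula; the function name and the example precision and recall values are made up for illustration and are not results from the experiments.

```python
def f_measure(precision, recall, beta):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).
    Values of beta below 1 weight precision more heavily than recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # convention for the degenerate case P = R = 0
    return (1 + beta ** 2) * precision * recall / denominator

# The three beta settings listed on the slide, applied to made-up P and R.
scores = {beta: f_measure(0.8, 0.4, beta) for beta in (0.2, 0.5, 1)}
```

With precision higher than recall, the score rises as beta shrinks, which is the sense in which the slide places emphasis on precision.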
