Identifying Content for Planned Events Across Social Media Sites
Hila Becker, Dan Iter, Mor Naaman, Luis Gravano
Hila wsdm12-final

WSDM 2012 talk: "Identifying Content for Planned Events Across Social Media Sites"

  • Users often share information about events in a variety of forms on different social media sites
  • Social media presents many challenges and opportunities for identifying event information
  • Explain that we work in real time (for the most part) and say that we divide the space into unknown and known identification scenarios, then mention the type of event we focus on for each. Also briefly mention that, as we discuss in the thesis, these are not disjoint
  • On average, queries generated by this strategy are expected to retrieve some results for their associated event.
  • This is averaged over all events that had some results. How many events had some results? Precision: 22% of events; Twitter-RTR: 76% of events
  • Transcript of "Hila wsdm12-final"

    1. Identifying Content for Planned Events Across Social Media Sites
       Hila Becker, Dan Iter, Mor Naaman, Luis Gravano
    2. Event Content in Social Media
    3. [Image credit: MIKE CLARKE/AFP/Getty Images]
    4. Source: Tweets from Tahrir, edited by Nadia Idle and Alex Nunns
    5. (no text)
    6. 6. Event Content in Social Media Challenges:  Wide variety of topics, not all related to events (e.g., personal status updates, every-day mundane conversations)  Unconventional text: abbreviations, typos  Large-scale, rapidly produced content Opportunities:  Content generated in real-time, as events happen  Rich context features (e.g., time, location)  Users’ perspective 6
    7. Event Content in Social Media
       A planned event is a real-world occurrence with a corresponding published event record consisting of:
         • Title, describing the subject of the event
         • The time at which the event is planned to occur
       [Diagram labels: Content Discovery; Known; Unknown]
    8. Identifying Content for Planned Events
       Identify planned event documents given known event information
         • User-contributed planned event records: LastFM Events, EventBrite, Facebook Events
         • Structured features (e.g., title, time, location)
       Challenging identification scenario
         • Known event information is often inaccurate or incomplete
         • Social media documents are brief and noisy
    9. Planned Event Record
       Fields: Title, Description, Date/Time, Venue, City
    10. Approach for Planned Event Content Identification
       Two-step query formulation strategy
         • Precision-oriented queries using known event features
         • Recall-oriented queries using retrieved content from precision-oriented queries
       Leverage cross-site content
         • Identify event documents on each site individually
         • Use event documents on one site to retrieve additional event documents on a different site
    11. Query Formulation Strategies: Precision-oriented Queries
       Combined event record features
         • Phrase, bag-of-words, stop word elimination
         • Examples: [“title” + “venue”], [title-no-stopwords + “city”]
       Restricted document creation time
       Why is this hard?
         • Specific titles: “Celebrate Brooklyn! Opening Night Gala & Concert with Andrew Bird”
         • General titles: “Opening Night Concert”
    12. Query Formulation Strategies: Precision-oriented Queries
       Demo
    13. Query Formulation Strategies: Recall-oriented Queries
       Generated using “high-precision” results from precision-oriented queries
       Frequency analysis
         • Frequent terms in the event's retrieved content
         • Infrequent terms in Web documents
         • Limited to 100 candidate queries
       Term extraction
         • Identify meaningful event-related concepts
    14. Query Selection Strategies
       Problem: potentially large set of generated queries
       Select top candidate queries
         • Specificity: favor longer queries
         • Temporal profile
       [Chart: temporal profiles of the queries [andrew bird concert] and [state farm insurance], 6/7/11 to 6/13/11; y-axis 0 to 120]
    15. Leveraging Cross-Site Content
       Build precision-oriented queries using planned event features
       …
       Use precision-oriented queries to retrieve data from:
         • Twitter
         • Flickr
         • YouTube
       Build recall-oriented queries using data from:
         • Each site individually
         • All sites collectively
    16. Experimental Settings
       60 planned events from EventBrite, LastFM, LinkedIn, and Facebook
       Corresponding social media documents
         • Retrieved from Twitter, Flickr, and YouTube
         • Ranked according to similarity to event record
       Techniques
         • Precision: only precision-oriented queries
         • MS: precision- and recall-oriented queries selected using Microsoft n-gram probability score
         • TR/RTR: precision- and recall-oriented queries selected using the ratio of document frequency around the time of the event to document frequency in a larger time window
    17. Evaluation
         • How do our queries compare with human-generated queries for the event?
         • How good are our queries?
         • How good are the results retrieved by our queries?
    18. How good are our queries?
       Would the query match documents related to the event? 1 = not likely, 5 = certainly
       [Chart: query quality scores (1 to 5) by data source (Twitter, Flickr, YouTube, All, Precision); series: MS, TR, RTR, MS-TR, MS-RTR, Precision]
    19. Can our queries retrieve relevant results?
       Rank retrieved results
         • Based on similarity to event record
         • Using multi-feature similarity metric (Becker et al. WSDM'10)
       Evaluate relevance of documents
         • NDCG
         • Averaged over all events that had some retrieved results
       Consider event coverage
    20. NDCG Performance on Twitter
       [Chart: NDCG (y-axis, 0.6 to 1) against number of documents k (x-axis, 5 to 20); series: Twitter-MS, Twitter-RTR, Precision]
       NDCG scores for top-k Twitter documents retrieved by precision-oriented queries (Precision), and query strategies using Twitter data (Twitter-RTR, Twitter-MS).
    21. Cross-Site NDCG Performance
       [Chart: NDCG (y-axis) against number of documents k (x-axis, 0 to 25); series: Precision, Twitter-MS, YouTube-MS]
       NDCG scores for top-k YouTube documents retrieved by precision-oriented queries (Precision), and query strategies using data from Twitter (Twitter-MS) and YouTube (YouTube-MS).
    22. Conclusions
       Developed a two-step query-oriented solution for planned event content identification
         • User-contributed event records
         • Multiple social media sites
       Identified diverse event content: photos, videos, and tweets
       Showed how event content from one site can be used to enhance event content identification on other sites
    23. Future Work
       Leverage explicit links
         • From event records to documents
         • Between documents from different social media sites
       Sub-event content analysis
       Event timeline construction
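
The two example query shapes on slide 11 (a quoted title plus quoted venue, and a stop-word-free title plus quoted city) could be built roughly as follows. This is a minimal sketch, not the authors' implementation: the `EventRecord` class, the function names, the stop-word list, and the "Example Venue" value are all assumptions made for illustration, and the document-creation-time restriction is left out.

```python
from dataclasses import dataclass

# Small illustrative stop-word list (an assumption; the slides do not specify one).
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


@dataclass
class EventRecord:
    """Hypothetical container for the record fields used here (see slide 9)."""
    title: str
    venue: str
    city: str


def strip_stop_words(text: str) -> str:
    """Drop stop words and punctuation-only tokens, keeping word order."""
    kept = []
    for word in text.split():
        core = word.lower().strip("!&,.")
        if core and core not in STOP_WORDS:
            kept.append(word)
    return " ".join(kept)


def precision_queries(event: EventRecord) -> list[str]:
    """Build the two example query shapes from slide 11."""
    return [
        # ["title" + "venue"]: both fields as exact phrases
        f'"{event.title}" "{event.venue}"',
        # [title-no-stopwords + "city"]: bag of words plus a phrase
        f'{strip_stop_words(event.title)} "{event.city}"',
    ]


event = EventRecord(
    title="Celebrate Brooklyn! Opening Night Gala & Concert with Andrew Bird",
    venue="Example Venue",  # hypothetical value, not from the slides
    city="Brooklyn",
)
queries = precision_queries(event)
```

A very general title such as “Opening Night Concert” would yield queries that match many unrelated documents, which is the difficulty the slide points out.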

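
The temporal-profile selection on slides 14 and 16 is described as the ratio of document frequency around the time of the event to document frequency in a larger time window. One way to sketch that idea is shown below. The function names, the one-day "near" window, the event day, and the daily counts are all assumptions for illustration; the slides do not give the actual window sizes or data, and this sketch does not distinguish TR from RTR.

```python
from datetime import date


def temporal_ratio(daily_counts: dict[date, int], event_day: date, near_days: int = 1) -> float:
    """Share of a query's matching documents created within near_days of the event.

    daily_counts covers the larger time window; its total is the denominator.
    """
    total = sum(daily_counts.values())
    if total == 0:
        return 0.0
    near = sum(
        count
        for day, count in daily_counts.items()
        if abs((day - event_day).days) <= near_days
    )
    return near / total


def rank_queries(profiles: dict[str, dict[date, int]], event_day: date) -> list[str]:
    """Order candidate queries from most to least concentrated around the event."""
    return sorted(
        profiles,
        key=lambda query: temporal_ratio(profiles[query], event_day),
        reverse=True,
    )


# Made-up daily counts for 6/7/11 through 6/13/11 (illustrative only).
days = [date(2011, 6, d) for d in range(7, 14)]
profiles = {
    "andrew bird concert": dict(zip(days, [1, 1, 2, 10, 3, 1, 2])),
    "state farm insurance": dict(zip(days, [5, 5, 5, 5, 5, 5, 5])),
}
ranked = rank_queries(profiles, event_day=date(2011, 6, 10))  # event day is hypothetical
```

With these made-up counts, the query whose volume peaks near the event ranks above the query with a flat profile, which matches the contrast the chart on slide 14 is drawing.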
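
Slide 19 evaluates ranked results with NDCG at the top-k documents. For readers unfamiliar with the metric, here is a minimal sketch using one common formulation (gain divided by log2 of rank + 1). The slides do not say which NDCG variant or relevance scale was used, so both are assumptions here, as is computing the ideal ordering from the retrieved list alone.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (ranks start at 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG normalized by the DCG of the same judgments sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents retrieved
    return dcg_at_k(relevances, k) / ideal
```

Per slide 19, the per-event scores would then be averaged over the events that had some retrieved results, with event coverage considered separately.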