Peaks and Persistence

My CSCW2011 talk on Peaks and Persistence...understanding temporality in microblog event analysis. Full paper here: http://research.yahoo.com/pub/3419

Published in: Technology

  • And now for the quant method part of the show. Spend more time setting the context for the data. Story and cultural background are important here.
  • Emphasize what this is as a broadcast media event. And it's a big deal.
  • There is a need to look beyond memes and trends. These are actually good for pulling out over-arching themes, but poor at describing the event.
  • So for every 5 minutes, we ignore who tweeted what...just pool it all together.
  • Not everything “trends”; not everything is a “spike”.
  • Explain this too. What is this...
  • It's important over time, even if it's not a trend or if it trends momentarily.
  • We’ve taken this through mechanics and techniques; this augments SNA. We are pushing the bounds of SNA with regard to media and events. The importance of an event needs to be measured over time, not just in the right here, right now. We know the revolution will be something to remember. It's what happened in there that we need to pull out.
  • Thanks.
    Reviewer 3 raises some concerns about this proposed variation of TFIDF, with particular attention to why the TF metric is defined at the document level. We count the number of tweets that a term appears in (rather than the raw total frequency of the term) since we are focused on gauging the number of *people* using a term at a given time. A raw count of occurrences can be biased by a single user repeating a term many times in one tweet, which we did observe. R3 also noticed that our TFIDF variant is sensitive to very infrequent terms (which is also the case with traditional TFIDF). We also observed this problem and handled it by treating terms with frequencies below a certain threshold as stop words.
  • Peaks and Persistence

    1. 1. Peaks & Persistence. David A. Shamma • Yahoo! Research; Lyndon Kennedy • Yahoo! Labs; Elizabeth F. Churchill • Yahoo! Research
    2. 2. The USA Presidential Inauguration of 2009. A set of tweets.
    3. 3. Tweet Stream circa 2009. Data Mining Feed. Constant data rate: 600 tweets per minute for 90 minutes. 54,000 tweets from 1.5 hours. 2M mobile streams.
    4. 4. Volume by minute Inauguration 2009
    5. 5. Conversational Volume Inauguration 2009 by Minute
    6. 6. How do we begin to extract what happened and what people talked about?
    7. 7. “Trending Topics” & memes are insufficient. Too general. Highlights basically what you would want to stoplist.
    8. 8. Related work:
        M. Bernstein, B. Suh, L. Hong, J. Chen, S. Kairam, and E. Chi. Eddi: Interactive Topic-based Browsing of Social Status Streams. ICWSM, 2010.
        S. Vieweg, A. L. Hughes, K. Starbird, and L. Palen. Microblogging during two natural hazards events: what twitter may contribute to situational awareness. In CHI ’10: Proceedings of the 28th international conference on Human factors in computing systems, pages 1079–1088, New York, NY, USA, 2010. ACM.
        A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In WebKDD/SNA-KDD ’07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56–65, New York, NY, USA, 2007. ACM.
        B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps about twitter. In WOSP ’08: Proceedings of the first workshop on Online social networks, pages 19–24, New York, NY, USA, 2008. ACM.
    9. 9. Traditional tf•idf won’t work: a tweet is a very small document...too small. The vector is too flat. Example: “OMG - Obama just messed up the oath - AWESOME! he’s human!” Create pseudo-documents from 5-minute slices.
    10. 10. Find terms which are only particular to a window. Windows are 5-minute slices. (Figure labels: “No significant occurrence of a term.” “Term occurs significantly.”)
    11. 11. Find terms which are only particular to a window. Windows are 5-minute slices.
    12. 12. Find terms which are only particular to a window. (Figure label: No significant occurrence of “aretha”.)
    13. 13. Cluster significant co-occurring terms per pseudo-document: Aretha, Franklin, Sings, Bow, Hat.
    14. 14. Not everything of interest “spikes”. Not everything of interest is a “trend”. Not everything of interest is “salient”.
    15. 15. Three slices: t0 = time before max; t1 = time at max; t2 = time after max. Do this for every term.
    16. 16. t0 = time before peak; t_peak = time at max; t2 = time after peak. Persistence = t2/t0, that is, the average score from t2 divided by the average score from t0.
    17. 17. Sustained/Persisted Interest. Topics persist over time.
    18. 18. Terms are not salient. However, they persist. (Figure values: 0.35, 0.25.)
    19. 19. People announce
        (12:05) Bastille71: OMG - Obama just messed up the oath - AWESOME! he’s human!
        (12:07) ryantherobot: LOL Obama messed up his inaugural oath twice! regardless, Obama is the president today! whoooo!
        (12:46) mattycus: RT @deelah: it wasn’t Obama that messed the oath, it was Chief Justice Roberts: http://is.gd/gAVo
        (12:53) dawngoldberg: @therichbrooks He flubbed the oath because Chief Justice screwed up the order of the words.
    20. 20. People reply
        (12:05) Bastille71: OMG - Obama just messed up the oath - AWESOME! he’s human!
        (12:07) ryantherobot: LOL Obama messed up his inaugural oath twice! regardless, Obama is the president today! whoooo!
        (12:46) mattycus: RT @deelah: it wasn’t Obama that messed the oath, it was Chief Justice Roberts: http://is.gd/gAVo
        (12:53) dawngoldberg: @therichbrooks He flubbed the oath because Chief Justice screwed up the order of the words.
    21. 21. 2009 MTV VMA 2.4 million tweets
    22. 22. Persisted terms from the MTV VMA
    23. 23. From the MTV VMA. Kanye: blue line. a••hole: red line.
    24. 24. Peaky table of contents and “Kanye”. Again, we find conversation not salient.
    25. 25. Peaky table of contents and “Kanye”. Again, we find conversation not salient.
    26. 26. What about complex events? What about human exchanges makes things iconic?
    27. 27. Thanks! Extra thanks to my co-authors!
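
The windowing described on slides 9 to 12 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the MIN_TWEETS cutoff, the function names, and the exact weighting (tweet count times the log of windows over windows containing the term) are placeholders chosen for the example. What it borrows from the talk is the shape of the method: 5-minute pseudo-documents, a term frequency that counts tweets rather than raw occurrences, and a low-frequency cutoff treated like a stop list.

```python
import math
import re
from collections import Counter, defaultdict
from datetime import datetime

WINDOW_SECONDS = 5 * 60  # 5-minute slices, as on the slides
MIN_TWEETS = 2           # assumed cutoff; rarer terms are treated as stop words


def tokenize(text):
    """Return the set of distinct lowercase terms in one tweet (toy tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def window_tweet_counts(tweets, start):
    """Pool tweets into windows; count how many tweets contain each term."""
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        index = int((timestamp - start).total_seconds() // WINDOW_SECONDS)
        # tokenize() returns a set, so a term repeated within one tweet counts once.
        counts[index].update(tokenize(text))
    return counts


def window_scores(counts):
    """Weight each term per window: tweet count * log(windows / windows with term)."""
    n_windows = len(counts)  # only windows that contain at least one tweet
    df = Counter()           # number of windows where the term clears the cutoff
    for terms in counts.values():
        df.update(t for t, c in terms.items() if c >= MIN_TWEETS)
    return {
        index: {
            t: c * math.log(n_windows / df[t])
            for t, c in terms.items()
            if c >= MIN_TWEETS
        }
        for index, terms in counts.items()
    }


if __name__ == "__main__":
    # Invented toy tweets, for illustration only (not from the talk's data set).
    start = datetime(2009, 1, 20, 12, 0)
    toy = [
        (datetime(2009, 1, 20, 12, 1), "aretha sings"),
        (datetime(2009, 1, 20, 12, 2), "aretha aretha aretha hat"),
        (datetime(2009, 1, 20, 12, 6), "obama oath"),
        (datetime(2009, 1, 20, 12, 7), "obama oath"),
    ]
    for index, terms in sorted(window_scores(window_tweet_counts(toy, start)).items()):
        print(index, terms)
```

Because each tweet contributes a set of terms, one user repeating a word many times in a single tweet adds only one to that word's count. A term that clears the cutoff in every window gets a weight of zero under this formula, which matches the goal of finding terms particular to a window.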

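The persistence ratio on slides 15 and 16 can be sketched the same way. The function below assumes one score per 5-minute window for a single term, takes the first maximum as the peak, leaves the peak window out of both averages, and returns None when the ratio is undefined. Those choices, along with the function name, are assumptions of this sketch; the slides only state that persistence is the average score after the peak divided by the average score before it.

```python
def persistence(scores):
    """Mean score after the peak divided by mean score before the peak.

    `scores` holds one term's score per time window, in time order.
    Returns None when the ratio is undefined (an assumption of this sketch).
    """
    if not scores:
        return None
    peak = max(range(len(scores)), key=lambda i: scores[i])  # first maximum
    before = scores[:peak]      # t0: windows before the peak
    after = scores[peak + 1:]   # t2: windows after the peak
    if not before or not after:
        return None             # peak sits at the start or end of the series
    mean_before = sum(before) / len(before)
    if mean_before == 0:
        return None             # avoid dividing by zero
    return (sum(after) / len(after)) / mean_before


if __name__ == "__main__":
    # Invented scores for one hypothetical term across five windows.
    print(persistence([1, 2, 10, 4, 2]))
```

A ratio above 1 means the term scored higher on average after its peak than before it, which is the "persisted" pattern the talk contrasts with a momentary spike.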