Twitris
Browsing real-time data by space, time and theme
http://twitris.knoesis.org
Slides describing Twitris - a system for browsing spatio-temporal-thematic slices of event-specific Twitter data.

  1. Twitris: Browsing real-time data by space, time and theme (http://twitris.knoesis.org)
  2. Motivation, Goals
  3. Motivation, Goals: Mumbai Terror Attack 2008. Citizen sensor observations (Flickr, Twitter, blogs...). No matter where you looked, tapping into a cultural perception was impossible. We wanted to know what people in India were saying vs. those in Pakistan or the U.S.A.
  4. Spatio-Temporal-Thematic Slices of Real-time Data: around NEWS-WORTHY EVENTS; using space and time as cues for extracting social perceptions (behind signals); summarizing hundreds and thousands of real-time observations.
  5. The Health Care Reform Debate in the U.S.
  6. The Health Care Reform Debate in the U.S.: temporal navigation
  7. The Health Care Reform Debate in the U.S.: temporal navigation, spatial markers
  8. Zooming in on Florida
  9. n-gram Summaries
  10. Zooming in on Washington
  11. n-gram Summaries
  12. Browsing Real-time Data in Context: find resources related to social perceptions, i.e. News and Wikipedia articles to put extracted descriptors in context (example: SOYLENT GREEN and the HEALTH CARE REFORM). ✓ Exploit spatial and temporal semantics for thematic aggregation.
  13. Core of Twitris: n-gram summaries (spatio-temporal-thematic event descriptors)
  14. Architecture. Step 1: Gathering event-relevant tweets, because tweets are not pre-categorized (skip if I run out of time...)
  15. Topical Tweets. Gathering event-specific tweets: Iran Election
  16. Topical Tweets. Gathering event-specific tweets: Iran Election. 1: Pick trending hashtags from Twitter: #iranelection; #iran ...
  17. Topical Tweets. Gathering event-specific tweets: Iran Election. 1: Pick trending hashtags from Twitter: #iranelection; #iran ... 2: Use Google Insights to expand the hashtag list.
  19. Topical Tweets. 3: Issue a Twitter Search (API) query every 30 seconds for every hashtag and keyword; 1500 tweets per query.
  20. Topical Tweets. 3: Issue a Twitter Search (API) query every 30 seconds for every hashtag and keyword; 1500 tweets per query. 4: Obtain other hashtags in crawled tweets.
  21. Topical Tweets. 3: Issue a Twitter Search (API) query every 30 seconds for every hashtag and keyword; 1500 tweets per query. 4: Obtain other hashtags in crawled tweets; check for topic drifts.
  22. Topical Tweets. 3: Issue a Twitter Search (API) query every 30 seconds for every hashtag and keyword; 1500 tweets per query. 4: Obtain other hashtags in crawled tweets; check for topic drifts. 5: Repeat from Step 3 and babysit!
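Steps 3 to 5 describe an iterative crawl. The sketch below covers only one part of it: harvesting new hashtags from search results. It assumes a caller-supplied `search` function in place of the Twitter Search API, whose client is not shown. The names `expand_hashtags` and `min_count` are hypothetical, and the frequency threshold is only a crude stand-in for the manual topic-drift check ("babysit"). This is not Twitris's actual code, and a real crawler would also re-run it every 30 seconds.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable

# Hashtags: '#' followed by word characters (a simplification of Twitter's rules).
HASHTAG_RE = re.compile(r"#\w+")


def expand_hashtags(
    seeds: Iterable[str],
    search: Callable[[str], list[str]],
    rounds: int = 1,
    min_count: int = 2,
) -> set[str]:
    """Grow a set of tracked hashtags from those co-occurring in search results.

    `search` is a hypothetical stand-in for a Twitter Search API call that
    returns tweet texts for a query. New tags seen fewer than `min_count`
    times are ignored, a crude placeholder for a manual topic-drift check.
    """
    tracked = {tag.lower() for tag in seeds}
    for _ in range(rounds):
        counts: dict[str, int] = {}
        for tag in sorted(tracked):
            for text in search(tag):
                for found in HASHTAG_RE.findall(text.lower()):
                    if found not in tracked:
                        counts[found] = counts.get(found, 0) + 1
        new_tags = {tag for tag, n in counts.items() if n >= min_count}
        if not new_tags:
            break  # nothing new to track; stop early
        tracked |= new_tags
    return tracked


# Hypothetical canned results instead of live API calls.
fake_results = {
    "#iranelection": [
        "Crowds in #Tehran #iranelection",
        "#iranelection #tehran update",
        "#iranelection #cats",
    ],
}
print(sorted(expand_hashtags(["#iranelection"], lambda q: fake_results.get(q, []))))
# ['#iranelection', '#tehran']
```

Raising `min_count` makes the crawler more conservative about following new hashtags, at the cost of reacting more slowly when the conversation moves.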
  23. Architecture. Step 1: Gathering event-relevant tweets. Step 2: Spatial, temporal metadata of tweets. [Architecture diagram: data collection, analysis and visualization of citizen observations from Twitter]
  24. Geo-Coordinates of Tweets: the location a tweet originates from vs. the location it mentions. Approximation: the poster's location from the Twitter profile. Location: Dayton, OH (Google geocoder service, GeoDB). Location: “best place in the world” (fail!)
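The profile-location approximation can be sketched as a lookup that either resolves a free-text location or gives up. In this sketch a tiny hard-coded `GAZETTEER` with approximate coordinates stands in for a real geocoding service such as the Google geocoder or GeoDB. No real geocoding API is called, and `geocode_profile` is a hypothetical name.

```python
from __future__ import annotations

# Hypothetical, hard-coded gazetteer (approximate coordinates) standing in for
# a real geocoding service such as the Google geocoder or GeoDB.
GAZETTEER: dict[str, tuple[float, float]] = {
    "dayton, oh": (39.7589, -84.1916),
    "mumbai, india": (19.0760, 72.8777),
}


def geocode_profile(location: str | None) -> tuple[float, float] | None:
    """Approximate a tweet's origin from the poster's free-text profile location.

    Returns (latitude, longitude), or None when the text cannot be resolved,
    e.g. "best place in the world".
    """
    if not location:
        return None
    key = " ".join(location.lower().split())  # normalize case and whitespace
    return GAZETTEER.get(key)


print(geocode_profile("Dayton,  OH"))              # (39.7589, -84.1916)
print(geocode_profile("best place in the world"))  # None
```

Tweets whose profile location cannot be resolved simply have no coordinates and would be left out of spatial clustering.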
  25. Architecture. Step 1: Gathering event-relevant tweets. Step 2: Spatial, temporal metadata of tweets. Step 3: Spatio-temporal clusters. [Architecture diagram: data collection, analysis and visualization]
  26. Spatio-Temporal Clusters of Tweets: because every event is different... and we want to preserve the social perceptions that generated this data! Long-running, world-wide events (Iran Election Protest): clusters by country and week? Short, world-wide events (Olympics): clusters by country and day? Long-running, evolving, local events (Health Care Reform Debate): clusters by state and day? These are tunable parameters.
  27. Tweets in a Spatio-Temporal Cluster: spatio-temporal bias dictates the granularity of processing tweets. Mumbai Terror Attack: Cluster 1: tweets from India, 08/1/08; Cluster 2: tweets from Pakistan, 08/1/08; ... Cluster n: tweets from USA, 08/13/08.
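The tunable clustering can be sketched as grouping tweets by a (region, time bucket) key. Here the region field (`country` or `state`) and the time granularity (`day` or ISO `week`) are parameters. The `Tweet` record and the function names are hypothetical, and the sample dates only mirror the slide's example clusters.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    text: str
    country: str
    state: str | None  # e.g. a U.S. state, for local events
    created: datetime


def time_bucket(ts: datetime, granularity: str) -> str:
    """Label a timestamp by calendar day ('2008-08-01') or ISO week ('2008-W31')."""
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unknown time granularity: {granularity!r}")


def spatio_temporal_clusters(
    tweets: list[Tweet], space_key: str = "country", time_granularity: str = "day"
) -> dict[tuple[str, str], list[Tweet]]:
    """Group tweets by (region, time bucket); both knobs are tunable per event."""
    clusters: dict[tuple[str, str], list[Tweet]] = defaultdict(list)
    for tweet in tweets:
        region = getattr(tweet, space_key)
        clusters[(region, time_bucket(tweet.created, time_granularity))].append(tweet)
    return dict(clusters)


tweets = [
    Tweet("...", "India", None, datetime(2008, 8, 1, 10)),
    Tweet("...", "India", None, datetime(2008, 8, 1, 18)),
    Tweet("...", "Pakistan", None, datetime(2008, 8, 1, 12)),
    Tweet("...", "India", None, datetime(2008, 8, 2, 9)),
    Tweet("...", "USA", None, datetime(2008, 8, 13, 15)),
]
by_day = spatio_temporal_clusters(tweets)
print({key: len(group) for key, group in by_day.items()})
# {('India', '2008-08-01'): 2, ('Pakistan', '2008-08-01'): 1, ('India', '2008-08-02'): 1, ('USA', '2008-08-13'): 1}
```

Switching to `time_granularity="week"` merges the two India days into one cluster, which matches the "country and week" setting suggested for long-running events.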
  28. Architecture. Step 1: Gathering event-relevant tweets. Step 2: Spatial, temporal metadata of tweets. Step 3: Spatio-temporal clusters. Step 4: Thematic descriptors in each spatio-temporal cluster. [Architecture diagram: data collection, analysis and visualization]
  29. Thematic Descriptors: an event descriptor is an n-gram (1-, 2- and 3-grams)
  30. n-gram descriptors: “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”. 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”, ... 3-grams: “President Obama in”, “Obama in trying”, “in trying to”, ...
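The 1-, 2- and 3-gram enumeration above can be sketched in a few lines. This sketch assumes plain whitespace tokenization, and the system's actual tokenizer may differ, for example in punctuation handling. `ngrams` is a hypothetical helper name.

```python
def ngrams(text: str, max_n: int = 3) -> list[str]:
    """Return all 1..max_n-grams of whitespace-separated tokens, shortest first."""
    tokens = text.split()
    return [
        " ".join(tokens[i : i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


print(ngrams("President Obama in trying"))
# ['President', 'Obama', 'in', 'trying', 'President Obama', 'Obama in',
#  'in trying', 'President Obama in', 'Obama in trying']
```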
  31. Thematic Descriptors: “President”, “President Obama”, “President Obama in”. A descriptor is an n-gram weighted by:
  32. Thematic Descriptors: “President”, “President Obama”, “President Obama in”. A descriptor is an n-gram weighted by: Thematic Importance (redundancy: statistically discriminatory in nature; variability: contextually important)
  33. Thematic Descriptors: “President”, “President Obama”, “President Obama in”. A descriptor is an n-gram weighted by: Thematic Importance (redundancy: statistically discriminatory in nature; variability: contextually important); Spatial Importance (local vs. global popularity)
  34. Thematic Descriptors: “President”, “President Obama”, “President Obama in”. A descriptor is an n-gram weighted by: Thematic Importance (redundancy: statistically discriminatory in nature; variability: contextually important); Spatial Importance (local vs. global popularity); Temporal Importance (always popular vs. currently trending)
  35. Thematic Importance of an n-gram (“President”, “President Obama”, “President Obama in”). Exploiting Redundancy: tfidf of the n-gram (Lucene index); amplify by the fraction of nouns in the n-gram (Stanford Natural Language Parser); amplify by the fraction of non-stop words (‘going to try’).
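The slide names the ingredients of the redundancy score but not how they are combined. As an illustration only, this sketch assumes each amplifier multiplies the tf-idf by (1 + fraction). The part-of-speech tags are passed in (Penn Treebank style, as a parser such as the Stanford parser would produce) rather than computed. The stop-word list is a tiny illustrative one, and `redundancy_score` and `STOPWORDS` are hypothetical names.

```python
from __future__ import annotations

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def redundancy_score(
    ngram: str, tfidf: float, pos_tags: list[str], stopwords: frozenset[str] = STOPWORDS
) -> float:
    """tf-idf of an n-gram, amplified by its noun and non-stop-word fractions.

    Assumption: each amplifier multiplies the score by (1 + fraction).
    `pos_tags` holds one Penn Treebank tag per token (NN, NNS, NNP, ... are nouns).
    """
    tokens = ngram.lower().split()
    if not tokens or len(tokens) != len(pos_tags):
        raise ValueError("need a non-empty n-gram and exactly one POS tag per token")
    noun_frac = sum(tag.startswith("NN") for tag in pos_tags) / len(tokens)
    non_stop_frac = sum(tok not in stopwords for tok in tokens) / len(tokens)
    return tfidf * (1 + noun_frac) * (1 + non_stop_frac)


print(redundancy_score("President Obama", 2.0, ["NNP", "NNP"]))  # 8.0
print(redundancy_score("Obama in", 2.0, ["NNP", "IN"]))          # 4.5
```

With equal tf-idf, the noun-heavy, stop-word-free bigram ends up with the higher score, which is the intent of both amplifiers.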
  36. Thematic Importance of an n-gram: Exploiting Variability. Big three/Big 3; Ford, GM, Chrysler, General Motors... Contextually relevant words boost statistical importance. Focus word (fw): “big three”. Associated words (aw_i): words co-occurring in the spatio-temporal set of tweets.
  37. Thematic Importance of an n-gram: focus word (fw): Big Three; associated word (aw_i): Ford. The thematic importance of the focus word combines the tfidf of fw, the tfidf of aw_i, and the association strength of fw and aw_i.
  38. Contextual Relevance: association strength depends on contexts. Contexts of the associated word aw_i = ‘Ford’: big, chrysler, GM, focus, model, release... Measured with pointwise mutual information.
      Paper excerpt: … focus word in the given spatio-temporal corpus. The goal is to … the focus word only with the strongly associated words. … to measure strength of associations is to use word co-occurrence … language [9]. Borrowing from past success in this area, we measure the strength between the focus word and the associated words via the notion of point-wise mutual information in terms of co-occurrence. We measure $\mathrm{assoc}_{str}$ scores as a function of the point-wise mutual information between the focus word $fw$ and the context of $aw_i$. This is done … association strengths are determined in the contexts that the … Let us call the contexts for $aw_i$ $C_{aw_i} = \{c_{aw_1}, c_{aw_2}, \ldots\}$, where $c_{aw_k}$ are strong descriptors that collocate with $aw_i$. $\mathrm{assoc}_{str}(fw, aw_i)$ is calculated as:
      $$\mathrm{assoc}_{str}(fw, aw_i) = \frac{\sum_k \mathrm{pmi}(fw, c_{aw_k})}{|C_{aw_i}|}, \quad \forall c_{aw_k} \in C_{aw_i}$$
      where the point-wise mutual information between $fw$ and $c_{aw_k}$, $\mathrm{pmi}(fw, c_{aw_k})$, is calculated as:
      $$\mathrm{pmi}(fw, c_{aw_k}) = \log \frac{p(fw, c_{aw_k})}{p(fw)\,p(c_{aw_k})} = \log \frac{p(c_{aw_k} \mid fw)}{p(c_{aw_k})}$$
      where $p(fw) = \frac{n(fw)}{N}$ and $p(c_{aw_k} \mid fw) = \frac{n(c_{aw_k}, fw)}{n(fw)}$; $n(\cdot)$ is a frequency count …
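The association-strength and PMI definitions in this slide can be computed directly from co-occurrence counts. This sketch makes two assumptions. First, $p(c_{aw_k}) = n(c_{aw_k})/N$, since the excerpt spells out only $p(fw)$ and $p(c_{aw_k} \mid fw)$. Second, a context never seen with the focus word contributes 0. The function names and toy counts are hypothetical.

```python
from __future__ import annotations

import math


def pmi(n_fw_c: int, n_fw: int, n_c: int, total: int) -> float:
    """pmi(fw, c) = log(p(c | fw) / p(c)).

    With p(c | fw) = n(c, fw) / n(fw) and (assumed) p(c) = n(c) / N, this equals
    log(n(c, fw) * N / (n(fw) * n(c))), which avoids intermediate rounding.
    """
    return math.log(n_fw_c * total / (n_fw * n_c))


def assoc_str(
    fw: str,
    contexts: list[str],
    counts: dict[str, int],
    pair_counts: dict[tuple[str, str], int],
    total: int,
) -> float:
    """Average PMI between the focus word fw and each context c_{aw_k} in C_{aw_i}."""
    if not contexts:
        raise ValueError("C_{aw_i} must not be empty")
    scores = []
    for c in contexts:
        n_fw_c = pair_counts.get((fw, c), 0)
        # Assumption: a context never seen with fw contributes 0 (log 0 is undefined).
        scores.append(pmi(n_fw_c, counts[fw], counts[c], total) if n_fw_c else 0.0)
    return sum(scores) / len(scores)


# Toy counts (made up), with total N = 100.
counts = {"big three": 10, "chrysler": 20, "gm": 25}
pair_counts = {("big three", "chrysler"): 8, ("big three", "gm"): 5}
print(round(assoc_str("big three", ["chrysler", "gm"], counts, pair_counts, 100), 4))  # 1.0397
```

Here "chrysler" co-occurs with "big three" four times more often than chance (PMI = log 4) and "gm" twice as often (log 2), so the average is 1.5 log 2.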
  39. Temporal Importance of a Descriptor: certain descriptors will always dominate discussions (“Terrorism” in Mumbai Terror Attack tweets; “Healthcare” in the Health Care reform debate). Allow recent (possibly interesting) ones to surface. The temporal bias is a tuneable factor; a 0-1 bias gives less to more importance to recent n-grams.
      [Figure: Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-tempo… (b) Top 15 extracted descriptors in the US for Mumbai attack event]
      Paper excerpt: … focus word and all associations in $C_{fw}$. The thematic weights of … along with their strengths are plugged into Eqn. 1 to compute the thematic score, $ngram_i(th)$, of the n-gram descriptor. B. Temporal Importance of an event descriptor: While thematic scores are good indicators of what is important in a spatio-temporal set, certain descriptors tend to dominate discussions. In order to allow possibly interesting descriptors to surface, we discount the thematic score of a descriptor depending on how popular it has been in the recent past. The discount score for an n-gram … is calculated over a period of time as:
      $$ngram_i(te) = temporal_{bias} \cdot \sum_{d=1}^{D} \frac{ngram_i(th)_d}{d}$$
      where $ngram_i(th)_d$ is the enhanced thematic score of the descriptor …
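A direct reading of the $ngram_i(te)$ formula as code follows. It assumes that $d = 1$ is the most recent day, so recent popularity is weighted most heavily, and that the per-day thematic scores are already available. `temporal_discount` is a hypothetical name, and the scores in the example are made up.

```python
from __future__ import annotations


def temporal_discount(daily_thematic: list[float], temporal_bias: float) -> float:
    """ngram_i(te) = temporal_bias * sum_{d=1..D} ngram_i(th)_d / d.

    daily_thematic[d - 1] holds ngram_i(th)_d. Assumption: d = 1 is the most
    recent day, so recent popularity is weighted most heavily.
    """
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias must lie in [0, 1]")
    return temporal_bias * sum(
        score / d for d, score in enumerate(daily_thematic, start=1)
    )


# A descriptor scored 4.0, 2.0 and 3.0 over the last D = 3 days (made-up values):
print(temporal_discount([4.0, 2.0, 3.0], temporal_bias=0.5))  # 3.0
```

Setting `temporal_bias` to 0 disables the discount entirely, which suits events where past popularity should not penalize a descriptor.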
  40. Spatial Importance of a Descriptor: local descriptors are more interesting compared to global ones. Spatial discount: the fraction of spatio-temporal clusters the n-gram occurred in, weighted by $(1 - spatial_{bias})$ ($spatial_{bias}$ closer to 0 = global importance).
      Paper excerpt: … duration for which we wish to apply the dampening factor, for example, the recent week. However, this temporal discount might not be relevant for all … For this reason, we also apply a $temporal_{bias}$ weight ranging from 0 to 1: a weight closer to 1 gives more importance …, while a weight closer to 0 gives more importance to past activity. C. Spatial Importance of an event descriptor: We also discount the importance of a descriptor based on its occurrence in other spatio-temporal sets. … is that popular descriptors that occur all over the world on a given day are less interesting compared to those that occur only in the spatio-temporal set. We define the spatial discount score for an n-gram as a fraction of spatial partitions (e.g. countries) that had activity surrounding this descriptor:
      $$ngram_i(sp) = \frac{k}{|spatio\text{-}temporal\ sets|} \cdot (1 - spatial_{bias})$$
      where $k$ is the number of spatio-temporal sets the n-gram occurred in.
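The spatial discount is a single scaled fraction. This sketch assumes $k$ and the number of spatio-temporal sets have already been counted. `spatial_discount` is a hypothetical name, and the example values are made up.

```python
from __future__ import annotations


def spatial_discount(k: int, num_sets: int, spatial_bias: float) -> float:
    """ngram_i(sp) = k / |spatio-temporal sets| * (1 - spatial_bias).

    k is the number of spatio-temporal sets (e.g. countries on a given day)
    in which the n-gram occurred.
    """
    if num_sets <= 0 or not 0 <= k <= num_sets:
        raise ValueError("need num_sets > 0 and 0 <= k <= num_sets")
    if not 0.0 <= spatial_bias <= 1.0:
        raise ValueError("spatial_bias must lie in [0, 1]")
    return k / num_sets * (1 - spatial_bias)


print(spatial_discount(3, 12, spatial_bias=0.0))  # 0.25: seen in a quarter of the sets
print(spatial_discount(3, 12, spatial_bias=1.0))  # 0.0: spatial signal switched off
```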
  41. STT Score of an n-gram: the spatio-temporal-thematic score of a descriptor = thematic score minus spatio-temporal discounts.
      Paper excerpt: … of importance to the global presence of the descriptor. Depending on the event of interest, both these discounting factors … different spatio-temporal sets. For example, when processing … Mumbai attack, setting the $spatial_{bias}$ to 1 eliminates … spatial signals. While processing tweets from the US, on … global bias given that the event did not originate there. … are set … of observations before we begin the processing. The spatial and temporal effects are discounted from the thematic score, so the final spatio-temporal-thematic (STT) weight of the n-gram is:
      $$w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)$$
      … illustrates the effect of our enhanced STT weights … pertaining to the Mumbai terror attack event, …
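A small worked example of $w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)$ follows. The numbers are illustrative, not results from the system. They show how a descriptor that always dominates ("terrorism") can fall below a less globally popular one once both discounts are subtracted. `DescriptorScores` is a hypothetical record type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorScores:
    thematic: float  # ngram_i(th)
    temporal: float  # ngram_i(te), temporal discount
    spatial: float   # ngram_i(sp), spatial discount

    @property
    def stt(self) -> float:
        """w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)."""
        return self.thematic - self.temporal - self.spatial


# Illustrative, made-up scores.
scores = {
    "terrorism": DescriptorScores(thematic=5.0, temporal=3.0, spatial=0.5),
    "example local n-gram": DescriptorScores(thematic=4.0, temporal=0.5, spatial=0.25),
}
print({name: s.stt for name, s in scores.items()})
# {'terrorism': 1.5, 'example local n-gram': 3.25}
```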
  42. Higher-order n-grams are picked over lower-order n-grams when their scores are equal.
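The tie-breaking rule can be expressed as a two-part sort key. Sort by STT score in descending order, and among equal scores prefer the n-gram with more tokens. This is a sketch, assuming scores are kept in a plain dict. `top_descriptors` and the sample scores are hypothetical.

```python
from __future__ import annotations


def top_descriptors(stt: dict[str, float], top_x: int) -> list[str]:
    """Rank n-grams by STT score (descending); on equal scores prefer higher-order n-grams."""
    return sorted(stt, key=lambda g: (-stt[g], -len(g.split())))[:top_x]


print(top_descriptors(
    {"President": 2.0, "President Obama": 2.0, "health care": 3.0, "Obama": 1.0}, top_x=3
))  # ['health care', 'President Obama', 'President']
```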
  43. Top X Descriptor Tag Cloud: tag size proportional to the enhanced STT score
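Sizing tags in proportion to the STT score can be sketched by scaling each score against the top one. The pixel bounds are hypothetical choices, as is the `min_px` floor, which keeps low or non-positive scores legible and makes the smallest tags only approximately proportional. `tag_sizes` and the sample scores are also hypothetical.

```python
from __future__ import annotations


def tag_sizes(stt: dict[str, float], max_px: int = 40, min_px: int = 8) -> dict[str, int]:
    """Font size proportional to STT score, scaled so the top tag gets max_px.

    Sizes are clamped at min_px so low or non-positive scores stay legible.
    """
    top = max(stt.values())
    if top <= 0:
        return {g: min_px for g in stt}
    return {g: max(min_px, round(max_px * s / top)) for g, s in stt.items()}


print(tag_sizes({"health care": 4.0, "President Obama": 2.0, "Obama": 1.0}))
# {'health care': 40, 'President Obama': 20, 'Obama': 10}
```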
