Twitris
Browsing real-time data by space,
        time and theme
           http://twitris.knoesis.org
Motivation, Goals
Mumbai Terror Attack 2008
  Citizen sensor observations (Flickr, Twitter,
  blogs ..)
  No matter where you looked, tapping into a
  cultural perception was impossible

We wanted to know what people in India
were saying vs. those in Pakistan or the
U.S.A
Spatio-Temporal-Thematic Slices of
         Real-time Data

  Around NEWS-WORTHY EVENTS
    Using space and time as cues for extracting
    social perceptions (behind signals)
    Summarizing hundreds and thousands of
    real-time observations
The Health Care Reform Debate
          in the U.S
Temporal navigation
Spatial Markers
Zooming in on Florida
n-gram Summaries
Zooming in on Washington
n-gram Summaries
   Browsing Real-time Data in Context
     Find resources related to social perceptions
     SOYLENT GREEN and the HEALTH CARE REFORM
     News and Wikipedia articles to put extracted descriptors in context

  Exploit spatial and temporal semantics for thematic aggregation
Core of Twitris
n-gram summaries - Spatio-temporal-thematic
           event descriptors
Architecture
  Step1: Gathering event-relevant tweets
    Because tweets are not pre-categorized

  Skip if I run out of time ..
Topical Tweets
Gathering event-specific tweets: Iran Election
1. Pick trending hashtags from Twitter - #iranelection; #iran ..
2. Google Insights to expand hashtag list
3. Issue a Twitter Search (API) query every 30 seconds for every hashtag, keyword
     1500 tweets per query
4. Obtain other hashtags in crawled tweets
     Check for topic drifts
5. Repeat from Step 3 and babysit!
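
Step 4 of the gathering loop can be sketched as a counting pass over the crawled tweets. This is a minimal sketch, not the Twitris crawler: the function name `candidate_hashtags`, the `min_count` threshold, and the sample tweets are assumptions made for illustration, and no Twitter API is called.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

def candidate_hashtags(tweets, seeds, min_count=2):
    """Return (hashtag, count) pairs seen in crawled tweets but not yet crawled.

    `min_count` is a hypothetical threshold; a person still reviews the
    candidates for topic drift before adding them to the crawl list.
    """
    known = {s.lower().lstrip("#") for s in seeds}
    counts = Counter(
        tag.lower()
        for text in tweets
        for tag in HASHTAG.findall(text)
    )
    return [(tag, n) for tag, n in counts.most_common()
            if tag not in known and n >= min_count]

# Made-up tweets for illustration only.
tweets = [
    "Protests continue #iranelection #tehran",
    "Following #iranelection news #tehran #neda",
    "Lunch time #food",
]
print(candidate_hashtags(tweets, ["#iranelection", "#iran"]))
```

Hashtags that pass the threshold are only candidates; the "babysit" in step 5 is the manual check that a frequent co-occurring hashtag has not drifted off topic.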
Architecture
  Step1: Gathering event-relevant tweets
  Step2: Spatial, Temporal metadata of tweets
Geo-Coordinates of Tweets
Location a tweet originates from
Location it mentions
Approximation: Poster location on Twitter
profile


  Location: Dayton, OH (Google geocoder service, GeoDB)
  Location: “best place in the world” (fail!)
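
The profile-location approximation can be sketched as a lookup that either resolves a free-text location or fails. This is a minimal sketch under stated assumptions: the `GAZETTEER` dictionary is a hypothetical stand-in for a geocoder, and neither the Google geocoder service nor GeoDB is called.

```python
# Hypothetical stand-in for a geocoder: free-text location -> (country, state).
GAZETTEER = {
    "dayton, oh": ("US", "OH"),
    "mumbai": ("IN", "MH"),
}

def resolve_profile_location(profile_location):
    """Map a poster's profile location to (country, state), or None on failure."""
    if not profile_location:
        return None
    # Normalize case and whitespace before the lookup.
    key = " ".join(profile_location.lower().split())
    return GAZETTEER.get(key)

print(resolve_profile_location("Dayton, OH"))
print(resolve_profile_location("best place in the world"))
```

Tweets whose profile location does not resolve have no spatial metadata under this approximation.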
Architecture
  Step1: Gathering event-relevant tweets
  Step2: Spatial, Temporal metadata of tweets
  Step3: Spatio-temporal clusters
Spatio-Temporal Clusters of Tweets
Because every event is different.. and we want to preserve social perceptions
                         that generated this data!

     Long-running, world-wide events (Iran Election Protest)
         clusters by country and week?
     Short, world-wide events (Olympics)
         clusters by country and day?
     Long-running, evolving, local events (Health Care
     Reform Debate)
         clusters by state and day?
                                                Tunable parameters
Tweets in a Spatio-Temporal Cluster

   Spatio-temporal bias dictates the granularity at which
   tweets are processed
   Mumbai Terror Attack
     Cluster1: Tweets from India, 08/1/08
     Cluster2: Tweets from Pakistan, 08/1/08
     Cluster n: Tweets from USA, 08/13/08
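
Clustering with tunable spatial and temporal granularity can be sketched as grouping by a (place, time bucket) key. This is an illustrative sketch: the tweet dictionaries, the field names `country`, `date` and `text`, and the sample dates are assumptions, not the Twitris data model.

```python
from collections import defaultdict
from datetime import date

def time_bucket(day, granularity):
    """Label a date by day or by ISO week."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        year, week, _ = day.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unknown granularity: {granularity}")

def cluster_tweets(tweets, spatial_key="country", granularity="day"):
    """Group tweet texts by (spatial unit, time bucket)."""
    clusters = defaultdict(list)
    for tweet in tweets:
        key = (tweet[spatial_key], time_bucket(tweet["date"], granularity))
        clusters[key].append(tweet["text"])
    return dict(clusters)

# Made-up sample; the dates are illustrative only.
sample = [
    {"country": "India", "date": date(2008, 11, 27), "text": "tweet a"},
    {"country": "India", "date": date(2008, 11, 28), "text": "tweet b"},
    {"country": "USA", "date": date(2008, 11, 27), "text": "tweet c"},
]
print(len(cluster_tweets(sample, granularity="day")))
print(len(cluster_tweets(sample, granularity="week")))
```

Switching `spatial_key` to a state field and `granularity` to `"day"` would give the state-and-day slicing suggested for a local, evolving event.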
Architecture
  Step1: Gathering event-relevant tweets
  Step2: Spatial, Temporal metadata of tweets
  Step3: Spatio-temporal clusters
  Step4: Thematic Descriptors in spatio-temporal cluster
Thematic Descriptors

An event descriptor is an n-gram
  1-, 2- and 3-grams
n-gram descriptors
“President Obama in trying to regain control of the
health-care debate will likely shift his pitch in September”


1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in
trying”, “trying to”...
3-grams: “President Obama in”, “Obama in trying”;
“in trying to”...
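
The n-gram extraction above can be reproduced with a sliding window over whitespace tokens. This is a minimal sketch; whitespace tokenization is an assumption, and a real tokenizer would also handle punctuation.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")
tokens = sentence.split()

for n in (1, 2, 3):
    print(n, ngrams(tokens, n)[:4])
```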
Thematic Descriptors
“President”    “President Obama”      “President Obama in”

A descriptor is an n-gram weighted by:
Thematic Importance
    redundancy: statistically discriminatory in nature
    variability: contextually important

Spatial Importance (local vs. global popularity)
Temporal Importance (always popular vs. currently
trending)
Thematic Importance of an n-gram
 “President”    “President Obama”      “President Obama in”


  Exploiting Redundancy
      tfidf of n-gram (Lucene Index)
      amplify by fraction of nouns in the n-gram
      (Stanford Natural Language Parser)
      amplify by fraction of non-stop words (‘going to
      try’)
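
The redundancy weighting can be sketched as a tf-idf score boosted by two fractions. The slides do not say how the amplification is combined, so the multiplicative form below is an assumption, as are the `STOPWORDS` and `NOUNS` sets, which stand in for a stop-word list and for part-of-speech output; Lucene and the Stanford parser are not called.

```python
# Hypothetical stand-ins for a stop-word list and for noun tags from a parser.
STOPWORDS = {"in", "to", "the", "of", "a"}
NOUNS = {"president", "obama"}

def amplified_score(tfidf, ngram):
    """Boost a tf-idf score by the noun and non-stop-word fractions of an n-gram."""
    words = ngram.lower().split()
    noun_fraction = sum(w in NOUNS for w in words) / len(words)
    content_fraction = sum(w not in STOPWORDS for w in words) / len(words)
    # Assumption: each fraction acts as a multiplicative boost of (1 + fraction).
    return tfidf * (1 + noun_fraction) * (1 + content_fraction)

for gram in ("President Obama", "President Obama in", "going to try"):
    print(gram, round(amplified_score(1.0, gram), 2))
```

With equal tf-idf, an n-gram made of nouns outranks one padded with stop words, which is the effect the slide describes.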
Thematic Importance of an n-gram
  Exploiting Variability
    Big three/Big 3; Ford, GM, Chrysler, General
    Motors..
    Contextually relevant words boost statistical
    importance

  Focus word (fw): “big three”
  Associated words (aw_i): co-occurring in spatio-temporal set of tweets, e.g. Ford

  Thematic importance of focus word: combines the tfidf of fw, the tfidf of aw_i,
  and the association strength of fw and aw_i
Contextual Relevance
  One way to measure strength of associations is to use word co-occurrence [...] language [9].
  Borrowing from past success in this area, we measure association strength between the
  focus word and the associated words using the notion of point-wise mutual information
  in terms of co-occurrence.

Association Strength of fw and the context of aw_i
  We measure assocstr scores as a function of the point-wise mutual information between
  the focus word and the context of aw_i.
  Let us call the contexts for aw_i $C_{aw_i} = \{caw_1, caw_2, ..\}$, where the $caw_k$ are
  strong descriptors that collocate with aw_i.
  Contexts of associated word aw_i ‘Ford’: chrysler, GM, big 3; focus, model, release..

  $$assoc_{str}(fw, aw_i) = \frac{\sum_k pmi(fw, caw_k)}{|C_{aw_i}|}, \quad \forall\, caw_k \in C_{aw_i}$$

Pointwise Mutual Information
  The point-wise mutual information between fw and $caw_k$ is calculated as:

  $$pmi(fw, caw_k) = \log \frac{p(fw, caw_k)}{p(fw)\, p(caw_k)} = \log \frac{p(caw_k \mid fw)}{p(caw_k)}$$

  where $p(fw) = \frac{n(fw)}{N}$; $p(caw_k \mid fw) = \frac{n(caw_k, fw)}{n(fw)}$; $n(fw)$ is the frequency of the focus word
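
The association strength can be sketched from raw counts as the mean PMI over the contexts of one associated word. This is a minimal sketch: the counts are made up, and estimating p(caw_k) as n(caw_k)/N is an assumption by analogy with p(fw) = n(fw)/N.

```python
import math

def pmi(n_fw, n_caw, n_joint, total):
    """pmi(fw, caw_k) = log( p(caw_k | fw) / p(caw_k) ), from raw counts."""
    p_caw_given_fw = n_joint / n_fw   # n(caw_k, fw) / n(fw)
    p_caw = n_caw / total             # assumption: n(caw_k) / N
    return math.log(p_caw_given_fw / p_caw)

def assoc_str(n_fw, contexts, total):
    """Mean PMI between fw and the contexts C_aw_i of one associated word.

    contexts: list of (n_caw, n_joint) count pairs, one per context word caw_k.
    """
    return sum(pmi(n_fw, n_caw, n_joint, total)
               for n_caw, n_joint in contexts) / len(contexts)

# Made-up counts: fw occurs 10 times among N = 100 tokens.
print(pmi(10, 20, 5, 100))                     # context co-occurs more than chance
print(pmi(10, 20, 2, 100))                     # context is independent of fw
print(assoc_str(10, [(20, 5), (20, 2)], 100))
```

A context that never co-occurs with fw has a joint count of zero and would make the logarithm undefined, so a fuller version would skip or smooth such pairs.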
  The thematic weights and association strengths are plugged into Eqn 1 to compute the
  thematic score $ngram_i(th)$ of the n-gram descriptor.

Temporal Importance of a Descriptor
  Certain descriptors will always dominate discussions
    “Terrorism” in Mumbai Terror Attack Tweets
    “Healthcare” in Health Care reform debate
  Allow recent (possibly interesting) ones to surface
  The thematic score of a descriptor is discounted depending on how popular it has been
  in the recent past. The temporal discount score for an n-gram is calculated over a
  period of time as:

  $$ngram_i(te) = temporal_{bias} \ast \sum_{d=1}^{D} \frac{ngram_i(th)_d}{d}$$

  where $ngram_i(th)_d$ is the enhanced thematic score of the descriptor on day d, and D is
  the duration for which we wish to apply the dampening factor, for example the recent week.
  0-1 bias: less to more importance to recent n-grams

Spatial Importance of a Descriptor
  Local descriptors are more interesting compared to global ones
  We also discount the importance of a descriptor based on its occurrence in other
  spatio-temporal sets.
  Spatial discount: the fraction of spatial partitions (e.g. countries) that had activity
  surrounding this descriptor

  $$ngram_i(sp) = \frac{k}{|spatio\text{-}temporal\ sets|} \ast (1 - spatial_{bias})$$

  $k / |spatio\text{-}temporal\ sets|$: fraction of spatio-temporal clusters the n-gram occurred in
  $spatial_{bias}$ closer to 0 = global importance
  Both biases are set before processing of the observations begins. Setting $spatial_{bias}$
  to 1 makes the spatial discount zero; when processing tweets from the US, a global bias
  may be preferred, given that the event did not originate there.

STT Score of an n-gram
  Spatio-temporal-thematic score of a descriptor
    = thematic score - spatio-temporal discounts
  The spatial and temporal effects are discounted from the final
  spatio-temporal-thematic (STT) weight of the n-gram:

  $$w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)$$

  Higher-order n-grams picked over lower-order n-grams (if same scores)

  [Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-tempo... (b) Top 15 extracted descriptors in the US for the Mumbai attack event]
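
The three formulas above can be sketched directly as functions. This is a minimal sketch with made-up numbers; the slides do not state which day counts as d = 1, so the code simply takes the daily scores in the order d = 1..D.

```python
def temporal_discount(thematic_by_day, temporal_bias):
    """ngram_i(te) = temporal_bias * sum over d = 1..D of ngram_i(th)_d / d.

    thematic_by_day[0] is the score for d = 1 (ordering is an assumption).
    """
    return temporal_bias * sum(
        score / d for d, score in enumerate(thematic_by_day, start=1)
    )

def spatial_discount(k, num_sets, spatial_bias):
    """ngram_i(sp) = (k / |spatio-temporal sets|) * (1 - spatial_bias)."""
    return (k / num_sets) * (1 - spatial_bias)

def stt_weight(thematic, temporal, spatial):
    """w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)."""
    return thematic - temporal - spatial

# Made-up values for illustration.
te = temporal_discount([4.0, 2.0], temporal_bias=0.5)
sp = spatial_discount(k=2, num_sets=4, spatial_bias=0.5)
print(te, sp, stt_weight(10.0, te, sp))
```

Setting `spatial_bias` to 1 zeroes the spatial discount, and setting `temporal_bias` to 0 zeroes the temporal one, so both reduce to the plain thematic score.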
Top X Descriptor Tag Cloud

 Tag size proportional to enhanced STT score
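
Picking the top X descriptors and sizing their tags can be sketched as a sort followed by linear scaling. This is an illustrative sketch: the `max_font` value, the rounding, and the sample scores are assumptions, and it expects positive scores.

```python
def top_descriptors(scores, x, max_font=36):
    """Return [(ngram, font_size)] for the top-x n-grams by STT score.

    Ties on score go to the higher-order n-gram (more words).
    `max_font` is a hypothetical size for the best-scoring tag.
    """
    ranked = sorted(scores.items(),
                    key=lambda item: (-item[1], -len(item[0].split())))
    top = ranked[:x]
    if not top:
        return []
    best = top[0][1]
    # Font size is proportional to score relative to the best descriptor.
    return [(gram, round(max_font * score / best)) for gram, score in top]

# Made-up scores for illustration.
scores = {"president": 9.0, "president obama": 9.0, "debate": 4.5}
print(top_descriptors(scores, 3))
```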

  • 5. The Health Care Reform Debate in the U.S
  • 6. The Health Care Reform Debate in the U.S Temporal navigation
  • 7. The Health Care Reform Debate in the U.S Temporal navigation Spatial Markers
  • 8. Zooming in on Florida
  • 10. Zooming in on Washington
  • 12. Browsing Real-time Data in Context SOYLENT GREEN and the HEALTH CARE REFORM Find resources related to social perceptions News and Wikipedia articles to put extracted descriptors in context ✓ Exploit spatio, temporal semantics for thematic aggregation
  • 13. Core of Twitris n-gram summaries - Spatio-temporal-thematic event descriptors
  • 14. Architecture Step1: Gathering event-relevant tweets Because tweets are not pre-categorized Skip if I run out of time ..
  • 16. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran ..
  • 17. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran .. 2: Google insights to expand hashtag list
  • 18. Topical Tweets Gathering event-specific tweets: Iran Election 1: Pick trending hashtags from Twitter - #iranelection; #iran .. 2: Google insights to expand hashtag list
  • 19. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query
  • 20. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets
  • 21. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets Check for topic drifts
  • 22. Topical Tweets 3. Issue a Twitter Search (API) every 30 seconds for every hashtag, keyword 1500 tweets per query 4. Obtain other Hashtags in crawled tweets Check for topic drifts 5. Repeat from Step 3 and babysit!
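Steps 3 to 5 of the gathering loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Twitris crawler: `search` is a hypothetical stand-in for a Twitter Search API call, and `expand_hashtags`, `crawl_once` and `fake_search` are names invented here. The 30-second polling interval and the 1500-tweet limit from the slides are not modelled.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

HASHTAG = re.compile(r"#\w+")


def expand_hashtags(tweets: Iterable[str]) -> Set[str]:
    """Step 4: collect every hashtag seen in the crawled tweets (lowercased)."""
    return {tag.lower() for text in tweets for tag in HASHTAG.findall(text)}


def crawl_once(seeds: Set[str], search: Callable[[str], List[str]]) -> Tuple[List[str], Set[str]]:
    """Step 3 for one polling round: query every seed, then report unseen hashtags."""
    tweets: List[str] = []
    for seed in sorted(seeds):
        tweets.extend(search(seed))  # the same tweet may be returned for several seeds
    return tweets, expand_hashtags(tweets) - seeds


def fake_search(query: str) -> List[str]:
    """Toy replacement for the search call (hypothetical data)."""
    corpus = [
        "Protests continue #iranelection #neda",
        "Vote count disputed #iran #iranelection",
    ]
    return [t for t in corpus if query in t.lower()]


tweets, new_tags = crawl_once({"#iranelection"}, fake_search)
print(sorted(new_tags))
```

The newly found hashtags are only candidates. Someone still has to check them for topic drift before adding them to the seed set, which is the "babysit" part of step 5.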
  • 23. Architecture Step1: Gathering event-relevant tweets Step2: Spatial, Temporal metadata of tweets
  • 24. Geo-Coordinates of Tweets Location a tweet originates from Location it mentions Approximation: Poster location on Twitter profile Location: Dayton, OH (Google geocoder service, GeoDB) Location: “best place in the world” (fail!)
  • 25. Architecture Step1: Gathering event-relevant tweets Step2: Spatial, Temporal metadata of tweets Step3: Spatio-temporal clusters
  • 26. Spatio-Temporal Clusters of Tweets Because every event is different.. and we want to preserve social perceptions that generated this data! Long-running, world-wide events (Iran Election Protest) clusters by country and week? Short, world-wide events (Olympics) clusters by country and day? Long-running, evolving, local events (Health Care Reform Debate) clusters by state and day? Tunable parameters
  • 27. Tweets in a Spatio-Temporal Cluster Spatio-temporal bias dictates the granularity of processing tweets Mumbai Terror Attack Cluster 1: Tweets from India, 08/1/08 Cluster 2: Tweets from Pakistan, 08/1/08 Cluster n: Tweets from USA, 08/13/08
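The tunable spatial and temporal granularity can be pictured as a grouping key. The sketch below is illustrative only: the tuple layout `(text, country, day)`, the function names, and the toy dates are assumptions, and the spatial unit is fixed to the country for brevity.

```python
from collections import defaultdict
from datetime import date


def cluster_key(country, day, temporal="day"):
    """Build the (space, time) key; 'week' uses the ISO year and week number."""
    if temporal == "week":
        iso = day.isocalendar()
        return (country, (iso[0], iso[1]))
    return (country, day)


def cluster_tweets(tweets, temporal="day"):
    """Group (text, country, day) tuples into spatio-temporal clusters."""
    clusters = defaultdict(list)
    for text, country, day in tweets:
        clusters[cluster_key(country, day, temporal)].append(text)
    return dict(clusters)


# Toy data: two days in the same ISO week.
tweets = [
    ("tweet a", "India", date(2009, 8, 3)),
    ("tweet b", "India", date(2009, 8, 4)),
    ("tweet c", "USA", date(2009, 8, 3)),
]
by_day = cluster_tweets(tweets, "day")
by_week = cluster_tweets(tweets, "week")
print(len(by_day), len(by_week))
```

Switching from day to week merges the two India clusters into one, which is the kind of per-event tuning the slide describes.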
  • 28. Architecture Step1: Gathering event-relevant tweets Step2: Spatial, Temporal metadata of tweets Step3: Spatio-temporal clusters Step4: Thematic Descriptors in spatio-temporal cluster
  • 29. Thematic Descriptors An event descriptor is an n-gram 1,2 and 3 grams
  • 30. n-gram descriptors “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September” 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”... 3-grams: “President Obama in”, “Obama in trying”; “in trying to”...
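A minimal sketch of extracting 1-, 2- and 3-grams from a tweet. Whitespace tokenisation and the function name `ngrams` are assumptions made for illustration; the slides do not say how Twitris tokenises text.

```python
def ngrams(text, max_n=3):
    """Return {n: list of n-grams} for n = 1..max_n, using whitespace tokens."""
    tokens = text.split()
    return {
        n: [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        for n in range(1, max_n + 1)
    }


grams = ngrams("President Obama in trying to regain control")
print(grams[2][:3])
```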
  • 31. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by:
  • 32. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important
  • 33. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important Spatial Importance (local vs. global popularity)
  • 34. Thematic Descriptors “President” “President Obama” “President Obama in” A descriptor is an n-gram weighted by: Thematic Importance redundancy: statistically discriminatory in nature variability: contextually important Spatial Importance (local vs. global popularity) Temporal Importance (always popular vs. currently trending)
  • 35. Thematic Importance of an n-gram “President” “President Obama” “President Obama in” Exploiting Redundancy tfidf of n-gram (Lucene Index) amplify by fraction of nouns in the n-gram (Stanford Natural Language Parser) amplify by fraction of non-stop words (‘going to try’)
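The slide names three ingredients (tfidf, fraction of nouns, fraction of non-stop words) but not how they are combined. The sketch below assumes a simple multiplicative boost, takes the tfidf value and part-of-speech tags as given inputs rather than calling Lucene or the Stanford parser, and uses a toy stop-word list. All of these are assumptions for illustration.

```python
STOP_WORDS = {"going", "to", "in", "the", "a", "of"}  # toy list, not from the source


def redundancy_score(tfidf, tagged_ngram, stop_words=STOP_WORDS):
    """Amplify a tfidf value by the noun fraction and the non-stop-word fraction.

    tagged_ngram is a list of (word, Penn Treebank tag) pairs.
    The (1 + fraction) form is an assumed combination, not the authors' formula.
    """
    n = len(tagged_ngram)
    noun_frac = sum(tag.startswith("NN") for _, tag in tagged_ngram) / n
    content_frac = sum(word.lower() not in stop_words for word, _ in tagged_ngram) / n
    return tfidf * (1 + noun_frac) * (1 + content_frac)


informative = redundancy_score(1.0, [("President", "NNP"), ("Obama", "NNP"), ("in", "IN")])
filler = redundancy_score(1.0, [("going", "VBG"), ("to", "TO"), ("try", "VB")])
print(informative > filler)
```

With equal tfidf, "President Obama in" outranks "going to try" because it has more nouns and fewer stop words.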
  • 36. Thematic Importance of an n-gram Exploiting Variability Big three/Big 3; Ford, GM, Chrysler, General Motors.. Contextually relevant words boost statistical importance Focus word (fw): “big three” Associated words (awi): co-occurring in spatio-temporal set of tweets [Figure: “Big Three” linked to GM, Chrysler, General Motors, Ford]
  • 37. Thematic Importance of an n-gram focus word (fw): Big Three associated word (awi): Ford Thematic importance of focus word: tfidf of fw tfidf of awi association strength of fw and awi [Figure: “Big Three” linked to GM, Chrysler, General Motors, Ford]
  • 38. Contextual Relevance Association strength of fw and awi depends on the contexts of awi. Contexts of associated word awi ‘Ford’: big 3, chrysler, GM, focus, model, release.. One way to measure the strength of associations is to use word co-occurrence [9]; here the association strength between the focus word and the associated words is measured using point-wise mutual information in terms of co-occurrence. Let the contexts for $aw_i$ be $C_{aw_i} = \{caw_1, caw_2, \ldots\}$, where the $caw_k$ are strong descriptors that collocate with $aw_i$. Then $assoc_{str}(fw, aw_i) = \frac{\sum_k pmi(fw, caw_k)}{|C_{aw_i}|},\ \forall\, caw_k \in C_{aw_i}$, where the pointwise mutual information between $fw$ and $caw_k$ is $pmi(fw, caw_k) = \log \frac{p(fw, caw_k)}{p(fw)\, p(caw_k)} = \log \frac{p(caw_k \mid fw)}{p(caw_k)}$, with $p(fw) = \frac{n(fw)}{N}$ and $p(caw_k \mid fw) = \frac{n(caw_k, fw)}{n(fw)}$.
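A small worked sketch of the association strength defined above, computed from co-occurrence counts. It assumes that N is the number of tweets in the spatio-temporal cluster, that each tweet is reduced to a set of descriptors, and that a pair which never co-occurs contributes 0. None of these choices is confirmed by the slides, and the data is made up.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count in how many tweets each term, and each unordered pair, occurs."""
    term_counts, pair_counts = Counter(), Counter()
    for terms in docs:
        unique = set(terms)
        term_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(sorted(unique), 2))
    return term_counts, pair_counts


def pmi(fw, caw, term_counts, pair_counts, n_docs):
    """log p(fw, caw) / (p(fw) p(caw)); 0.0 for unseen pairs (an assumption)."""
    joint = pair_counts[frozenset((fw, caw))]
    if joint == 0:
        return 0.0
    p_joint = joint / n_docs
    return math.log(p_joint / ((term_counts[fw] / n_docs) * (term_counts[caw] / n_docs)))


def assoc_str(fw, contexts, term_counts, pair_counts, n_docs):
    """Mean PMI between the focus word and the contexts of an associated word."""
    if not contexts:
        return 0.0
    return sum(pmi(fw, c, term_counts, pair_counts, n_docs) for c in contexts) / len(contexts)


docs = [
    {"big three", "ford", "gm"},
    {"big three", "chrysler", "gm"},
    {"ford", "focus"},
    {"big three", "gm"},
]
tc, pc = cooccurrence_counts(docs)
strength = assoc_str("big three", ["gm", "chrysler", "focus"], tc, pc, len(docs))
print(round(strength, 4))
```

In this toy corpus "focus" never co-occurs with "big three", so it pulls the average down. Contexts about the Ford Focus car model weaken the link, while contexts about other car makers strengthen it.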
  • 39. Temporal Importance of a Descriptor Certain descriptors always dominate discussions: “Terrorism” in Mumbai Terror Attack tweets, “Healthcare” in Health Care reform debate tweets. To allow recent (possibly interesting) descriptors to surface, the thematic score of a descriptor is discounted depending on how popular it has been in the recent past. The temporal discount score for an n-gram, depending on a tuneable factor, is calculated over a period of time as $ngram_i(te) = temporal_{bias} \cdot \sum_{d=1}^{D} \frac{ngram_i(th)_d}{d}$, where $ngram_i(th)_d$ is the enhanced thematic score of the descriptor. The bias ranges over 0-1: less to more importance to recent n-grams. [Fig. 2: (a) Extracted descriptors sorted by TFIDF vs. spatio-temporal-thematic score; (b) Top 15 extracted descriptors in the US for the Mumbai attack event]
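The temporal discount can be sketched as a weighted sum over past thematic scores. The reading used here (temporal bias times the sum of past scores, each divided by its distance d in days, with d = 1 the most recent day) is an interpretation of the slide's formula, and the function name and numbers are illustrative.

```python
def temporal_discount(past_scores, temporal_bias):
    """temporal_bias * sum_{d=1..D} score_d / d, where past_scores[0] is day d = 1."""
    if not 0.0 <= temporal_bias <= 1.0:
        raise ValueError("temporal_bias must lie in [0, 1]")
    return temporal_bias * sum(score / d for d, score in enumerate(past_scores, start=1))


# A descriptor that was popular on each of the last three days.
discount = temporal_discount([0.6, 0.4, 0.3], temporal_bias=0.5)
print(round(discount, 3))
```

Older days contribute less because of the division by d, so a descriptor that was popular yesterday is discounted more than one that was popular a week ago.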
  • 40. Spatial Importance of a Descriptor Local descriptors are more interesting compared to global ones: descriptors that occur all over the world on a given day are less interesting than those that occur only in the spatio-temporal set of interest, so the importance of a descriptor is also discounted based on its occurrence in other spatio-temporal sets. The spatial discount score for an n-gram is the fraction of spatio-temporal partitions (e.g. countries) that had activity surrounding the descriptor: $ngram_i(sp) = \frac{k}{|\text{spatio-temporal sets}|} \cdot (1 - spatial_{bias})$, where $k$ is the number of spatio-temporal clusters the n-gram occurred in. A $spatial_{bias}$ closer to 0 gives more importance to global presence. (Temporal discount, continued: it is applied over a chosen duration, for example a week, and may not always be relevant, so a $temporal_{bias}$ weight is also applied; a weight closer to 1 gives more importance to past activity, a weight closer to 0 less.)
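The spatial discount is a one-line computation. The sketch below follows the formula on this slide, with hypothetical argument names and toy numbers.

```python
def spatial_discount(k, total_sets, spatial_bias):
    """(k / total_sets) * (1 - spatial_bias).

    k: number of spatio-temporal sets in which the n-gram occurred.
    total_sets: number of spatio-temporal sets considered.
    """
    if total_sets <= 0:
        raise ValueError("total_sets must be positive")
    if not 0.0 <= spatial_bias <= 1.0:
        raise ValueError("spatial_bias must lie in [0, 1]")
    return (k / total_sets) * (1.0 - spatial_bias)


widespread = spatial_discount(k=9, total_sets=12, spatial_bias=0.2)
local_only = spatial_discount(k=1, total_sets=12, spatial_bias=0.2)
print(widespread > local_only)
```

A descriptor seen in 9 of 12 countries is discounted more heavily than one seen in a single country, and a bias of 1 switches the discount off.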
  • 41. STT Score of an n-gram Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts. Depending on the event of interest, both discounting factors can be set differently for different spatio-temporal sets. For example, when processing tweets for the Mumbai attack, setting the $spatial_{bias}$ to 1 eliminates the spatial discount, while for tweets from the US a global bias may be preferred given that the event did not originate there. The biases are set before the processing of observations begins. After the spatial and temporal effects are discounted, the final spatio-temporal-thematic (STT) weight of the n-gram is $w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)$. The source illustrates the effect of the enhanced STT weights on descriptors pertaining to the Mumbai terror attack event.
  • 42. higher-order n-grams picked over lower-order n-grams (if same scores)
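Combining the three scores and applying the tie-break rule can be sketched as below. The STT weight is thematic score minus the temporal and spatial discounts, as on the preceding slide. The dictionary layout, function names, and scores are made up for illustration.

```python
def stt_weight(thematic, temporal_discount, spatial_discount):
    """w_i = ngram_i(th) - ngram_i(te) - ngram_i(sp)."""
    return thematic - temporal_discount - spatial_discount


def rank_descriptors(scores):
    """Sort n-grams by STT weight; on equal weights prefer the higher-order n-gram.

    scores maps an n-gram string to a (thematic, temporal, spatial) tuple.
    """
    return sorted(
        scores,
        key=lambda gram: (stt_weight(*scores[gram]), len(gram.split())),
        reverse=True,
    )


scores = {
    "president": (0.5, 0.0, 0.0),
    "president obama": (0.5, 0.0, 0.0),  # same weight, higher order
    "healthcare": (0.9, 0.5, 0.25),      # high thematic score, heavily discounted
}
ranked = rank_descriptors(scores)
print(ranked)
```

"healthcare" starts with the highest thematic score but ends up last once its temporal and spatial discounts are subtracted, and the 2-gram wins the tie against the 1-gram.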
  • 43. Top X Descriptor Tag Cloud Tag size proportional to enhanced STT score