Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences

on

  • 1,616 views

Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, "Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences," Tenth ...

Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, "Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences," Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009, Poland.

Statistics

Views

Total Views
1,616
Views on SlideShare
1,609
Embed Views
7

Actions

Likes
1
Downloads
17
Comments
0

1 Embed 7

http://www.slideshare.net 7

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Since the paper was submitted more has happened, techniques have improved, ui has changed, submission to the ISWC challengeSo what you see here in terms of examples and UI will be different, but the analytics are the same
  • unique appeal behind microblogging is its portability and theability to post and recieve eye-witness accounts in real-time.An important sideeffect has been the rise of citizen journalism, where humans as sensors are play-ing an active role in the process of collecting, reporting, analyzing and dissemi-nating news and information"[wiki]. Today, these six billion human sensors haveresulted in an interconnected network of participatory citizens who share andreact to observations in an almost real-time manner.
  • Perhaps, the most interesting phenomenon about such citizen generated datais that it acts as a lens into the social perception of an event in any region, at anypoint in time.Consider the g20 event that sparked protests in spatial region xand something else in y. Citizen observations relayed from the same or dierentlocation oer multiple, and often complementary viewpoints or storylines aboutan event. What is more, these viewpoints evolve over time and with the occurenceof other events, with perceptions gaining momentum in certain regions afterbeing popular in some others.
  • Consequently, in addition to what is being said about an event (the theme),where (spatial) and when (time) it is being said are integral components tothe analysis of such data (see Figure 1).
  • Presentation will focus on the mashup.Not going into details of the research contributions, formula.There are more details on the paper.
  • Obtain “raw” tweetsProcessStore the information extracted from the tweets
  • Thematic processign consider tweets in each set separatelyBy doing the spatio-temporal slicing of data, we preserve these social signals that might be dependent on place and time
  • We extract “Descriptors”. In a sense they are telling us what a region is…It’s like extracting terms with high TFIDF, but we have three attributes to consider.
  • Big 3 not used as frequently as GM, Ford and Chrysler (or vice versa) Presence of contextually relevant words should strengthen the score of a descriptorGets collocated descriptors, calculate PMI
  • This pic shows many countries together. The tool itself allows storyline views per country (spatial-temporal set)
  • This pic shows many days together. The tool itself allows storyline views per day (spatial-temporal set)
  • Click on keyword / descriptor prompts browsing of storylines

Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data: Challenges and Experiences Presentation Transcript

  • 1. Spatio-Temporal-Thematic Analysis of Citizen Sensor DataChallenges and Experiences
    MeenakshiNagarajan, KarthikGomadam, Amit Sheth, AjithRanabahu, RaghavaMutharaju and AshutoshJadhav
    Kno.e.sis, Wright State University
    Presented at the conference by Pablo Mendes
    Text at: http://knoesis.org/library/resource.php?id=00559
  • 2.
    • Micro-blogging platforms – Twitter, Friendfeed..
    • 3. Revolutionizing how unaltered, real-time information is disseminated and consumed
    • 4. Significant portion of data is Experiential in nature
    • 5. First-hand observations, experiences, opinions via texts, images, audio, video (Citizens as sensors)
    2
  • 6. Citizen Sensor Observations
    Are a lens into the social perception of an event in any region at any point in time
    Mumbai Terror Attacks, Iran Elections, Obama’s Health Care Reform…
    They present complementary, sometimes lagged viewpoints that evolve over time and with other external stimuli
    3
  • 7. what is being said about an event (the theme), IS AS IMPORTANT AS
    where (spatial) and when (time) it is being said
    4
  • 8. Contribution, Presentation Focus
    A Web MashUp that
    Processes textual citizen sensor observations pertaining to real-world events
    Takes three dimensions of space, time and theme into consideration
    Extracts local and global social signals/ perceptions over time
    5
  • 9. TWITRIS – System overview
    Crawling, Processing, Visualization
    6
  • 10. Twitris - System Overview
    3.
    1.
    2.
    Obtain Topically Relevant Tweets, Extract Location, Time stamp Information, Store in DB
    2. Process Tweet Contents, Store extracted metadata in DB
    3. User Visualization talks to the DB
    7
  • 11. 1. Gathering topically relevant data, extracting Location, time stamps
    Obtaining Citizen-Sensor Observations
    8
  • 12. Crawling Tweets Relevant to Event
    • Twitter has no explicit topic categorization
    • 13. Community generated hashtags are the strongest cues
    • 14. Strategy
    • 15. Start with manually selected keywords (seed)
    • 16. Obtain additional hints from Google Insights
    • 17. Crawl using keywords, hashtags
    9
  • 18. Crawling Tweets Relevant to Event
    • Events change, Topics of discussions change
    • 19. Periodically update keywords used for crawl
    • 20. Process crawled tweets, extract top 1 TFIDF keyword, obtain Google Insights Suggestions
    • 21. Continue crawl
    10
  • 22. Challenges, Limitations of Crawl
    Volume and Rapidity of Change, Quality of data is key
    • Keyword gathering, Crawl requires supervision
    Twitter API restrictions
    • Hourly / Daily access limits
    • 23. Severe limitations on extracting past data
    • 24. Can go back only a few weeks!
    Large scale geo-code conversions
    11
  • 25. Tweet Spatio-Temporal Coordinates
    Time Stamp obtained at crawl
    Location extraction is non-trivial
    Twitter does not store location data
    Approximate tweet location using user’s location information
    Note : Recently, Twitter added per tweet geo locations
    Geocodes found for a user’s location in profile
    Use Google / Yahoo geo-coder services
    Not always provided, not always decoded to a location
    12
  • 26. Twitris - System Overview
    2.
    2. Process Tweet Contents, Store extracted metadata in DB
    13
    Extracted time stamps, geo-coordinates stored in the DB
  • 27. Spatio-Temporal Sets of Tweets
    Intuition behind processing of tweets
    Events have inherent spatial temporal biases associated with them
    Bias dictates granularity of data processing
    Mumbai terror attack: country level activity everyday
    Health care reform: US state level activity per week
    14
  • 28. Spatial Temporal Sets
    Group observations using spatial, temporal bias cues
    E.g., for Mumbai Terror Attack, create X sets of tweets per day, each cluster represents activity in one country
    Thematic processing over each set
    Ensures local, temporal social signals are preserved
    15
  • 29. Processing Tweets
    Extracting important event descriptors
    Key words or phrases (n-grams)
    What is a region paying attention to today?
    Objective: from volumes of tweets to key descriptors
    Using three attributes: Thematic, Temporal and Spatial importance of a descriptor
    16
  • 30. Descriptor: Thematic Importance
    TFIDF weighted 3-grams
    Amplified if nouns, no stop words
    Amplified by presence of contextual evidence
    GM
    GENERAL MOTORS
    BIG THREE
    CHRYSLER
    BIG 3
    FORD
    17
  • 31. Contextual Evidence
    Thematic score of a descriptor (focus word fw) is enhanced proportional to
    PMI based association strength with and thematic score of co-occurring descriptors (awi)
    Co-occurring descriptors obtained from the same spatio-temporal set as the descriptor
    18
  • 32. Certain descriptors always dominate observations
    Terrorism in the Mumbai Terror Attack
    Healthcare in the Health Care Reform discussions
    To allow less popular, interesting descriptors to surface, we discount thematic score proportional to recent popularity
    Descriptor: Thematic-Temporal Importance
    19
  • 33. Descriptors that occur all over the world not as interesting as those local to a region
    Discount thematic-temporal score proportional to number of spatial sets (not local) that mention the descriptor
    Descriptor: Thematic-Temporal-SpatialImportance
    20
  • 34. TFIDF vs. Spatio-Temporal-Thematic (STT) Scores of Descriptors
    Interesting descriptors surface up!
    Other examples in the paper
    21
  • 35. Examples of STT scored Descriptors over 5 days
    Mumbai Terror Attack
    22
  • 36. Discussions around Descriptors
    For some context : Extracting chatter surrounding a descriptor of interest
    Using a clustering approach
    Figure showing top X STT weighted descriptors
    What are people
    saying about a
    descriptor?
    (user click driven)
    23
  • 37. Clustering Algorithm Overview
    For a descriptor of interest
    We generate complementary viewpoints expressed in the data
    Using a Information Content based Clustering Algorithm
    Basic Intuition
    Among descriptor associations, select complementary viewpoint hints
    Initialize clusters with descriptor of interest, viewpoint indicator
    Expand cluster to add strong associations
    24
  • 38. Algorithm Overview – Example for focus word ‘Pakistan’
    25
  • 39. Discussions around Descriptors - Example
    Around Pakistan on a particular day
    US (shades of blue), India (orange), Pakistan (red)
    Size indicates STT score
    This summarized viz. only for presentation
    26
  • 40. Discussions around Descriptors - Example
    Around G20 in Denmark across 4 days (color)
    Size indicates STT scores
    27
  • 41. User Interface and Visualizations
    Browsing the when, what and where slices of social perceptions behind events
    28
  • 42. Data Processed – AJ FIX
    • As of March 2009
    Breakdown of crawled Tweets
    Percentage of Tweets by region
    for Financial Crisis event
    29
  • 43. Events in Twitris
    WISE 2009
    Mumbai Terror Attack, G20
    ISWC Challenge 2009
    Health Care Reform, Iran Election
    New features
    Integration with news, Wikipedia, Tweets mentioning descriptors
    Current Explorations, Investigations :
    Automated Crawl, Progressive TFIDF
    Streaming data analysis
    30
  • 44. For more information
    meena@knoesis.org
    karthik@knoesis.org
    amit@knoesis.org
    ajith@knoesis.org
    Try it on-line: http://twitris.knoesis.org
    http://knoesis.org/research/semweb/projects/socialmedia/
    31
  • 45. Development Team
    32