Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Perceptions
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Perceptions

on

  • 2,719 views

Invited talk at Session on Semantic Knowledge for Commodity Computing, at Microsoft Research Faculty Summit 2011, July 19-20, 2011, Redmond, WA. ...

Invited talk at Session on Semantic Knowledge for Commodity Computing, at Microsoft Research Faculty Summit 2011, July 19-20, 2011, Redmond, WA. http://research.microsoft.com/en-us/events/fs2011/default.aspx

Statistics

Views

Total Views
2,719
Views on SlideShare
2,716
Embed Views
3

Actions

Likes
4
Downloads
85
Comments
1

1 Embed 3

http://twitter.com 3

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Kno.e.sis has 15 faculty in Computer Science, life sciences and health care, cognitive science and business. It has about 50 PhD students and post docs– about 2/3 of these in Computer Science. Its faculty members have 40 labs, and occupies a majority of 50K sqft Joshi Research Center. Its students are highly successful– eg tenure track faculty @ Case Western Reserve Univ or Researcher at IBM Almaden. It has received recent funding from funding from Microsoft Research. IBM Research, HP Labs, Google, and small companies (Janya, EZdi,…) and collaborates with many more (Yahoo! Labs, NLM, …).
  • Taalee (subsequently Voquette and Semagix) was founded in 1999 as an Audio/Video Web Search Company (focus on A/V mainly for scalability and market focus reasons, servicename: MediaAnywhere). Domain models/ontologies were created in major areas (many more than what you can find on Bing in 2011) and automatically populated to build knowledge bases (populated ontologies or WorldModel) from a variety of structured and semistructured sources, and periodically kept up to date. This was than used for semantic annotation/metadata extraction to drive semantic search, browsing, etc applications over data crawled from Web sites.
  • Mediaanywhere’s search snapshot.
  • The important thing is that the system knew that Robert Duval is a movie actor, is a different person that David Duval who is a golfer and a sportsperson, and had understanding of a variety of relationships Robert Duval participates in – such as
  • Many types of data: Web 2.0/microblogging/social networking with informal text, Blogs/News/Documents with structured text, structured and semistructured data, multimedia data, sensor data.Many types of domain models: formal ontologies, folksonomies, nomenclatures or core vocabularies, community created/managed dictionaries, …Extraction of semantic metadata (labels referenced with domain models) make it possible to build many applications.
  • There is an event oriented community being formed during an eventDynamic domain model would give enhanced understanding of what is going on in the event, and hence, the communityCommunity has people, sub-community of users in the community will form an implicit network for ‘resource sharing’ and ‘resource provider’ How do we get to know such people? Semantic analysis of user profile description, with help of external knowledge bases to tell us, ‘what is a user interest I’ will tell us, ‘who is what’ – Blogger, news journalist, trustee, Red Cross Member etc. and also, ‘who’ is interested in ‘what’This information of mined user interests and types, from users profiles, can be leveraged to perform semantic association with dynamic domain model of the event
  • Architecture of twarqlStream tweetsCovert it into RDFUse SPARQL Query to filter it
  • Again SPARQL query is a subset of domain modelModifying this will help us to be updated with the informationTherefore we use Doozer
  • We use background knowledge to help identifying the entity mentioned in the text, e.g., the knowledge from IMDB and Freebase is used to determine whether a noun phrase in the text is the name of a movie or a person. The lexical resources such as Urban Dictionary are used to help identifying the sentiment clues in the text. Urban Dictionary is a popular online slang dictionary with word definitions written by users. Each word is associated with a list of related words to interpret it, and many glossary definitions given by different users. Both the related words list and glossary definitions can be used to help determining the sentiment of the spotted word. E.g., the word “wicked” has a list of related words, and most of those words carry positive sentiment, so that we can infer that “wicked” is highly possible a positive sentiment clue. In addition, there is also a definition of “wicked” given by user saying that it has different meanings in different countries. Given this knowledge, if we know the location of the author who wrote the tweet, we can infer whether “wicked” in the tweet  is used as a sentiment clue, and whether it is positive, negative or neutral.
  • Consider the number of observation records generated by a commercial flight. A single flight from New York to Los Angeles on a twin-engine Boeing 737 generates around 240 terabytes of data; and, on any given day, there are around 28,537 such commercial flights in the United States (Higginbotham, 2010). So, for only a single day, the amount of sensor data generated can reach the petabyte scale. But how much of this sensor data is actually useful for decision-making? The pilot may need to understand the general condition of the plane and the external environment during flight; the ground crew may need to know the speed, location and trajectory of the aircraft; and the mechanic may need to know of any anomalous behavior detected during the flight. Much of this knowledge must be derived from the low-level observational data, but only the high-level inferences may be required for actual decision-making.
  • The architecture illustrates an abstraction (or explanation) of traffic data through a stratified interpretation of machine sensor observations, information on the Web, and real-time social data. This process involves:+ an integration of various domain (i.e., traffic) and domain-independent (i.e., time, space, sensor) ontologies to annotate and interpret the data+ observations from traffic speed sensors on the road to detect traffic jams (slow moving traffic),+ determine the cause of the traffic jam through a stratified explanation process: - first check if normal/everyday events could explain (i.e., rush-hour) - second check if planned events could explain (i.e., planned events such as road construction, music concerts, and sporting events) by accessing background knowledge on the Web (such as US Department of Transportation, BuckeyeTraffic, or Eventful) -third check if unplanned events could explain (i.e., traffic accident / wreck) by accessing social media (such as twitter)
  • The architecture illustrates an abstraction (or explanation) of traffic data through a stratified interpretation of machine sensor observations, information on the Web, and real-time social data. This process involves:+ an integration of various domain (i.e., traffic) and domain-independent (i.e., time, space, sensor) ontologies to annotate and interpret the data+ observations from traffic speed sensors on the road to detect traffic jams (slow moving traffic),+ determine the cause of the traffic jam through a stratified explanation process: - first check if normal/everyday events could explain (i.e., rush-hour) - second check if planned events could explain (i.e., planned events such as road construction, music concerts, and sporting events) by accessing background knowledge on the Web (such as US Department of Transportation, BuckeyeTraffic, or Eventful) - third check if unplanned events could explain (i.e., traffic accident / wreck) by accessing social media (such as twitter)

Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Perceptions Presentation Transcript

  • 1.
  • 2. Citizen Sensing
    Opportunities and Challenges in Mining Social Signals and Perceptions
    Amit P. Shethamit@knoesis.org
    LexisNexis Ohio Eminent Scholar
    Ohio Center of Excellence in Knowledge enabled Computing (Kno.e.sis)
    Wright State University, Dayton, OHhttp://knoesis.org
    Thanks: Kno.e.sis team, esp. Wenbo Wang, Chen Lu, Cory, Hemant, Pavan
  • 3. Ohio Center of Excellence in Knowledge-enabled Computing
    Semantics as core enabler, enhancer @ Kno.e.sis
    one of the two largest academic groups in Semantic Web; multidisciplinary
  • 4. Semantics & Semantic Web in 1999-2002
  • 5. ATTRIBUTE & KEYWORDQUERYING
    SEMANTIC BROWSING
    uniform view of worldwide distributed assets of similar type
    Targeted e-shopping/e-commerce
    assets access
    Taalee Semantic/Faceted Search & Browsing (1999-2001)
    BLENDED BROWSING & QUERYING
    Taalee Semantic Search ….
  • 6. Links to news on companies that compete against Commerce One
    Links to news on companies Commerce One competes against
    (To view news on Ariba, click on the link for Ariba)
    Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically
    Search for company ‘Commerce One’
    Semantic Search/Browsing/Directory (2001-….)
  • 7. Semantic Search/Browsing/Directory (2001-….)
    System recognizes ENTITY & CATEGORY
    Relevant portionof the Directory is automatically
    presented.
  • 8. Semantic Search/Browsing/Directory (2001-….)
    Users can explore
    Semantically related
    Information.
  • 9. Extracting Semantic Metadata from Semistructured and Structured Sources (1999 – 2002)
    Semagix Freedom for building
    ontology-driven information system
    Managing Semantic Content on the Web
  • 10. Fast forward to 2010-2011
  • 11. Search
    Integration
    Analysis
    Discovery
    Question
    Answering
    Situational
    Awareness
    Domain Models
    Patterns / Inference / Reasoning
    RDB
    Relationship Web
    Metadata Extraction
    Meta data / Semantic Annotations
    Text
    (formal/
    Informal)
    Structured and Semi-structured data
    Sensor Data
    Multimedia Content and Web data
  • 12. Let Us Start with Social Data
  • 13. Jan. 2011
    Egypt Protest
    Image:http://bit.ly/g4yPXS
    Image:http://bit.ly/qHI7wI
    Image:http://bit.ly/qmDocA
  • 14. http://bit.ly/nP1E4q
    Image:http://bit.ly/fl4gEJ
    Mar. 2011
    http://cnet.co/jdQgME
    Japan Earthquake
    and Tsunami
    http://bit.ly
    /gWboib
    Recently funded NSF proposal: Social Media Enhanced Organizational Sensemakingin Emergency Response
  • 15. Image:http://bit.ly/nqq6Wj
    Jul. 2011
    I-75 Traffic Jam in US
  • 16. Citizen Sensing
    Who?
    An interconnected network of people
    What?
    Observe, report, collect, analyze, and disseminate information
    How?
    Via text, audio, video and built in device sensor (and smart devices)
    Image: http://bit.ly/nvm2iP
    "Citizen Sensing, Social Signals, and Enriching Human Experience”,
    Internet Computing, July-Aug. 2009
  • 17. Enablers: Mobile Devices & Ubiquitous Connectivity
    Mobile Platforms Hit Critical Mass, Over 5 billions users
    1+B with internet connected mobile devices (2010)
    Smartphones > PCs + Notebooks > Notebooks + Netbooks (2010E)
    500K+ mobile phone applications
    74% of mobile phone users (2.4B)worldwide used SMS (2007)
    Mobile is Global; Ubiquitous; 24x7
    Built in sensors
    environmental, biometric/biomedical,...
    Image: http://bit.ly/mYqcPF
  • 18. Enablers: Web 2.0 & Social Media
    A huge number of users
    750M+ active Facebook Users
    1+B tweets/wk; 175M+ Twitter users
    Internet Users:2Bln
    Image: http://bit.ly/euLETT
    Image: http://bit.ly/euLETT
    Image: http://bit.ly/euLETT
    Image: http://bit.ly/euLETT
    Image: http://bit.ly/euLETT
  • 19. Role of Semantics in Citizen Sensing
    Key of citizen sensing: extract metadata/annotate
    different types of metadata (depend on application need)
    Spatial, temporal, thematic: key phrase, named entity, relationship, topic/category, event descriptors, sentiment…
    People, network, content
    Semantics: provide the meaning of data
    various forms of semantic models: core vocabularies/nomenclatures,community created dictionaries/folksonomies/reference databases, automatically extracted domain models, manually created taxonomies, formal ontologies
    deal with complexities of user generated data; supplement well-known statistical and natural language processing (NLP) techniques
  • 20. Research Application: Twitris
  • 21. Twitris - Motivation
    Image: http://bit.ly/etFezl
    What were people in U.S.A. saying about
    Bin Laden’s death?
    How about people in Egypt?
    How about people in India?
    TOO MANY tweets to be read each day!!!
    Twitris
    Now: WHEN, WHERE, People are talking WHAT
    Future: socio, cultural, behavioral studies
    ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data – Challenges and Experiences’, 2009
  • 22. Twitris: Semantic Social Web Mash-up
    Select topic
    Select date
    N-gram summaries
    Topic tree
    Spatial Marker
    Tweet traffic
    Sentiment Analysis
    Images & Videos
    Reference news
    Related tweets
    Wikipedia articles
    More: TWITRIS
  • 23. Analyzing Events from Temporal Perspective
    How did tweets in United States on the death of Bin Laden evolve over time?
    May
    2nd
    May
    4th
    “RT @TWlTTERWHALE: Please do not click on any links saying Osama Bin Laden EXECUTION Video! This is a virus that hacks accounts. ”
    RT @ReallyVirtual: Here's a picture of OBL'shideout in Abbottabad, as shared by a friend @Rahat http://yfrog.com/h7w4izmj
    Img:http://www.twitpic.com/4t1mt0
  • 24. Analyzing Events from Spatial Perspective
    Tweets (Death of Bin Laden) in Egypt VS tweets in India
    May
    2nd
    Egypt
    May
    2nd
    India
    “RT @mvatlarge: U.S. has given Pakistani military nearly $20 billion since 9/11 for the privilege of housing bin Laden: http://is.gd/xegnFm”
    “#Egypt foreign minister: Egygov't has no official comment but we condemn all forms of violence in international relations. #osama #obl”
  • 25. A sample of current research @ Kno.e.sis demonstrating role of semantics in Citizen Sensing & Social Media Analysis
  • 26. User-communityEngagement Analysis
  • 27. User-community Engagement
    How do we understand the phenomenon of user participation (engagement) in topic discussions?
    How communities form during the product launch?
    What factors can attract users to engage in these communities, therefore further spreading the message?
    How quickly we can disseminate information between resource providers and people in need of resources in case of emergency?
    Image: http://itcilo.wordpress.com
  • 28. Analysis Framework:People-Content-network Analysis (PCNA)
    28
    Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter. SoME @ WWW2011.
  • 29. Three Sets of Features
    Which set is more important?
    Content features[Characteristics of tweets posted by active friends of U]:
    keywords: number of event-relevant keywords
    hashtags: number of event-relevant hashtags
    retweet: number of retweets
    mention: number of mentions
    url: number of relevancy-adjust hyperlinks
    Irrelevant hyperlink is given number -1
    subjectivity: Subjectivity scores for words and emoticons
    Linguistic Cues (LIWC1 analysis): Features for the language usage. Top-3 transformed features using Principle Component Analysis (PCA) extracted
    Community features: [Characteristics of the active community/network under consideration]
    wccSize: size of the weakly-connected component (WCC) which U’s friends belongs to in the active network.
    wccPercent: ratio of wccSizeto the size of the active network.
    connectivity: number of active friends (i.e. followees) in the community.
    communitySize: size of the active community.
    Author features [Characteristics of friends that U is following]:
    Only friends in the active community are considered.
    logFollower: logarithm of follower count
    logFollowee: logarithm of followee count
    Klout[1]: a integrated measure of user influence and popularity
    Other profile information and activity history[2].
  • 30. Experiments: Results
    Content is the key to understand user community engagement
    High
    Low
    Performance
    Network
    Content
    All
    People
    Event-Type
    Summary of Prediction Accuracy (%)
    Statistical significant results are in bold
    30
  • 31. Background Knowledge to improve Social Data Analysis
    Event oriented Community
    External Knowledge bases
    Dynamic Domain Model for the event
    SEMANTIC ASSOCIATION TO UNDERSTAND ENGAGEMENT LEVEL &
    IMPROVE IE
    Social Network
    User Profiles
    Mined User Interests and User Types
  • 32. Analysing the Content can be Hard…
    Is Merry Christmas a song?
    If it is, which ‘Merry Christmas’since there are 60 songs of the same name.
    ‘So Good’ is also a song!
    Using a domain model (E.g., MusicBrainz)
    Using context cues from the content
    • e.g. new Merry Christmas tune
    Reduce potential entity spot size (with restrictions)
    • e.g. new albums/songs
    ‘Multimodal Social Intelligence in a Real-Time Dashboard System’ VLDB Journal 2010
  • 33. Real Time Social Media Data Analysis
  • 34. Motivation
    People can’t wait for Information
    Disaster Management
    Ushahidi (www.ushahidi.org)
    Real-Time Markets
    RealTimeMarkets (http://www.realtimemarkets.com/)
    Brand Tracking
    Twarql (http://wiki.knoesis.org/index.php/Twarql)
    Movie reviews
    Flicktweets (www.flicktweets.com)
    Journalism
  • 35. Scenarios
    Brand Tracking
    Give me a stream of locations whereKinectis being mentioned right now
    Give me all people that have said negative things aboutKinect
    How can we do this?
  • 36. Twarql (Twitter Feeds through SPARQL)
    Semantically annotate tweets with entities, hashtags, URLs, sentiments, etc.
    Encode content in a structured format (RDF) using shared vocabularies (FOAT, SIOC, MOAT, etc.)
    Structured querying of tweets
    Subscribe to a stream of tweets that match a given query
    Real-time delivery of streaming data.
    More: TWARQL
  • 37. Twarql Architecture
  • 38. Back to the Scenario
    Give me a stream of locations where Kinect is being mentioned now
    Give me all people that have said negative things about Kinect
  • 39. Dynamic Domain Models for Semantic Analysis of Real-Time DataakaContinuous Semantics
  • 40. Motivation
    Semantic processing using a model of the domain
    But it is difficult to model dynamic domains on social web
    spontaneous (arising suddenly)
    real-time data requiring continuous searching and analysis
    distributed participants with fragmented and opinionated information
    diverse viewpoints involving topical or contentious subjects
    feature context colored by local knowledge as well as perceptions based on different observations and their socio-cultural background.
    More: Continuous Semantics
  • 41. Dynamic Model Creation
    Heliopolis is a suburb of Cairo.
  • 42. Dynamic Evolving Models to underpin Semantics
    “Reports from Azadi Square - 4 people killed by police, people killed police who shot. More shots being fired #iranelections”
    “Both Ahmadinejad& Mousavideclare victory in Iranian Elections.”
    “situation in tehran University is so worrisome. police have attacked to girls dormitory #tehran #iranelection”
    Events
    June 13 2009
    June 15 2009
    June 12 2009
    Ahmadinejad & Mousavi are politicians in Iran
    Tehran University is a University in Iran
    Azadi Square is a city square in Tehran
    Key phrases
    Models
  • 43. Sentiment/Opinion Extraction
  • 44. Challenges
    Domain/Topic-dependency: spotting the target of the sentiment is as important as finding sentiment itself
    E.g., “longriver” (no sentiment), “long battery life” (positive), or “long time for downloading” (negative).
    Context-awareness : encoding the context information into the extracted sentiment
    E.g., “mustwatch a movie today” (no sentiment) and “this movie is a must see” (positive).
    Informal language (abbreviations, misspelling, slang...): using Urban Dictionary
  • 45. The Usage of Background Knowledge
  • 46. Real Time Feature Stream
  • 47. Real-Time Sensor, Social, Multi-media data
    2010’s
    Dynamic User Generated Content
    2000’s
    1990’s
    Static Document and files
    Web DATA evolved over time
  • 48. Semantic Abstraction
    Does -1,-2,-4,-4 make any sense to you?
    Overwhelming amount of raw sensor data doesn’t make much sense to decision makers
    Freezing temperature
    Rain
    What if the data is from sensors on a highway?
    Freezing temperature + rain => icy road
    Close the highway? OR
    Spread salt on the road to melt ice?
    So what?
    Semantic Sensor Web Demo
  • 49. A cross-country flight from New York to Los Angeles on a Boeing 737 plane generates a massive 240 terabytes of data
    - GigaOmni Media
    But a pilot or a ground engineer at the destination is interested in very small
    number of events and associated observational data that are relevant to their work.
    Higginbotham, S. (2010, September). Sensor Networks Top Social Networks for Big Data.
    Gigaom.com. http://gigaom.com/cloud/sensor-networks-top-social-networks-for-big-data-2/.
    49
  • 50. Background Knowledge
    Blizzard
    Rain Storm
    ABSTRACTION
    Huge amount of
    Raw Sensor Data
    Features representing Real-World events
    Abstraction
    More: Semantic Sensor Web
  • 51. Weather Alert Application
    Detection of events, such as blizzards, from weather station observations
  • 52. Evaluation
    • Data Used: Nevada Blizzard (April 1st – April 6th)
    Show me which places had blizzard in past 24 hrs
    70% Data clear
    30% Feature Observed
  • 53. Evaluation (cont.)
    Abstraction can:
    Make sense out of raw data
    Greatly reduce the size of data
    • Data Used: Nevada Blizzard (April 1st – April 6th)
    Amount of Raw Sensor Data
    Amount of Abstraction Data
    Semantic Scalability
  • 54. Traffic Application
    Reasons can be found via other types data: tweets, news papers, etc.
    Sensor data: 10 passing cars per minute
    What might be the reason?
  • 55. Integration and Abstraction of Traffic DataStratified explanation
    Knowledge Model Empowered Abstraction
    Knowledge Models
    Different types of Data
  • 56. Take Home Message
    Amount of citizen sensing (and machine sensing) data is huge, varied, and growing rapidly. Search and Sift won’t work.
  • 57. Take Home Message (Cont.)
    Semantics play a key role in refering "meaning" behind the data. Requires progress from keywords -> entities -> relationships -> events, from raw data to human-centric abstractions.
  • 58. Take Home Message (Cont.)
    Wide variety of semantic models and KBs(vocabularies, social dictionaries, community created semi-structured knowledge, domain-specific datasets, ontologies)empower semantic solutions. This can lead to Semantic Scalability – scalability that is meaningful to human activities and decision making.
  • 59. Interested in more?
    Kno.e.sis Wiki for the following and more:
    Computing for Human Experience
    Continuous Semantics to Analyze Real-Time Data
    Semantic Modeling for Cloud Computing
    Citizen Sensing, Social Signals, and Enriching Human Experience
    Semantics-Empowered Social Computing
    Semantic Sensor Web
    Traveling the Semantic Web through Space, Theme and Time
    Relationship Web: Blazing Semantic Trails between Web Resources
    SA-REST: Semantically Interoperable and Easier-to-Use Services and Mashups
    Semantically Annotating a Web Service
    Tutorial: Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications (WWW2011)
    Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research (Semantic Search) and IBM Research (Analysis of Social Media Content),and HP Researh(Knowledge Extraction from Community-Generated Content).