Citizen Sensor Data Mining, Social Media Analytics and Applications

Opening talk at Singapore Symposium on Sentiment Analysis (S3A), February 6, 2015, Singapore. http://s3a.sentic.net/#s3a2015

Abstract

With the rapid rise in the popularity of social media, and near-ubiquitous mobile access, the sharing of observations and opinions has become commonplace. This has given us unprecedented access to the pulse of a populace and the ability to perform analytics on social data to support a variety of socially intelligent applications -- be it for brand tracking and management, crisis coordination, organizing revolutions or promoting social development in underdeveloped and developing countries.

I will review: 1) understanding and analysis of informal text, esp. microblogs (e.g., issues of cultural entity extraction and role of semantic/background knowledge enhanced techniques), and 2) how we built Twitris, a comprehensive social media analytics (social intelligence) platform.

I will describe the analysis capabilities along three dimensions: spatio-temporal-thematic, people-content-network, and sentiment-emotion-intent. I will couple technical insights with identification of computational techniques and real-world examples using live demos of Twitris (http://twitris2.knoesis.org).

Published in: Social Media

  • Got carried away with coverage and content – too much material for 3 hours – so the remaining content can be used as background
  • Many media companies use Facebook and Twitter as news-delivery platforms. Many individuals rely on them as a news source. News is increasingly social.
  • Interest level:
    (Based on Description info, lists and fav. tweets)
  • Semantic metadata, relationships: Inferred?
  • Structure Level Metadata
    Community Size
    - Showing scale: global vs. local
    Community growth rate
    - Popularity estimation for a topic
    Largest Strongly Connected Component size
    - Measuring Reachability in the directed graph
    No. of Weakly Connected Components & Max. size
    - distribution of pre-existing network connections (follower-followee)
    - Showing Nature: loose vs. compact
    Average Degree of Separation
    - How many hops between two authors
    Clustering Coefficient
    - Showing the likelihood of association

    Relationship Level Metadata
    Type of Relationship - topic/content (based on Retweet, Entity etc.)
    - follower/followee (based on structure)
    Relationship strength
    - Strong vs. Weak ties based on activity/ communication between users 
    - % tie strength
    User Homophily [Homophily (i.e., "love of the same") is the tendency of individuals to associate and bond with similar others]
    based on certain characteristic (e.g., Location, interest etc.)
    % of users showing similar behavior

    Reciprocity: mutual relationship
    - % of users following back their followers
    Active Community/ Ties
    - How active is the communication between users or how active are the relationship ties 
    - Average of tie strength based on activity
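A few of the structure- and relationship-level measures listed above can be sketched on a toy follower graph. This is only an illustration under stated assumptions: the edge list and user names are made up, reciprocity is approximated at the edge level (the share of follow edges that are followed back), and the clustering coefficient is computed on the undirected version of the graph.

```python
from collections import defaultdict


def undirected_neighbors(edges):
    """Map each user to the set of users connected by a follow edge in either direction."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def reciprocity(edges):
    """Share of (follower, followee) edges for which the reverse edge also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


def weakly_connected_components(edges):
    """Components of the graph when edge direction is ignored."""
    neighbors = undirected_neighbors(edges)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def average_clustering(edges):
    """Mean local clustering coefficient over all users (undirected view)."""
    neighbors = undirected_neighbors(edges)
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue  # a user with fewer than two neighbors contributes 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in neighbors[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbors)


# Hypothetical (follower, followee) pairs
follows = [("ann", "bob"), ("bob", "ann"), ("bob", "cat"), ("cat", "ann"), ("dan", "eve")]
components = weakly_connected_components(follows)
print("reciprocity:", reciprocity(follows))
print("no. of WCCs:", len(components), "max size:", max(len(c) for c in components))
print("avg clustering:", average_clustering(follows))
```

On real data a graph library would normally replace these hand-written helpers; the point here is only what each measure counts.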
  • Building on foundations of 
    Statistical Natural Language Processing
    Information Extraction
    Semantic Web/ Knowledge Representation
    We will talk about key issues in extracting metadata from informal text and how this differs from what has been done on more well-structured text such as news articles.
  • What the two tasks look like in terms of outputs they produce
  • This is an application of the NER work
  • We have come a long way but still room for improvement
  • Social media serves as a platform for people to speak their minds more freely, which leads to a growing volume of opinionated data that can be used by:

    (1) individuals, for suggestions and recommendations
    (2) companies and organizations, for marketing strategies and other decision-making processes
    (3) governments, for monitoring social phenomena, being aware of potentially dangerous situations, etc.
  • A fact can be proven; an opinion cannot.

    An opinion is normally a subjective statement based on people's thoughts, feelings and understanding.
  • One of the most attractive advantages of unsupervised approaches is that they do not require training data.
    Many sentiment analysis applications for social media content use a simple lexicon-based method. However, it does not work for the problem of target-specific sentiment analysis.
    Consider a simple lexicon-based method that uses a general sentiment lexicon containing words that are positive, negative or neutral in the general sense:
    (1) For the task of "find tweets containing positive opinions about a specific topic", such as a movie, the results will look like those in the table. However, tweets 2, 3, 5, 6 and 7 do not contain opinions about the movie.  (2) For the task of extracting the opinion clues/expressions, the right answers should be those shown in the other picture. However, the simple lexicon-based method might return all the words colored orange in the table.
  • We use background knowledge to help identify the entities mentioned in the text; e.g., knowledge from IMDB and Freebase is used to determine whether a noun phrase in the text is the name of a movie or a person. Lexical resources such as Urban Dictionary are used to help identify the sentiment clues in the text. Urban Dictionary is a popular online slang dictionary with word definitions written by users. Each word is associated with a list of related words that help interpret it, and with many glossary definitions given by different users. Both the related-words list and the glossary definitions can help determine the sentiment of the spotted word. E.g., the word “wicked” has a list of related words, most of which carry positive sentiment, so we can infer that “wicked” is very likely a positive sentiment clue. In addition, one user-supplied definition of “wicked” says that it has different meanings in different countries. Given this knowledge, if we know the location of the author who wrote the tweet, we can infer whether “wicked” in the tweet is used as a sentiment clue, and whether it is positive, negative or neutral.
  • While sentiment analysis is concerned with people’s opinions about something, emotion analysis focuses on our own emotional state, our mental health!
    Am I happy? Sad? Angry? Etc.
  • We are emotional creatures, and emotion plays an important role in all aspects of our lives! It:

    (1) Influences our decision-making
    (2) Affects our social relationships
    (3) Shapes our daily behavior

    More importantly, emotions affect our mental health:
    Take new mothers and veterans for example
  • It is difficult to annotate sentences with emotion labels for the following reasons:

    Emotion is more fine-grained (joy, sadness, anger, etc.), while sentiment usually deals with only positive, neutral and negative labels.
    A reader may incorrectly interpret the emotion embedded in a sentence by a writer
  • We leverage more than 100 emotion-related hashtags to filter Twitter streaming data and use ending emotion hashtags to infer the emotion label of a tweet, e.g.,
    “leaving for hospital #nervous” -> sadness emotion

    (1) We kept only the tweets with the emotion hashtags at the end
    (2) We discarded tweets that have fewer than five words, since they may not provide sufficient context to infer emotions
    (3) We removed the tweets that contain URLs or quotations. A large number of tweets with URLs are information-oriented and do not convey emotions.
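The three filtering rules above can be sketched as a small labeling function. This is a minimal sketch and not the authors' pipeline: the hashtag-to-emotion map is a tiny hypothetical stand-in for the 100+ hashtags mentioned, word counting ignores hashtag tokens (an assumption), and only double quotation marks are treated as quotations.

```python
import re

# Hypothetical, tiny stand-in for the 100+ emotion hashtags mentioned in the notes
HASHTAG_TO_EMOTION = {
    "#fear": "fear",
    "#anger": "anger",
    "#thankful": "thankfulness",
    "#nervous": "sadness",  # mapping taken from the example in the notes
}

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
QUOTE_RE = re.compile(r"[\"“”]")  # assumption: only double quotes count as quotations


def label_tweet(text, min_words=5):
    """Return (tweet_without_label_hashtag, emotion), or None if the tweet is filtered out."""
    tokens = text.split()
    if not tokens:
        return None
    # Rule 1: the emotion hashtag must be the last token of the tweet
    emotion = HASHTAG_TO_EMOTION.get(tokens[-1].lower().rstrip(".,!?"))
    if emotion is None:
        return None
    # Rule 3: drop tweets containing URLs or quotations
    if URL_RE.search(text) or QUOTE_RE.search(text):
        return None
    # Rule 2: drop tweets with fewer than min_words non-hashtag tokens
    words = [t for t in tokens if not t.startswith("#")]
    if len(words) < min_words:
        return None
    return " ".join(tokens[:-1]), emotion


examples = [
    "I hate when my mom compares me to my friends. #anger",
    "leaving for hospital #nervous",  # too short under the five-word rule
    "Great read http://bit.ly/x so true today indeed #fear",  # contains a URL
    "#anger I hate when this happens to me",  # hashtag not at the end
]
for tweet in examples:
    print(label_tweet(tweet))
```

Removing the trailing hashtag from the returned text keeps the label itself out of the features a classifier would later see.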
  • This figure shows the benefits of leveraging Twitter ‘big data’:

    When the size of the training data is 1,000, the classification accuracy is about 45%;
    When we increase the size of the training data to 10,000, the classification accuracy gets close to 55%;
    When we further increase the size of the training data to about 2M, the classification accuracy reaches about 65%.
  • User engagement levels: applications in coordination activities
    Connecting the dots here with NGO initiatives (*presented by Selvam)
  • Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  • Supervised machine-learning-based system to support high-level coordination operations
    by mining the demand for and supply of resources/services, and matching them.
  • 1.) Extract information nuggets for donations, requests and offers, along with their context (geo, time, etc.).

    2.) A semi-structured knowledge base is then used to match demand and supply to assist coordination
  • Example of PCN analysis in action–
    Clustering mined influencers (from network), by the user demographics (People) and ability to tune engagement by understanding ‘why’ of the influencers (Content)
  • Connections/Relationships
    - Implicit content features
  • Authoritative nature of the poster or the volume of follower connections did not predict the re-tweet behavior associated with the tweets!

    ‘Call to action’ type of content creates sparse retweet networks while giving less weight to the attribution of users – because ‘action’ is more important than attribution in that context.


  • Interaction networks can work as a proxy for identifying influencers in evolving communities (by using network algorithms like PageRank),
    because traditional network analysis of community structures cannot work when user connection data, e.g., follower-followee networks, is sparse.
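As an illustration of ranking influencers on an interaction network, here is a minimal PageRank sketch using plain power iteration. The assumptions are labeled: each edge points from the user who retweets or mentions to the user being retweeted or mentioned, the user names are made up, and the damping factor 0.85 is the commonly used default rather than a value from the talk.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) interaction edges."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling user (no outgoing interactions): spread rank evenly
                share = damping * rank[src] / n
                for dst in nodes:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical interactions: (user who retweets, user being retweeted)
interactions = [("u1", "hub"), ("u2", "hub"), ("u3", "hub"), ("hub", "u1")]
scores = pagerank(interactions)
for user, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(user, round(score, 3))
```

Because the edges come from interactions rather than follower links, the ranking can be computed even when follower-followee data is unavailable or sparse.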

  • Slide #1: Introduce the project, participants and the main goal
    Slide #2: Substantive slide showing either key graphic/chart or claim from this work
    Slide #3 (optional): Provide additional context or teaser for what would be discovered on poster
  • Groups with increasing divergence write more general, reporting-type content based on past incidents, while groups with decreasing divergence write more social and future-action-related content


    Members of the least divergent groups retweet heavily, while the most divergent groups rely on hashtags


    Group discussion divergence increases during the event, but decreases in the post-event phase

  • Explain about continuous semantics
  • (It is a real-time widget for monitoring needs, so it will not be active after the event has passed)

    http://twitris.knoesis.org/oklahomatornado

  • And http://knoesis.org/vision
  • Citizen Sensor Data Mining, Social Media Analytics and Applications

    1. 1. 1
    2. 2. Citizen sensor data mining, social media analytics and applications Singapore Symposium on Sentiment Analysis (S3A) ,Feb 6, 2015 Amit Sheth Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing @ Wright State University
    3. 3. Acknowledgements Significant components of this talk are from the tutorial I gave at WWW2011: “Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications,” with Meena Nagarajan and Selvam Velmurugan. Contributors to Twitris and/or Semantic Social Web Research @ Kno.e.sis: L. Chen, H. Purohit, W. Wang with: P. Anantharam, A. Jadhav, P. Kapanipathi, Dr. T.K. Prasad, and alumni (K. Gomadam, M. Nagarajan, A. Ranabahu). Funding: NSF, AFRL, NIH; Collaborations: IBM, Microsoft
    4. 4. Ohio Center of Excellence in Knowledge- enabled Computing • Among top 10 among all universities in the world in World Wide Web (cf: 10-yr impact, Microsoft Academic Search) • Largest academic group in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications • Exceptional student success: internships and jobs at top salary (IBM Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups ) • 80+researchers including 15 World Class faculty (>3K citations/faculty) and 45+ PhD students- practically all funded • $2M+/yr research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …) 4
    5. 5. 5 Social Media Landscape
    6. 6. Never before has humanity been so connected. Data for mid-2012: http://www.mediabistro.com/alltwitter/social-media-stats-2014_b54243
    7. 7. • Mumbai Terror Attack • Iran Election 2009 • Haiti Earthquake 2010 • Occupy Wall Street • Kashmir Floods 2014 Citizen Sensors in Action 7Image: http://huff.to/hp0OhA
    8. 8. • Ghonim, who has been a figurehead for the movement against the Egyptian government, told Blitzer “If you want to liberate a government, give them the internet.” • Egyptian anti-government demonstrator sleeps on the pavement under spray paint that reads 'Al- Jazeera' and 'Facebook' at Cairo's Tahrir square on February 7, 2011. http://www.cbsnews.com/stories/2011/02 /15/eveningnews/main20032118.shtml Revolution 2.0 Political/Social Activism 8 • When Blitzer asked “Tunisia, then Egypt, what’s next?,” Ghonim replied succinctly “Ask Facebook.” http://cnn.com/video/?/video/world/2011/02/13/nr.social.media.revolution.cnn http://cnn.com/video/?/video/tech/2011/02/11/barnett.egypt.social.media.cnn
    9. 9. Citizen Journalism 9 Twitter Journalism Images: http://bit.ly/9GVfPQ, http://bit.ly/hmrTYV
    10. 10. • Social News • Social Media and Global Media are inter-twined. News is increasingly Social 10
    11. 11. 11 Some of the significant human, social & economic development applications we work on at Kno.e.sis • Coordination during disasters (Qatar Computing Research Institute, Microsoft Research NYC) • Harassment on social media (WSU cognitive scientists) • Prescription drug abuse, Cannabis & Synthetic Cannabinoid epidemiology (Center for Interventions, Treatment and Addictions Research, ….) • Depressive disorders (Mayo Clinic) • Gender-based violence (United Nations) Highly multidisciplinary team efforts, often with significant partners, with real world data, intended to achieve real- world impact
    12. 12. 12 Sample of Real-World Impact & Media Coverage • Twitter Data Mining Reveals America‘s Religious Fault Lines, MIT Technology Review, Oct 6, 2014 • Digital soldiers emerge heroes in Kashmir flood rescue, HindustanTimes, September 25, 2014 • India's social media election battle, BBC News, Mar 30, 2014 • #Cursing Study: 10 Lessons About How We Use Swear Words on Twitter, Time.com, Feb 19, 2014 • Twitris: Taking Crisis Mapping to the Next Level, Tech President, June 24, 2013 • Picking the President: Twindex, Twitris Track Social Media Electorate, Semanticweb.com, Aug 3, 2012 • Web App Analyzes Tweets in Real Time for a Record of Historic Events, Mashable.com, Feb 17, 2012
    13. 13. 13 TWITRIS’ Technical Approach to Understand & Analyze Social Content Social Data is incredibly rich
    14. 14. 14 Some of the topics on Online Social Media we research at Kno.e.sis 1. Named Entity Recognition 2. Language usage in Social Media 4. Exploration of People, Content and Network dynamics 6. Sentiment, Emotion and Opinion mining 5. Trust 6. Integrated exploitation of Sensor (physical), Web (Cyber) and Social data for PCS applications 7. TWITRIS: A System for Mining Collective Intelligence from Citizen-Sensor Data
    15. 15. Social Information Processing • "Who says what, to whom, why, to what extent and with what effect?" [Lasswell] • Network: Social structure emerges from the aggregate of relationships (ties) • People: poster identities, the active effort of accomplishing interaction • Content: studying the content of communication
    16. 16. Why People-Content-Network + Spatial-Temporal-Thematic metadata? (Example of Understanding Crisis Data) 16 , Offer help, etc.
    17. 17. People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms • Explicit information from user profiles – User Names, Pictures, Videos, Links, Demographic Information, Group memberships... • Implicit information from user attention metadata – Page views, Facebook 'Likes', Comments; Twitter 'Follows', Retweets, Replies..
    18. 18. People Metadata: Various Types Identification Structural Network Activity Interests 18
    19. 19. People Metadata: Continued User Identification Metadata • User-id • Screen/Display-name of user • Real name of user • Location • Profile Creation Date • User description - Biodata of the user - Link to webpage of the user Interest Metadata • Author type - Trustee/donor, journalist, blogger, scientist etc. • Favorite tweets • Types of lists subscribed • Style of Writing (personality indicator) • No. of Followees • Majority of author type of Followees 19
    20. 20. People Metadata: Continued Web Presence: - User affiliations - Influence Metric – e.g., KLOUT (www.klout.com) Activity Metadata • Age of the profile • Frequency of posts • Timestamp of last status • No. of Posts • No. of Lists/groups created • No. of Lists/groups subscribed Influence Metadata (Inferring People Metadata from Network level Information) • No. of Followers – normal, influential • No. of Mentions • No. of Retweets/Forwards • No. of Replies • No. of Lists/groups following • No. of people following back • Authority & Hub Scores 20
    21. 21. Content Metadata: Content Dependent (Tweet) 23 Direct Content-based Metadata Indirect content-based metadata (External metadata)
    22. 22. Direct Content-based Metadata Content Metadata: Content Dependent (SMS) 24
    23. 23. Connections/Relationships matter! (foundation for the network) Network Metadata 25 Structure Metadata • Community Size • Community growth rate • Largest Strongly Connected Component size • Weakly Connected Components & Max(WCC) size • Average Degree of Separation • Clustering Coefficient Relationship Metadata • Type of Relationship • Relationship strength • User Homophily (based on certain characteristic such as location, interest etc.) • Reciprocity: mutual relationship • Active Community/ Ties
    24. 24. Metadata Creation & Extraction Length: 109 characters General topic: Egypt protest This poor {sentiment_expression: {target: “Lara Logan”, polarity: “negative”}} woman! RT @THR CBS News‘ {entity:{type=“News Agency”}} Lara Logan {entity:{type=“Person”}} Released From Hospital {entity:{type=“Hospital”}} After Egypt {entity:{type=“Country”} Assault {topic} http://bit.ly/dKWTY0 {external_URL} 26
    25. 25. Metadata Extraction from Informal Text Meena Nagarajan, ‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
    26. 26. Content Analysis: Typical Sub-tasks • Recognize key entities mentioned in content – Information Extraction (entity recognition, anaphora resolution, entity classification..) – Discovery of Semantic Associations between entities • Topic Classification, Aboutness of content – What is the content about? • Intention Analysis – Why did they share this content? 28 • Sentiment Analysis – What opinions are people conveying via the content? • Author Profiling – What can we infer about the author from the content he posts? • Context (external to content) extraction – URL extraction, analyzing external content
    27. 27. • Named Entity Recognition – I loved <movie> the hangover </movie>! • Key Phrase Extraction 29 NER, Key Phrase Extraction
    28. 28. Named Entity Recognition. Task of NER: identifying and classifying tokens • “I loved your music Yesterday!” (Yesterday is an album) • “It was THE HANGOVER of the year..lasted forever.. So I went to the movies..bad choice picking “GI Jane” worse now” (The Hangover is not a movie; GI Jane is a movie)
    29. 29. Analysing the Content can be Hard… Using a domain model (E.g., MusicBrainz) Using context cues from the content • e.g. new Merry Christmas tune Reduce potential entity spot size (with restrictions) • e.g. new albums/songs Multimodal Social Intelligence in a Real-Time Dashboard System Analyzing the content can be hard 31
    30. 30. 32 Music NER application : BBC SoundIndex (IBM Almaden) Pulse of the Online Music Populace Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’ special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010 Project: http://www.almaden.ibm.com/cs/projects/iis/sound/
    31. 31. The Vision http://www.almaden.ibm.com/cs/projects/iis/sound/ 33
    32. 32. 34
    33. 33. Several Insights 35 Only 4% -ve sentiments, perhaps ignore the Sentiment Annotator on this data source? Ignoring Spam can change ordering of popular artists Trending popularity of artists Trending topics in artist pages
    34. 34. Predictive Power of Data • Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07 vs. MySpace popularity charts. • User study indicated 2:1 and up to 7:1 (younger age groups) preference for the MySpace list. • Challenging traditional polling methods!
    35. 35. KEY PHRASE EXTRACTION 37
    36. 36. Key Phrase Extraction - Example • Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day 38
    37. 37. 39 M. Nagarajan et al., Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009: 539-553 TF-IDF vs. Spatio-temporal-thematic scores rank phrases differently Foreign relations surfaces up
    38. 38. INTENTION MINING 40
    39. 39. Why do people share? • Outside of the psychological incentives, broadly, people share to Seek Information OR Share Information • If we understand the intent behind a post, we can build systems that respond to it better • An application: Understand intent to deliver targeted content – Use case: Online Content-Targeted Advertisements on Social Media Platforms 41
    40. 40. Circa 2009 -Content-based Ads 42
    41. 41. Today – Content-based Ads on Profiles 43
    42. 42. What is going on here.. • Ads are targeted on profile interests, demographic data • But Interests on profiles do not translate to purchase intents – Interests are often outdated.. – Intents are rarely stated on a profile.. • Some profile data does seem to work – Example: New store openings, sales targeted at location information in a profile 44
    43. 43. But Monetizable Intents are Elsewhere, away from their profiles.. 45
    44. 44. Showing clear intents on MySpace posts but no relevant ads.. 46
    45. 45. Targeted Content-based Advertising – Non-trivial – Non-policed content • Brand image, unfavorable sentiments – People are there to network • User attention to ads is not guaranteed – Informal, casual nature of content • People are sharing experiences and events – Main message overloaded with off-topic content. Example post: “I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(” Reference: Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
    46. 46. Focus: Discuss Methodology, Preliminary Results in… • Identifying intents behind user posts on social networks – Identify Content with monetization potential • Identifying keywords for advertizing in user-generated content – Considering interpersonal communication & off-topic chatter 48 M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
    47. 47. Result - 8X more interest for non-profile ads.. • Using profile ads – Total of 56 ad impressions – 7% of ads generated interest • Using authored posts – Total of 56 ad impressions – 43% of ads generated interest • Using topical keywords from authored posts – Total of 59 ad impressions – 59% of ads generated interest 49
    48. 48. SENTIMENT / OPINION MINING 50
    49. 49. Sentiment Analysis: Motivation Which movie should I see? What customers complain about? Why do people oppose health care reform? Image: http://bit.ly/eZtKBF 51
    50. 50. Content Analysis: Sentiment Analysis/Opinion Mining • Two main types of information we can learn from user- generated content: fact vs. opinion • Much of social media text (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions. • Extracting structured sentiment information from unstructured content • Allowing computation to be done on “what people think” and “how people feel” 52
    51. 51. • From coarse-grained to fine-grained – Document level -> sentence level -> expression level – General sentiment -> domain-dependent sentiment -> target- dependent sentiment • From static to dynamic – Our attitude can be changed during social communication. • Modeling, detecting, and tracking the change of attitude • What leads to the change of attitude? E.g., persuasion campaign 53 Sentiment Analysis: Challenges
    52. 52. Sentiment Analysis: Target-specific Opinion Identification Observations: • The opinion clues may not be toward the given target (1,2,3,6) • The opinion clues are domain and context dependent (5,7) • Single words are not enough (4,7,8) Simple lexicon-based method doesn't work well. 54 Target of “sexy” is “Helena” Target of “terrific” is “reviews” “free” is not opinionated in movie domain. Target of “loving” is “telling” “well” in “as well” is not opinionated
    53. 53. 55 Extracting a diverse and richer set of sentiment-bearing expressions, including formal and slang words/phrases Assessing the target-dependent polarity of each sentiment expression A novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus Extracting Diverse Sentiment Expressions With Target-dependent Polarity from Twitter [Chen et al. ICWSM 2012]
    54. 54. The Usage of Background Knowledge 56
    55. 55. 57 Sentiment Analysis: Feature and Aspect Extraction Motivation • To understand a user’s opinions about a product at a fine-grained level, support opinion summarization for products, and automatically extract pros and cons from reviews it is essential to identify product features and aspects. Impact • Existing methods tend to require seed terms and focus on identifying explicit features or a few high-level aspects. • Our approach is capable of identifying both explicit and implicit aspects and does not require any labeling efforts. Approach • We use a combination of corpus-based association measures, and semantic similarity measures to identify product aspects in an efficient clustering based approach.
    56. 56. 58 Clustering for Aspect Discovery in Opinion Mining [Chen et al. in submission]
    57. 57. Polling or Social Media Analysis? It is actually about tracking public opinion. 1. Sample size 2. Representative of the target population 3. Accurate measure of opinions 4. Timeliness
    58. 58. Harnessing the Power of Social Data to Predict Election Results [Chen et al., SocInfo 2012] • We study different groups of social media users who engage in discussions of the 2012 U.S. Republican Presidential Primaries, and compare the predictive power of these user groups. • Existing studies on predicting election results assume that all users should be treated equally. • How do different groups of users differ in predicting election results?
    59. 59. 61 1. Engagement Degree 2. Tweet Mode 3. Content Type 4. Political Preference User Categorization
    60. 60. Predicting a User's Vote • Basic idea: for which candidate the user shows the most support – Frequent mentions (more mentions, higher score) – Positive sentiment (more positive/less negative opinions, higher score) • Notation: Nm(c): the number of tweets mentioning the candidate c; Npos(c): the number of positive tweets about candidate c; Nneg(c): the number of negative tweets about candidate c; a smoothing parameter with a value between 0 and 1; a discounting parameter with a value between 0 and 1, which discounts the score when the user does not express any opinion towards c • The score distinguishes two cases: the user posted an opinion about c, or the user mentioned c but did not post an opinion about c
    61. 61. Twitter users are not “equal” in predicting elections! Revealing the challenge of identifying the vote intent of the “silent majority”. Retweets may not necessarily reflect users' attitudes. Prediction of a user’s vote based on more opinion tweets is not necessarily more accurate than prediction using more information tweets. The right-leaning user group provides the most accurate prediction result. It correctly predicts the winners in 8 out of 10 states with an average prediction error of 0.1. To some extent, it demonstrates the importance of identifying likely voters in electoral prediction.
    62. 62. EMOTION MINING 64
    63. 63. Emotion Mining: Motivation 65 • Emotion is essential to all aspects of our lives. – Influences our decision-making – Affects our social relationships – Shapes our daily behavior • Emotional mental health – New mothers may suffer from post-partum depression – Veterans may constantly suffer from negative emotions because of post-traumatic stress disorder
    64. 64. Emotion Mining: what have we studied 66 • Can we automatically create a large emotion dataset with high quality labels from Twitter? How? • What features can effectively improve the performance of supervised machine learning algorithms? • Can the system developed on Twitter data be directly applied to identify emotions from other datasets? • What can we learn about emotion from social media data?
    65. 65. Harnessing Twitter "big data" for automatic emotion identification [Wang et al. SocialCom 2012] • Collect self-annotated emotion tweets – Seven emotions: joy, sadness, anger, love, fear, surprise, thankfulness “When I see a cop, no matter where I am or what I’m doing, I always feel like every law I’ve ever broken is stamped all over my body #fear” “I hate when my mom compares me to my friends. #anger” “I hate when I get the hiccups in class. #embarrassing”
    66. 66. The more data, the merrier [Figure: accuracy (0.4 to 0.65) vs. number of tweets in training data (1,000 to 1,991,184) for LIBLINEAR and MNB; results of performing seven emotion classifications]
    67. 67. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] 69 • Automatically classify suicide notes to different (15) categories at sentence level • Emotion categories – Positive • Hopefulness, thankfulness, forgiveness, love, pride, happiness – Negative • Sorrow, abuse, anger, hopelessness, guilt, blame, fear • Other categories – Information, instructions
    68. 68. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] 70 Sentence: “Found out today that // I passed my math STAAR test.” • N-gram features • Unigram, e.g., found, today, passed, etc. • Bigram, e.g., found_out, out_today, etc. • N-gram position – Unigram: found-1, out-1, today-1, …, I-2, passed-2, my-2, … • Knowledge-based features: – LIWC (Pennebaker et al., 2014a) – WordNet-Affect (Strapparava and Valitutti, 2004) – MPQA (Wilson et al., 2005) • Syntactic features: – Part-of-speech tags, e.g., Found/VBN out/RP today/NN that/IN I/PRP passed/VBD… – Dependency relations, e.g., root(ROOT-0, Found-1); ccomp(Found-1, passed-6); dobj(passed-6, test-10) …
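The n-gram and n-gram-position features listed on this slide can be illustrated with a small sketch. It assumes that the “//” in the example sentence marks a split point and that the suffixes -1 and -2 denote the segment a token falls in; that reading, the lowercasing of every token, and the function name `ngram_features` are assumptions for the example, not details confirmed by the slide.

```python
def ngram_features(sentence, boundary="//"):
    """Build unigram, bigram and position-tagged unigram features.

    Assumption: `boundary` splits the sentence into numbered segments,
    matching the found-1 / passed-2 pattern in the slide's example.
    """
    segments = []
    for raw_segment in sentence.split(boundary):
        # Lowercase and strip simple punctuation from each token.
        cleaned = [tok.strip(".,!?").lower() for tok in raw_segment.split()]
        segments.append([tok for tok in cleaned if tok])

    tokens = [tok for seg in segments for tok in seg]
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    positioned = [
        f"{tok}-{index}"
        for index, seg in enumerate(segments, start=1)
        for tok in seg
    ]
    return {"unigrams": tokens, "bigrams": bigrams, "positioned": positioned}


features = ngram_features("Found out today that // I passed my math STAAR test.")
```

The knowledge-based and syntactic features on the slide depend on external lexicons and parsers, so they are left out of this sketch.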
    69. 69. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] 71 Winner: N-gram(1,2), knowledge-based and syntactic features
    70. 70. Cursing in English on Twitter [Wang et al. CSCW14] 72 • The main reason people use curse words is to express strong emotions, especially anger and frustration. [Jay 1992, 2000; McEnery 2006; Nasution and Rosa 2012]
    71. 71. Normalized Emotion Distributions over Time in Eastern Standard Time. Normalized Emotion Distributions over Days (EST). “I am so thankful for my family && close friends. They hold me together when everything else around me is falling apart. #SoBlessed #Thankful” 73
    72. 72. Normalized Emotion Distributions over Time (EST) “I thank God everytime I see another day :*) #thankful .” 74
    73. 73. What are the top Emotions Associated with Moms and Dads? 75
        Rank  Mom                    Dad
        1     Irritation (7,562)     Irritation (3,034)
        2     Sadness (2,315)        Sadness (1,363)
        3     Affection (2,225)      Embarrassment (1,158)
        4     Zest (2,213)           Zest (1,035)
        5     Embarrassment (1,849)  Affection (1,030)
        6     Thankfulness (1,537)   Cheerfulness (911)
        7     Cheerfulness (1,332)   Envy (902)
        “I hate when my dad uses my laptop. Its mine. Not yours. You have your own computer. I have shit to do, get off now please. #annoyed” “ugh my mom gets so nervous when i drive #annoying” “My mom just told me I can't open any presents early cause I'm too old for that #depressing”
    74. 74. PEOPLE ANALYSIS - Deriving People Metadata - from Content Analysis - from Network Analysis - Merge of two approaches - People-Content-Network Analysis to leverage the metadata - Finding Influential Users - Finding User Types & Affiliation - Measuring Social Engagement - Leverage communities to assist coordination 76
    75. 75. People Analysis: Social Engagement & Coordination 77 Imagine a crisis scenario such as the Haiti earthquake (2010) or Hurricane Sandy (2012), where emergency teams are looking for ways to help the victims • What are the best possible ways to communicate: identify and engage people • Between resource providers (supply) and people in need of resources (demand) • Topical community influencers • How can response teams coordinate social media communities well between volunteers, managers in the organizational structure, and resource seekers?
    76. 76. People Analysis: Who is asking for help, Who is offering to help? Smart Data in the context of Disaster Management ACTIONABLE: Timely delivery of the right resources and information to the right people at the right location! 78 Because everyone wants to help, but DOESN’T KNOW HOW!
    77. 77. Disaster Response Coordination: Finding Actionable Nuggets for Responders to Act 79 Really sparse signal-to-noise: • 2M tweets during the first 48 hrs. of #Oklahoma-tornado-2013 - 1.3% as the precise resource donation requests to help - 0.02% as the precise resource donation offers to help • Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity (OFFER) • I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) • Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST) • Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST) For responders, the most important information for managing coordination dependencies is the scarcity and availability of resources. Blog by our colleague Patrick Meier on this analysis: http://irevolution.net/2013/05/29/analyzing-tweets-tornado/
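To make the sparsity on this slide concrete, the percentages can be turned into rough counts. The calculation below treats “2M” as exactly 2,000,000 tweets, which is an assumption; the slide gives only the rounded figure, so the results are approximate.

```python
# Rounded figures taken from the slide; "2M" is treated as exactly 2,000,000.
total_tweets = 2_000_000
request_share = 0.013   # 1.3% of tweets are precise donation requests
offer_share = 0.0002    # 0.02% of tweets are precise donation offers

requests = round(total_tweets * request_share)
offers = round(total_tweets * offer_share)
requests_per_offer = requests // offers

print(f"requests: about {requests:,}")
print(f"offers: about {offers:,}")
print(f"roughly {requests_per_offer} requests per offer")
```

Under these assumptions that is about 26,000 requests and about 400 offers, so fewer than 27,000 of the 2M tweets carry the actionable signal, and requests outnumber offers by roughly 65 to 1.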
    78. 78. People Analysis: Match demander-suppliers for coordination during crisis Purohit, H., Castillo, C., Diaz, F., Sheth, A., & Meier, P. (2013). Emergency-relief coordination on social media: Automatically matching resource requests and offers. First Monday, 19(1). 80
    79. 79. Demand-Supply identification and representation: core & facets 81 • Extract the core of the phrase: “what” – Other facets include “who”, “where”, “when”, etc. • Supervised learning to classify items for demands, supplies, and resource type facets • Example tweet: Rotary collecting clothing and other donations in New Jersey <URL> • Corresponding data item in the semi-structured knowledge inventory: { source: “Twitter”, author: “@NN”, text: “Rotary collecting clothing and other donations in New Jersey <URL>”, donation-info: { donation-type: “Request”, donation-type-confidence: 0.8, donation-organization: “Rotary”, donation-item: “clothing and other donations”, donation-location: “New Jersey” }, … } • IR model approach to match demand (request) with supply (offer) items in this semantically annotated knowledge inventory
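The matching step can be sketched as a tiny retrieval problem over annotated items. The sketch below is not the model from the cited paper: it swaps in plain bag-of-words cosine similarity over the donation-item and donation-location fields as a stand-in for the IR model. The field names follow the data item on this slide, while the sample offers, the author handles, and the function names are hypothetical.

```python
import math
from collections import Counter


def item_tokens(item):
    """Tokenize the 'what' and 'where' facets of an annotated item."""
    text = f"{item['donation-item']} {item['donation-location']}"
    cleaned = [word.strip(".,!?").lower() for word in text.split()]
    return [word for word in cleaned if word]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words."""
    bag_a, bag_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(bag_a[word] * bag_b[word] for word in bag_a)
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_offer(request, offers):
    """Return (offer, score) for the most similar offer, or (None, 0.0)."""
    scored = [(cosine(item_tokens(request), item_tokens(o)), o) for o in offers]
    score, offer = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return (offer, score) if score > 0 else (None, 0.0)


# Hypothetical inventory entries, shaped like the data item on the slide.
request = {"donation-type": "Request",
           "donation-item": "clothing and other donations",
           "donation-location": "New Jersey"}
offers = [
    {"author": "@A", "donation-type": "Offer",
     "donation-item": "shoes clothing food", "donation-location": "New Jersey"},
    {"author": "@B", "donation-type": "Offer",
     "donation-item": "blood", "donation-location": "Oklahoma"},
]
matched_offer, match_score = best_offer(request, offers)
```

A fuller version would likely weight terms (for example with TF-IDF) and use the classifier confidence in donation-type-confidence, but those choices are beyond what the slide specifies.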
    80. 80. Leveraging Communities for Whom to Engage With, Why and How 82 Purohit et al., User Taglines: Alternative Presentations of Expertise and Interest in Social Media . ASE Social Informatics, 2012
    81. 81. Network Analysis 83 Foundation of network: • Nodes • Connections/Relationships Interesting questions to ask: • How communities form around topics: growth & evolution • What are the effects of influential participants in the communities • What are the effects of the nature of content (or sentiment, opinions) flowing in the network on community structures and growth • What is the community structure: degree of separation and sub-communities that contribute to macro-level effects, e.g., coordination, engagement “To Discover How A, Who Is in Touch with B and C, Is Affected by the Relation Between B & C” -John Barnes Image: http://www.onasurveys.com/
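One of the structural questions above, degree of separation, can be illustrated with a breadth-first search over a network of nodes and connections. The adjacency-list representation and the function name `degrees_of_separation` are choices made for this sketch; the slide does not describe how the networks are stored or analyzed.

```python
from collections import deque


def degrees_of_separation(graph, source, target):
    """Length of the shortest path between two nodes, or None if unreachable.

    `graph` maps each node to an iterable of its neighbours.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


# Hypothetical undirected network: A is in touch with B and C, B with D.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
    "E": [],
}
```

Here A and D are two steps apart, while E is isolated and so has no path to any other node.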
    82. 82. Graphs showing sparse (A) and dense (B) RT networks and their corresponding follower graphs for 'call for action' and 'information sharing' tweet content types M. Nagarajan, H. Purohit, and A. Sheth, ’A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010 84
    83. 83. Understanding Evolving Community Structures for Coordination 85 User interaction networks of two topical communities– Occupy LA and Chicago, of emerging influencers during Occupy Wall Street (OWS) event 2011 Application of evolving communities: H. Purohit, J. Ajmera, S. Joshi, A. Verma, A. Sheth. Finding Influential Authors in Brand-Page Communities. 6th Int'l AAAI Conference on Weblogs and Social Media (ICWSM), Dublin, Ireland, June 5-7, 2012
    84. 84. Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates Romney Obama Before 1st debate After 1st debate After Hurricane Sandy After 3rd debate Understanding Community Evolution for Real-World Actions 86 Social Media analysis for US elections 2012, powered by Twitris: http://analysis.knoesis.org/uselection/insights/
    85. 85. On Understanding the Divergence of Online Social Group Discussion • Change of group discussion divergence over time, and different phases of real world events • Relation between discussion divergence and existing theories of social cohesion and social identity in Psychology • Prediction of future change in the group discussion divergence Research Questions on Social Dynamics in Communities Acknowledgement: NSF SoCS grant for ‘Leveraging Social Media during Emergency Response’ Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
    86. 86. Contrasting Prior Work and Approach: Evolution of groups in online social communities surrounding events • Prior work: – Focus on structural metrics to understand group evolution dynamics, but these may not be sufficient to answer ‘WHY a group diverges over time’ • Our approach: – Content-driven measure: collective divergence of group members for topics of discussion – Features assessing the role of socio-psychological theories: cohesion & identity • Data: – Tweets during evolving events of natural disasters and social activism On Understanding the Divergence of Online Social Group Discussion Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media. 88
    87. 87. On Understanding the Divergence of Online Social Group Discussion • During #sandy, predicted low-diverging (focused) groups to engage with on flight updates: first delays and cancellations, then resumption • Natural disaster (D) events (Hurricane Irene and Sandy) have stronger correlations with identity-driven features than with cohesion features • We predicted group discussion divergence across phases with 0.83 AUC Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media. 89
    88. 88. Continuous Semantics for Evolving Events to Extract Smart Data 90
    89. 89. Dynamic Model Creation Continuous Semantics 91
    90. 90. Live Demo of Powerful Social Media Analysis: Twitris 92
    91. 91. Twitris - Motivation 1. Information Overload • Multiple events around us • WHAT to be aware of • Multiple Storylines about same event!! 93 Image: http://bit.ly/etFezl
    92. 92. Twitris - Motivation 2. Evolution of Citizen Observation • with location and time 94
    93. 93. Twitris - Motivation 3. Semantics of Social perceptions • What is being said about an event (theme) • Where (spatial) • When (temporal ) Twitris lets you browse citizen reports using social perceptions as the fulcrum 95
    94. 94. Twitris: Semantic Social Web Mash-up Facilitates understanding of multi-dimensional social perceptions over SMS, Tweets, multimedia Web content, electronic news media 96
    95. 95. Twitris: Architecture 97 Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, 539 - 553, Oct 5-7, 2009.
    96. 96. Twitris: Functional Overview 98
    97. 97. Twitris: Event Summarization 99
    98. 98. Incoming Tweets with need types to give quick idea of what is needed and where currently #OKC Legends for Different needs #OKC 100 Clicking on a tag brings contextual information– relevant tweets, news/blogs, and Wikipedia articles Twitris: Real-time information
    99. 99. How People from Different parts of the world talked about US Election Images and Videos Related to US Election 101 Twitris: Analysis by location for contrast in social perceptions
    100. 100. Twitris: Sentiment Analysis • Sentiment Analysis – using statistical and machine learning techniques 102
    101. 101. 103 How was Obama doing in the first debate? Twitris: Sentiment Analysis- Smart Answers with reasoning!
    102. 102. Twitris: Impact of Background Knowledge 104 The dead people mentioned in the event OWS
    103. 103. Twitris: Demo, Quick Show http://twitris2.knoesis.org/ • Many other interesting efforts – Eg: Vivek K. Singh, Mingyan Gao, and Ramesh Jain. 2010. From microblogs to social images: event analytics for situation assessment. In Proceedings of the international conference on Multimedia information retrieval (MIR '10). ACM, New York, NY, USA, 433-436. 105
    104. 104. Conclusions 106 • Do you have a sense of the immense opportunity in analyzing citizen sensing for useful social signals? • Do you appreciate the broad range of issues and challenges? Did we present examples and a few insights into how to address some unique challenges? • Did the spatio-temporal-thematic, people-content-network, and emotion-sentiment-intent dimensions present a reasonable way to organize the vast number of relevant research challenges and techniques?
    105. 105. 107 Thank you, and please visit us at http://knoesis.org Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA
