Citizen Sensor Data Mining, Social Media Analytics and Applications


Opening talk at Singapore Symposium on Sentiment Analysis (S3A), February 6, 2015, Singapore. http://s3a.sentic.net/#s3a2015

Abstract

With the rapid rise in the popularity of social media, and near ubiquitous mobile access, the sharing of observations and opinions has become common-place. This has given us an unprecedented access to the pulse of a populace and the ability to perform analytics on social data to support a variety of socially intelligent applications -- be it for brand tracking and management, crisis coordination, organizing revolutions or promoting social development in underdeveloped and developing countries.

I will review: 1) understanding and analysis of informal text, esp. microblogs (e.g., issues of cultural entity extraction and role of semantic/background knowledge enhanced techniques), and 2) how we built Twitris, a comprehensive social media analytics (social intelligence) platform.

I will describe the analysis capabilities along three dimensions: spatio-temporal-thematic, people-content-network, and sentiment-emotion-intent. I will couple technical insights with identification of computational techniques and real-world examples using live demos of Twitris (http://twitris2.knoesis.org).


Citizen Sensor Data Mining, Social Media Analytics and Applications

  1. 1. 1
  2. 2. Citizen sensor data mining, social media analytics and applications Singapore Symposium on Sentiment Analysis (S3A), Feb 6, 2015 Amit Sheth Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing @ Wright State University
  3. 3. Acknowledgements Significant components of this talk are from the tutorial I gave at WWW2011: “Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications,” with Meena Nagarajan and Selvam Velmurugan. Contributors to Twitris and/or Semantic Social Web Research @ Kno.e.sis: L. Chen, H. Purohit, W. Wang, with P. Anantharam, A. Jadhav, P. Kapanipathi, Dr. T.K. Prasad, and alumni K. Gomadam, M. Nagarajan, A. Ranabahu. Funding: NSF, AFRL, NIH; Collaborations: IBM, Microsoft 3
  4. 4. Ohio Center of Excellence in Knowledge-enabled Computing • Among the top 10 universities in the world in World Wide Web research (cf. 10-yr impact, Microsoft Academic Search) • Largest academic group in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications • Exceptional student success: internships and jobs at top salary (IBM Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups) • 80+ researchers including 15 world-class faculty (>3K citations/faculty) and 45+ PhD students, practically all funded • $2M+/yr research for largely multidisciplinary projects; world-class resources; industry sponsorships/collaborations (Google, IBM, …) 4
  5. 5. 5 Social Media Landscape
  6. 6. 6 Data for mid-2012: http://www.mediabistro.com/alltwitter/social-media-stats-2014_b54243 Never before has humanity been so connected
  7. 7. • Mumbai Terror Attack • Iran Election 2009 • Haiti Earthquake 2010 • Occupy Wall Street • Kashmir Floods 2014 Citizen Sensors in Action 7Image: http://huff.to/hp0OhA
  8. 8. • Ghonim, who has been a figurehead for the movement against the Egyptian government, told Blitzer “If you want to liberate a government, give them the internet.” • Egyptian anti-government demonstrator sleeps on the pavement under spray paint that reads 'Al- Jazeera' and 'Facebook' at Cairo's Tahrir square on February 7, 2011. http://www.cbsnews.com/stories/2011/02 /15/eveningnews/main20032118.shtml Revolution 2.0 Political/Social Activism 8 • When Blitzer asked “Tunisia, then Egypt, what’s next?,” Ghonim replied succinctly “Ask Facebook.” http://cnn.com/video/?/video/world/2011/02/13/nr.social.media.revolution.cnn http://cnn.com/video/?/video/tech/2011/02/11/barnett.egypt.social.media.cnn
  9. 9. Citizen Journalism 9 Twitter Journalism Images: http://bit.ly/9GVfPQ, http://bit.ly/hmrTYV
  10. 10. • Social News • Social Media and Global Media are intertwined. News is increasingly Social 10
  11. 11. 11 Some of the significant human, social & economic development applications we work on at Kno.e.sis • Coordination during disasters (Qatar Computing Research Institute, Microsoft Research NYC) • Harassment on social media (WSU cognitive scientists) • Prescription drug abuse, Cannabis & Synthetic Cannabinoid epidemiology (Center for Interventions, Treatment and Addictions Research, ….) • Depressive disorders (Mayo Clinic) • Gender-based violence (United Nations) Highly multidisciplinary team efforts, often with significant partners, with real world data, intended to achieve real- world impact
  12. 12. 12 Sample of Real-World Impact & Media Coverage • Twitter Data Mining Reveals America‘s Religious Fault Lines, MIT Technology Review, Oct 6, 2014 • Digital soldiers emerge heroes in Kashmir flood rescue, HindustanTimes, September 25, 2014 • India's social media election battle, BBC News, Mar 30, 2014 • #Cursing Study: 10 Lessons About How We Use Swear Words on Twitter, Time.com, Feb 19, 2014 • Twitris: Taking Crisis Mapping to the Next Level, Tech President, June 24, 2013 • Picking the President: Twindex, Twitris Track Social Media Electorate, Semanticweb.com, Aug 3, 2012 • Web App Analyzes Tweets in Real Time for a Record of Historic Events, Mashable.com, Feb 17, 2012
  13. 13. 13 TWITRIS’ Technical Approach to Understand & Analyze Social Content Social Data is incredibly rich
  14. 14. 14 Some of the topics on Online Social Media we research at Kno.e.sis 1. Named Entity Recognition 2. Language usage in Social Media 3. Exploration of People, Content and Network dynamics 4. Sentiment, Emotion and Opinion mining 5. Trust 6. Integrated exploitation of Sensor (physical), Web (Cyber) and Social data for PCS applications 7. TWITRIS: A System for Mining Collective Intelligence from Citizen-Sensor Data
  15. 15. • "Who says what, to whom, why, to what extent and with what effect?" [Lasswell] • Network: Social structure emerges from the aggregate of relationships (ties) • People: poster identities, the active effort of accomplishing interaction • Content: studying the content of communication Social Information Processing 15
  16. 16. Why People-Content-Network + Spatio-Temporal-Thematic metadata? (Example of Understanding Crisis Data) 16
  17. 17. • Explicit information from user profiles – User Names, Pictures, Videos, Links, Demographic Information, Group memberships... • Implicit information from user attention metadata – Page views, Facebook 'Likes', Comments; Twitter 'Follows', Retweets, Replies.. People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms 17
  18. 18. People Metadata: Various Types – Identification, Structural/Network, Activity, Interests 18
  19. 19. People Metadata: Continued User Identification Metadata • User-id • Screen/Display-name of user • Real name of user • Location • Profile Creation Date • User description - Biodata of the user - Link to webpage of the user Interest Metadata • Author type - Trustee/donor, journalist, blogger, scientist etc. • Favorite tweets • Types of lists subscribed • Style of Writing (personality indicator) • No. of Followees • Majority of author type of Followees 19
  20. 20. People Metadata: Continued Web Presence: - User affiliations - Influence Metric – e.g., KLOUT (www.klout.com) Activity Metadata • Age of the profile • Frequency of posts • Timestamp of last status • No. of Posts • No. of Lists/groups created • No. of Lists/groups subscribed Influence Metadata (Inferring People Metadata from Network level Information) • No. of Followers – normal, influential • No. of Mentions • No. of Retweets/Forwards • No. of Replies • No. of Lists/groups following • No. of people following back • Authority & Hub Scores 20
  21. 21. Content Metadata: Content Dependent (Tweet) 23 Direct Content-based Metadata Indirect content-based metadata (External metadata)
  22. 22. Direct Content-based Metadata Content Metadata: Content Dependent (SMS) 24
  23. 23. Connections/Relationships matter! (foundation for the network) Network Metadata 25 Structure Metadata • Community Size • Community growth rate • Largest Strongly Connected Component size • Weakly Connected Components & Max(WCC) size • Average Degree of Separation • Clustering Coefficient Relationship Metadata • Type of Relationship • Relationship strength • User Homophily (based on certain characteristic such as location, interest etc.) • Reciprocity: mutual relationship • Active Community/ Ties
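The structure metrics listed above (connected components, clustering coefficient, average degree of separation) can be computed with standard graph tooling. A minimal sketch using Python's networkx on a toy follower graph; the library choice and the toy data are assumptions for illustration only:

```python
import networkx as nx

# Toy directed follower/retweet graph; edges point from follower to followee.
G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])

# Largest strongly and weakly connected component sizes.
max_scc = max(len(c) for c in nx.strongly_connected_components(G))
max_wcc = max(len(c) for c in nx.weakly_connected_components(G))

# Clustering coefficient and average degree of separation
# (average shortest-path length), computed on the undirected view.
U = G.to_undirected()
avg_clustering = nx.average_clustering(U)
avg_separation = nx.average_shortest_path_length(U)  # needs a connected graph

print(max_scc, max_wcc, avg_clustering, avg_separation)
```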
  24. 24. Metadata Creation & Extraction Length: 109 characters General topic: Egypt protest This poor {sentiment_expression: {target: “Lara Logan”, polarity: “negative”}} woman! RT @THR CBS News‘ {entity: {type=“News Agency”}} Lara Logan {entity: {type=“Person”}} Released From Hospital {entity: {type=“Hospital”}} After Egypt {entity: {type=“Country”}} Assault {topic} http://bit.ly/dKWTY0 {external_URL} 26
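The annotations above can be carried as a structured record alongside the raw text. A minimal sketch in Python; the field names are illustrative, not an actual Twitris schema:

```python
# Illustrative structured record for the annotated tweet above.
# Field names are hypothetical, not the Twitris schema.
annotated_tweet = {
    "length": 109,
    "general_topic": "Egypt protest",
    "entities": [
        {"text": "CBS News", "type": "News Agency"},
        {"text": "Lara Logan", "type": "Person"},
        {"text": "Hospital", "type": "Hospital"},
        {"text": "Egypt", "type": "Country"},
    ],
    "sentiment_expressions": [
        {"expression": "This poor", "target": "Lara Logan",
         "polarity": "negative"},
    ],
    "topics": ["Assault"],
    "external_urls": ["http://bit.ly/dKWTY0"],
}
```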
  25. 25. Metadata Extraction from Informal Text Meena Nagarajan, ‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
  26. 26. Content Analysis: Typical Sub-tasks • Recognize key entities mentioned in content – Information Extraction (entity recognition, anaphora resolution, entity classification..) – Discovery of Semantic Associations between entities • Topic Classification, Aboutness of content – What is the content about? • Intention Analysis – Why did they share this content? 28 • Sentiment Analysis – What opinions are people conveying via the content? • Author Profiling – What can we infer about the author from the content he posts? • Context (external to content) extraction – URL extraction, analyzing external content
  27. 27. • Named Entity Recognition – I loved <movie> the hangover </movie>! • Key Phrase Extraction 29 NER, Key Phrase Extraction
  28. 28. Named Entity Recognition “I loved your music Yesterday!” (Yesterday is an album) “It was THE HANGOVER of the year.. lasted forever..” (The Hangover is not a movie) “So I went to the movies.. bad choice picking “GI Jane”.. worse now” (GI Jane is a movie) 30 Task of NER: identifying and classifying tokens
  29. 29. Analyzing the content can be hard… Using a domain model (e.g., MusicBrainz) Using context cues from the content • e.g., new Merry Christmas tune Reduce potential entity spot size (with restrictions) • e.g., new albums/songs 31
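A minimal sketch of this spot-and-restrict idea: spot candidate entities from a domain dictionary (e.g., one derived from MusicBrainz) and keep a spot only when a context cue supports the music sense. The dictionary, cues, and matching below are illustrative assumptions:

```python
import re

# Hypothetical domain dictionary, e.g., track/album titles from MusicBrainz.
TRACK_TITLES = {"yesterday", "merry christmas", "help"}

# Context cues suggesting the music sense of a spotted title.
MUSIC_CUES = re.compile(r"\b(tune|song|album|track|music)\b", re.IGNORECASE)

def spot_music_entities(text: str) -> list[str]:
    """Return dictionary spots that co-occur with a music context cue."""
    lowered = text.lower()
    if not MUSIC_CUES.search(lowered):
        return []  # no cue: too risky to annotate informal text
    return [title for title in TRACK_TITLES if title in lowered]

print(spot_music_entities("I loved your music Yesterday!"))   # ['yesterday']
print(spot_music_entities("Yesterday I went to the movies"))  # []
```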
  30. 30. 32 Music NER application : BBC SoundIndex (IBM Almaden) Pulse of the Online Music Populace Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’ special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010 Project: http://www.almaden.ibm.com/cs/projects/iis/sound/
  31. 31. The Vision http://www.almaden.ibm.com/cs/projects/iis/sound/ 33
  32. 32. 34
  33. 33. Several Insights 35 Only 4% -ve sentiments, perhaps ignore the Sentiment Annotator on this data source? Ignoring Spam can change ordering of popular artists Trending popularity of artists Trending topics in artist pages
  34. 34. Predictive Power of Data • Billboard's Top 50 Singles chart during the week of Sept 22-28 '07 vs. MySpace popularity charts. • User study indicated 2:1 and up to 7:1 (younger age groups) preference for the MySpace list. • Challenging traditional polling methods! 36
  35. 35. KEY PHRASE EXTRACTION 37
  36. 36. Key Phrase Extraction - Example • Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day 38
  37. 37. 39 M. Nagarajan et al., Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009: 539-553 TF-IDF vs. Spatio-temporal-thematic scores rank phrases differently Foreign relations surfaces up
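A rough sketch of the contrast, assuming TF-IDF is simply re-weighted by the spatial and temporal proximity of a phrase's posts to the event; the exact weighting in the WISE 2009 paper differs, so treat this as illustrative only:

```python
import math

def tf_idf(tf: int, df: int, n_docs: int) -> float:
    """Plain TF-IDF score for a key phrase."""
    return tf * math.log(n_docs / (1 + df))

def stt_score(tf: int, df: int, n_docs: int,
              spatial: float, temporal: float) -> float:
    """Illustrative spatio-temporal-thematic score: TF-IDF boosted by how
    concentrated the phrase is near the event's region (spatial) and time
    window (temporal), both normalized to [0, 1]."""
    return tf_idf(tf, df, n_docs) * (1.0 + spatial + temporal)

# A phrase that is rarer overall but concentrated at the event's place and
# time can outrank a globally frequent phrase - how a topic like "foreign
# relations" surfaces up under STT scoring but not under plain TF-IDF.
print(tf_idf(50, 400, 10_000))              # frequent, diffuse phrase
print(stt_score(20, 40, 10_000, 0.9, 0.8))  # rarer, event-local phrase
```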
  38. 38. INTENTION MINING 40
  39. 39. Why do people share? • Outside of the psychological incentives, broadly, people share to Seek Information OR Share Information • If we understand the intent behind a post, we can build systems that respond to it better • An application: Understand intent to deliver targeted content – Use case: Online Content-Targeted Advertisements on Social Media Platforms 41
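As a toy illustration of the seek-vs-share split, surface cues alone already separate many posts; real intent classifiers are learned from data, so the cue list below is purely an assumption:

```python
# Hypothetical surface cues for information-seeking posts.
SEEK_CUES = ("?", "anyone know", "recommend", "help me", "where can i")

def intent(post: str) -> str:
    """Crude seek-vs-share heuristic based on surface cues."""
    lowered = post.lower()
    if any(cue in lowered for cue in SEEK_CUES):
        return "information-seeking"
    return "information-sharing"

print(intent("Anyone know a good deal on a DSLR?"))  # information-seeking
print(intent("Just got the new DSLR, loving it!"))   # information-sharing
```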
  40. 40. Circa 2009 - Content-based Ads 42
  41. 41. Today – Content-based Ads on Profiles 43
  42. 42. What is going on here.. • Ads are targeted on profile interests, demographic data • But interests on profiles do not translate to purchase intents – Interests are often outdated.. – Intents are rarely stated on a profile.. • Some profile data does seem to work – Example: New store openings, sales targeted at location information in a profile 44
  43. 43. But Monetizable Intents are Elsewhere, away from their profiles.. 45
  44. 44. Showing clear intents on MySpace posts but no relevant ads.. 46
  45. 45. Targeted Content-based Advertising 47 – Non-trivial – Non-policed content • Brand image, unfavorable sentiments – People are there to network • User attention to ads is not guaranteed – Informal, casual nature of content • People are sharing experiences and events – Main message overloaded with off-topic content Example post: “I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(” 1 Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
  46. 46. Focus: Discuss Methodology, Preliminary Results in… • Identifying intents behind user posts on social networks – Identify content with monetization potential • Identifying keywords for advertising in user-generated content – Considering interpersonal communication & off-topic chatter 48 M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
  47. 47. Result - 8X more interest for non-profile ads.. • Using profile ads – Total of 56 ad impressions – 7% of ads generated interest • Using authored posts – Total of 56 ad impressions – 43% of ads generated interest • Using topical keywords from authored posts – Total of 59 ad impressions – 59% of ads generated interest 49
  48. 48. SENTIMENT / OPINION MINING 50
  49. 49. Sentiment Analysis: Motivation Which movie should I see? What do customers complain about? Why do people oppose health care reform? Image: http://bit.ly/eZtKBF 51
  50. 50. Content Analysis: Sentiment Analysis/Opinion Mining • Two main types of information we can learn from user-generated content: fact vs. opinion • Much of social media text (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions. • Extracting structured sentiment information from unstructured content • Allowing computation to be done on “what people think” and “how people feel” 52
  51. 51. • From coarse-grained to fine-grained – Document level -> sentence level -> expression level – General sentiment -> domain-dependent sentiment -> target-dependent sentiment • From static to dynamic – Our attitude can change during social communication. • Modeling, detecting, and tracking the change of attitude • What leads to the change of attitude? E.g., persuasion campaign 53 Sentiment Analysis: Challenges
  52. 52. Sentiment Analysis: Target-specific Opinion Identification Observations: • The opinion clues may not be toward the given target (1,2,3,6), e.g., the target of “sexy” is “Helena”; the target of “terrific” is “reviews”; the target of “loving” is “telling” • The opinion clues are domain and context dependent (5,7), e.g., “free” is not opinionated in the movie domain; “well” in “as well” is not opinionated • Single words are not enough (4,7,8) A simple lexicon-based method doesn't work well. 54
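To see why, a tiny sketch of the naive lexicon baseline: it counts opinion words and credits them all to the target, exactly the assumption the observations above break. The lexicon here is a stand-in:

```python
# Stand-in positive lexicon; real lexicons are much larger.
POSITIVE_WORDS = {"sexy", "terrific", "loving", "free", "well"}

def naive_target_score(tweet: str) -> int:
    """Count positive-lexicon hits, wrongly crediting all of them to
    whatever target we care about (no target attribution, no context)."""
    tokens = (w.strip(".,!?").lower() for w in tweet.split())
    return sum(t in POSITIVE_WORDS for t in tokens)

# "terrific" describes the reviews, not the movie, yet the naive score
# still credits the movie with a positive hit.
print(naive_target_score("Terrific reviews, but the movie dragged on"))  # 1
```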
  53. 53. 55 Extracting a diverse and richer set of sentiment-bearing expressions, including formal and slang words/phrases Assessing the target-dependent polarity of each sentiment expression A novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus Extracting Diverse Sentiment Expressions With Target-dependent Polarity from Twitter [Chen et al. ICWSM 2012]
  54. 54. The Usage of Background Knowledge 56
  55. 55. 57 Sentiment Analysis: Feature and Aspect Extraction Motivation • To understand a user’s opinions about a product at a fine-grained level, support opinion summarization for products, and automatically extract pros and cons from reviews, it is essential to identify product features and aspects. Impact • Existing methods tend to require seed terms and focus on identifying explicit features or a few high-level aspects. • Our approach is capable of identifying both explicit and implicit aspects and does not require any labeling effort. Approach • We use a combination of corpus-based association measures and semantic similarity measures to identify product aspects in an efficient clustering-based approach.
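A minimal sketch of the clustering step, with the association and semantic similarity measures stubbed out as a single token-overlap score; the real system combines corpus statistics and semantic similarity, so everything below is a simplified stand-in:

```python
from itertools import combinations

# Hypothetical candidate aspect terms mined from product reviews.
TERMS = ["battery", "battery life", "screen", "display", "charge"]

def similarity(a: str, b: str) -> float:
    """Stub: token overlap. Corpus-based association and semantic
    similarity measures would be combined here in the real approach."""
    return 1.0 if set(a.split()) & set(b.split()) else 0.0

# Greedy single-link clustering over the pairwise similarities.
clusters: list[set[str]] = [{t} for t in TERMS]
for a, b in combinations(TERMS, 2):
    if similarity(a, b) > 0.5:
        ca = next(c for c in clusters if a in c)
        cb = next(c for c in clusters if b in c)
        if ca is not cb:
            ca |= cb
            clusters.remove(cb)

# [{'battery', 'battery life'}, {'screen'}, {'display'}, {'charge'}]
# A semantic measure would also merge screen/display and battery/charge.
print(clusters)
```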
  56. 56. 58 Clustering for Aspect Discovery in Opinion Mining [Chen et al. in submission]
  57. 57. 59 Polling or Social Media Analysis? It is actually about tracking public opinion. 1. Sample size 2. Representative of the target population 3. Accurate measure of opinions 4. Timeliness
  58. 58. • We study different groups of social media users who engage in the discussions of the 2012 U.S. Republican Presidential Primaries, and compare the predictive power among these user groups. • Existing studies on predicting election results are under the assumption that all users should be treated equally. • How could different groups of users be different in predicting election results? 60 Harnessing the Power of Social Data to Predict Election Results [Chen et al., SocInfo 2012]
  59. 59. 61 User Categorization 1. Engagement Degree 2. Tweet Mode 3. Content Type 4. Political Preference
  60. 60. Predicting a User's Vote • Basic idea: for which candidate the user shows the most support – Frequent mentions – Positive sentiment 62 Nm(c): the number of tweets mentioning the candidate c; Npos(c): the number of positive tweets about candidate c; Nneg(c): the number of negative tweets about candidate c; λ (0 < λ < 1): smoothing parameter; γ (0 < γ < 1): discounts the score when the user does not express any opinion towards c. Two cases: the user posted opinion about c / the user mentioned c but did not post opinion about c. More mentions, higher score; more positive/less negative opinions, higher score.
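The slide's formula lost its symbols in extraction; λ and γ above are stand-ins. One plausible reconstruction consistent with the stated behavior (an assumption, not necessarily the paper's exact formula):

```latex
\mathrm{score}(c) =
\begin{cases}
  N_m(c)\,\dfrac{N_{pos}(c) + \lambda}{N_{pos}(c) + N_{neg}(c) + 2\lambda}
    & \text{if the user posted opinions about } c,\\[1.5ex]
  \gamma\, N_m(c)
    & \text{if the user mentioned } c \text{ without opinion.}
\end{cases}
```

Here more mentions raise Nm(c), more positive and fewer negative tweets raise the smoothed opinion ratio, λ keeps the ratio defined for users with few opinion tweets, and γ < 1 discounts opinion-free mentions.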
  61. 61. 63 Revealing the challenge of identifying the vote intent of the “silent majority”. Retweets may not necessarily reflect users' attitudes. Prediction of a user’s vote based on more opinion tweets is not necessarily more accurate than prediction using more information tweets. The right-leaning user group provides the most accurate prediction result: it correctly predicts the winners in 8 out of 10 states with an average prediction error of 0.1. To some extent, this demonstrates the importance of identifying likely voters in electoral prediction. Twitter users are not “equal” in predicting elections!
  62. 62. EMOTION MINING 64
  63. 63. Emotion Mining: Motivation 65 • Emotion is essential to all aspects of our lives. – Influences our decision-making – Affects our social relationships – Shapes our daily behavior • Emotional mental health – New mothers may suffer from post-partum depression – Veterans may constantly suffer from negative emotions because of post-traumatic stress disorder
  64. 64. Emotion Mining: What We Have Studied 66 • Can we automatically create a large emotion dataset with high-quality labels from Twitter? How? • What features can effectively improve the performance of supervised machine learning algorithms? • Can the system developed on Twitter data be directly applied to identify emotions from other datasets? • What can we learn about emotion from social media data?
  65. 65. • Collect self-annotated emotion tweets [Wang et al. SocialCom 2012] – Seven emotions: joy, sadness, anger, love, fear, surprise, thankfulness “When I see a cop, no matter where I am or what I’m doing, I always feel like every law I’ve ever broken is stamped all over my body #fear” “I hate when my mom compares me to my friends. #anger” “I hate when I get the hiccups in class. #embarrassing” Harnessing Twitter “big data” for automatic emotion identification [Wang et al. SocialCom12] 67
  66. 66. [Chart: accuracy of seven-way emotion classification (y-axis: ~0.4 to ~0.65) vs. number of tweets in training data (1,000 up to ~2M), for the LIBLINEAR and MNB classifiers.] The more data, the merrier 68 Results of performing seven-emotion classification
  67. 67. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] 69 • Automatically classify suicide notes into 15 categories at the sentence level • Emotion categories – Positive • Hopefulness, thankfulness, forgiveness, love, pride, happiness – Negative • Sorrow, abuse, anger, hopelessness, guilt, blame, fear • Other categories – Information, instructions
  68. 68. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] 70 Sentence: “Found out today that // I passed my math STAAR test.” • N-gram features – Unigram, e.g., found, today, passed, etc. – Bigram, e.g., found_out, out_today, etc. • N-gram position – Unigram: found-1, out-1, today-1, …, I-2, passed-2, my-2, … • Knowledge-based features: – LIWC (Pennebaker et al., 2014a) – WordNet-Affect (Strapparava and Valitutti, 2004) – MPQA (Wilson et al., 2005) • Syntactic features: – Part-of-speech tags, e.g., Found/VBN out/RP today/NN that/IN I/PRP passed/VBD… – Dependency relations, e.g., root(ROOT-0, Found-1); ccomp(Found-1, passed-6); dobj(passed-6, test-10) …
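A minimal sketch of assembling such features for one sentence, with the affect lexicons and the parser stubbed out; LIWC, WordNet-Affect, MPQA, and a dependency parser would be plugged in, and nothing here is the paper's actual code:

```python
def ngram_features(tokens: list[str], position: int) -> list[str]:
    """Unigrams and bigrams, tagged with the sentence position index
    (matching the found-1, out-1, ... notation above)."""
    unigrams = [f"{t}-{position}" for t in tokens]
    bigrams = [f"{a}_{b}-{position}" for a, b in zip(tokens, tokens[1:])]
    return unigrams + bigrams

# Stub affect lookup; real systems consult LIWC, WordNet-Affect, MPQA.
AFFECT_LEXICON = {"passed": "achievement"}

def knowledge_features(tokens: list[str]) -> list[str]:
    return [f"lex={AFFECT_LEXICON[t]}" for t in tokens if t in AFFECT_LEXICON]

tokens = ["found", "out", "today", "that", "i", "passed",
          "my", "math", "staar", "test"]
features = ngram_features(tokens, position=1) + knowledge_features(tokens)
# POS tags and dependency relations (e.g., dobj(passed, test)) would be
# appended from a syntactic parser such as Stanford CoreNLP.
print(features[:4], features[-1])
```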
  69. 69. Discovering Fine-grained Emotion in Suicide Notes [Wang et al. BII12] 71 Winner: N-gram(1,2), knowledge-based and syntactic features
  70. 70. Cursing in English on Twitter [Wang et al. CSCW14] 72 • The main reason that people use curse words is to express strong emotions, especially anger and frustration. [Jay 1992, 2000; McEnery 2006; Nasution and Rosa 2012]
  71. 71. Normalized Emotion Distributions over Days, Eastern Standard Time (EST) “I am so thankful for my family && close friends. They hold me together when everything else around me is falling apart. #SoBlessed #Thankful” 73
  72. 72. Normalized Emotion Distributions over Time (EST) “I thank God everytime I see another day :*) #thankful .” 74
  73. 73. What are the top Emotions Associated with Moms and Dads? 75

Rank  Mom                    Dad
1     Irritation (7,562)     Irritation (3,034)
2     Sadness (2,315)        Sadness (1,363)
3     Affection (2,225)      Embarrassment (1,158)
4     Zest (2,213)           Zest (1,035)
5     Embarrassment (1,849)  Affection (1,030)
6     Thankfulness (1,537)   Cheerfulness (911)
7     Cheerfulness (1,332)   Envy (902)

“I hate when my dad uses my laptop. Its mine. Not yours. You have your own computer. I have shit to do, get off now please. #annoyed” “ugh my mom gets so nervous when i drive #annoying” “My mom just told me I can't open any presents early cause I'm too old for that #depressing”
  74. 74. PEOPLE ANALYSIS - Deriving People Metadata - from Content Analysis - from Network Analysis - Merge of two approaches - People-Content-Network Analysis to leverage the metadata - Finding Influential Users - Finding User Types & Affiliation - Measuring Social Engagement - Leverage communities to assist coordination 76
  75. 75. People Analysis: Social Engagement & Coordination 77 Imagine a crisis scenario such as the Haiti earthquake (2010) or Hurricane Sandy (2012): emergency teams are looking for ways to help the victims • What are the best possible ways to communicate: identify and engage people • Between resource providers (supply) and people in need of resources (demand) • Topical community influencers • How can response teams coordinate social media communities between volunteers, managers in the organizational structure, and resource seekers?
  76. 76. People Analysis: Who is asking for help, Who is offering to help? Smart Data in the context of Disaster Management ACTIONABLE: Timely delivery of the right resources and information to the right people at the right location! 78 Because everyone wants to help, but DOESN'T KNOW HOW!
  77. 77. Really sparse Signal to Noise: • 2M tweets during the first 48 hrs. of #Oklahoma-tornado-2013 - 1.3% as the precise resource donation requests to help - 0.02% as the precise resource donation offers to help 79 • Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity (OFFER) • I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) Disaster Response Coordination: Finding Actionable Nuggets for Responders to act • Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST) • Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST) For responders, most important information to manage coordination dependencies is the scarcity and availability of resources Blog by our colleague Patrick Meier on this analysis: http://irevolution.net/2013/05/29/analyzing-tweets-tornado/
  78. 78. People Analysis: Matching demanders and suppliers for coordination during crisis Purohit, H., Castillo, C., Diaz, F., Sheth, A., & Meier, P. (2013). Emergency-relief coordination on social media: Automatically matching resource requests and offers. First Monday, 19(1). 80
  79. 79. Demand-Supply identification and representation: core & facets • Extract core of the phrase – “what” – Other facets include “who”, “where”, “when”, etc. • Supervised learning to classify items for demands, supplies, and resource-type facets 81 Example tweet: “Rotary collecting clothing and other donations in New Jersey <URL>” Corresponding data item in the semi-structured knowledge inventory:
{
  source: “Twitter”,
  author: “@NN”,
  text: “Rotary collecting clothing and other donations in New Jersey <URL>”,
  donation-info: {
    donation-type: “Request”,
    donation-type-confidence: 0.8,
    donation-organization: “Rotary”,
    donation-item: “clothing and other donations”,
    donation-location: “New Jersey”
  },
  …
}
• IR model approach to match demand (request) with supply (offer) items in this semantically annotated knowledge inventory
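A minimal sketch of the IR-style matching step over the annotated items, using TF-IDF cosine similarity; scikit-learn and the toy items are assumptions here, and the First Monday paper describes the approach rather than this code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy annotated items: one donation request and two candidate offers.
requests = ["Rotary collecting clothing and other donations in New Jersey"]
offers = [
    "I want to donate clothes and other items for New Jersey storm victims",
    "Offering medical supplies in Boston",
]

vectorizer = TfidfVectorizer().fit(requests + offers)
sims = cosine_similarity(vectorizer.transform(requests),
                         vectorizer.transform(offers))

# Rank offers per request; in the full system, facets such as resource
# type and location would further filter the candidate matches.
best = sims[0].argmax()
print(offers[best], round(float(sims[0][best]), 3))
```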
  80. 80. Leveraging Communities for Whom to Engage With, Why and How 82 Purohit et al., User Taglines: Alternative Presentations of Expertise and Interest in Social Media . ASE Social Informatics, 2012
  81. 81. Network Analysis Interesting questions to ask: • How communities form around topics: growth & evolution • What are the effects of influential participants in the communities • What are the effects of the nature of content (or sentiment, opinions) flowing in the network on community structures and growth • What is the community structure: degree of separation and sub-communities that contribute to macro-level effects, e.g., coordination, engagement “To Discover How A, who is in Touch with B and C, Is Affected by the Relation Between B & C” -John Barnes 83 Foundation of network: • Nodes • Connections/Relationships Image: http://www.onasurveys.com/
  82. 82. Graphs showing sparse (A) and dense (B) RT networks and their corresponding follower graphs for 'call for action' and 'information sharing' tweet content types M. Nagarajan, H. Purohit, and A. Sheth, ’A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010 84
  83. 83. Understanding Evolving Community Structures for Coordination 85 User interaction networks of two topical communities– Occupy LA and Chicago, of emerging influencers during Occupy Wall Street (OWS) event 2011 Application of evolving communities: H. Purohit, J. Ajmera, S. Joshi, A. Verma, A. Sheth. Finding Influential Authors in Brand-Page Communities. 6th Int'l AAAI Conference on Weblogs and Social Media (ICWSM), Dublin, Ireland, June 5-7, 2012
  84. 84. [Figure: evolution of influencer interaction networks for the Romney and Obama topical communities during the U.S. Presidential Election 2012 debates, at four snapshots: before the 1st debate, after the 1st debate, after Hurricane Sandy, after the 3rd debate.] Understanding Community Evolution for Real-World Actions 86 Social Media analysis for US elections 2012, powered by Twitris: http://analysis.knoesis.org/uselection/insights/
  85. 85. On Understanding the Divergence of Online Social Group Discussion • Change of group discussion divergence over time, and different phases of real world events • Relation between discussion divergence and existing theories of social cohesion and social identity in Psychology • Prediction of future change in the group discussion divergence Research Questions on Social Dynamics in Communities Acknowledgement: NSF SoCS grant for ‘Leveraging Social Media during Emergency Response’ Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media.
  86. 86. • Prior work: – Focus on structural metrics to understand group evolution dynamics, but may not be sufficient to answer ‘WHY a group diverges over time’ • Our approach: – Content driven measure: collective divergence of group members for topics of discussion – Features assessing role of socio-psychological theories: cohesion & identity • Data: – Tweets during evolving events of natural disasters, and social activism Contrasting Prior Work and Approach Evolution of groups in online social communities surrounding events  On Understanding the Divergence of Online Social Group Discussion Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media. 88
  87. 87. • During #sandy, predicted low-diverging (focused) groups to engage with on the updates of flights: first delays & cancellations, then resuming • Natural disaster (D) events (Hurricane Irene and Sandy) have stronger correlations with identity-driven features than with cohesion features • We predicted group discussion divergence across phases with 0.83 AUC On Understanding the Divergence of Online Social Group Discussion Purohit, H., Ruan, Y., Fuhry, D., Parthasarathy, S., & Sheth, A. (2014, May). On Understanding Divergence of Online Social Group Discussion. In 8th Intl AAAI Conference on Weblogs and Social Media. 89
  88. 88. Continuous Semantics for Evolving Events to Extract Smart Data 90
  89. 89. Dynamic Model Creation Continuous Semantics 91
  90. 90. Live Demo of Powerful Social Media Analysis: Twitris 92
  91. 91. Twitris - Motivation 1. Information Overload • Multiple events around us • WHAT to be aware of • Multiple Storylines about same event!! 93 Image: http://bit.ly/etFezl
  92. 92. Twitris - Motivation 2. Evolution of Citizen Observation • with location and time 94
  93. 93. Twitris - Motivation 3. Semantics of Social perceptions • What is being said about an event (theme) • Where (spatial) • When (temporal) Twitris lets you browse citizen reports using social perceptions as the fulcrum 95
  94. 94. Twitris: Semantic Social Web Mash-up Facilitates understanding of multi-dimensional social perceptions over SMS, Tweets, multimedia Web content, electronic news media 96
  95. 95. Twitris: Architecture 97 Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, 539 - 553, Oct 5-7, 2009.
  96. 96. Twitris: Functional Overview 98
  97. 97. Twitris: Event Summarization 99
  98. 98. Twitris: Real-time information Incoming tweets with need types give a quick idea of what is needed and where (currently #OKC). Legends for different needs (#OKC). Clicking on a tag brings contextual information – relevant tweets, news/blogs, and Wikipedia articles. 100
  99. 99. Twitris: Analysis by location for contrast in social perceptions How people from different parts of the world talked about the US Election. Images and videos related to the US Election. 101
  100. 100. Twitris: Sentiment Analysis • Sentiment Analysis – using statistical and machine learning techniques 102
  101. 101. 103 Twitris: Sentiment Analysis – Smart Answers with reasoning! How was Obama doing in the first debate?
  102. 102. Twitris: Impact of Background Knowledge The dead people mentioned in the event OWS (Occupy Wall Street) 104
  103. 103. Twitris: Demo, Quick Show http://twitris2.knoesis.org/ • Many other interesting efforts – E.g., Vivek K. Singh, Mingyan Gao, and Ramesh Jain. 2010. From microblogs to social images: event analytics for situation assessment. In Proceedings of the International Conference on Multimedia Information Retrieval (MIR '10). ACM, New York, NY, USA, 433-436. 105
  104. 104. • Do you have a sense of the immense opportunity of analyzing citizen sensing for useful social signals? • Do you appreciate the broad range of issues and challenges? Did we present examples and a few insights into how to address some unique challenges? • Did the spatio-temporal-thematic, people-content-network, and emotion-sentiment-intent dimensions present a reasonable way to organize the vast number of relevant research challenges and techniques? 106 Conclusions
  105. 105. 107 Thank you, and please visit us at http://knoesis.org Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA
