
Computational Social Science as the Ultimate Web Intelligence



Kno.e.sis Projects at the Intersection of Big Data, AI, Social Good and Health
Panel at Web Intelligence, Dec 4-6, 2018, Santiago Chile


  1. Computational Social Science as the Ultimate Web Intelligence
     Kno.e.sis Projects at the Intersection of Big Data, AI, Social Good and Health
     Panel at Web Intelligence 2018
     Prof. Amit Sheth, LexisNexis Ohio Eminent Scholar
     Executive Director, Kno.e.sis - Ohio Center of Excellence in Knowledge-enabled Computing & BioHealth Innovation
     Presentation template by SlidesCarnival; photographs by Unsplash; icons by thenounproject
  2. Big Data | Social Media | AI
     ◎ Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification: 2.5 M tweets analyzed with machine learning algorithms
     ◎ eDrugTrends: identifying emerging trends in cannabis and synthetic cannabinoid use in the U.S. from web forum data and tweets, using NLP, ML and Semantic Web technologies
     ◎ SEES: cross-modal aggregation of multimodal and multidisciplinary data to support human efforts in disaster management
     ◎ Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter: 400,000 tweets analyzed with an optimization model
     Dimensions analyzed: trends, emotions, intents, sentiments, hazards, opinions, people, places, times
  3. Gender-Based Violence in 140 Characters or Fewer: A #BigData Case Study of Twitter
     14 million tweets collected from Twitter over a period of 10 months
     [1] Hemant Purohit, Tanvi Banerjee, Andrew Hampton, Valerie L. Shalin, Nayanesh Bhandutia, and Amit Sheth. Gender-based violence in 140 characters or fewer: A #BigData case study of Twitter. First Monday, Volume 21, Number 1, 4 January 2016.
  4. Outcomes of Analysis
     ◎ Trends of GBV tweets across 5 countries: USA, India, Philippines, Nigeria, South Africa.
     ◎ Three thematic groups of GBV tweets: physical violence, sexual violence, and harmful practices.
     ◎ Nigeria has the highest percentage of tweets with URLs of the five countries.
     ◎ Possible explanations include:
        ○ literacy;
        ○ the credibility of the public press;
        ○ the possibility that relying on external resources somehow reduces the threat of being identified as the responsible party.
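Slide 4 compares countries by the share of GBV tweets that contain URLs. As a minimal sketch of that comparison, assuming tweets arrive as hypothetical (country, text) pairs and that a simple regular expression is enough to detect a URL (the study's actual preprocessing is not described here), the per-country percentage can be computed like this:

```python
import re
from collections import Counter

# Simple URL detector; an assumption, not the study's preprocessing.
URL_PATTERN = re.compile(r"https?://\S+")


def url_share_by_country(tweets):
    """Return {country: percentage of that country's tweets containing a URL}.

    `tweets` is an iterable of (country, text) pairs (hypothetical format).
    """
    totals = Counter()
    with_url = Counter()
    for country, text in tweets:
        totals[country] += 1
        if URL_PATTERN.search(text):
            with_url[country] += 1
    return {country: 100.0 * with_url[country] / totals[country] for country in totals}


# Toy tweets, invented for illustration.
sample = [
    ("Nigeria", "Report on violence https://example.org/a"),
    ("Nigeria", "Read more http://example.org/b"),
    ("India", "Stop the violence #GBV"),
    ("India", "Article https://example.org/c"),
]
print(url_share_by_country(sample))  # {'Nigeria': 100.0, 'India': 50.0}
```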
  5. Context-Aware Harassment Detection on Social Media
     24,000 tweets collected; supervised ML methods used
     [1] Mohammadreza Rezvan, Saeedeh Shekarpour, Lakshika Balasuriya, Krishnaprasad Thirunarayan, Valerie L. Shalin, and Amit Sheth. A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. Web Science (WebSci 2018), Amsterdam, The Netherlands, May 27-30, 2018.
     [2] Mohammadreza Rezvan, Saeedeh Shekarpour, Krishnaprasad Thirunarayan, Valerie L. Shalin, and Amit Sheth. Analyzing and learning the language for different types of harassment. 2018.
     Knoesis wiki for Context-Aware Harassment Detection on Social Media
  6. Outcomes and Insights
     ◎ Lexicon: covers different types of harassment content: sexual, political, racial, intellectual, appearance-related, and general.
     ◎ Tweets: 24,000 non-redundant annotated tweets, of which 3,000 are labeled as harassing.
     ◎ Features: a combination of the following features gave the best accuracy:
        ○ TF-IDF
        ○ word2vec
        ○ paragraph2vec
        ○ LIWC vector
     ◎ ML methods: Gradient Boosting Machine (GBM) outperformed SVM, KNN and Naive Bayes (NB).
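Slide 6 reports that combining TF-IDF, word2vec, paragraph2vec and LIWC features gave the best accuracy. The sketch below shows one common way to combine feature sets, early fusion by concatenating per-tweet vectors. It assumes concatenation is how the features were combined, computes only the TF-IDF block itself, and uses hypothetical placeholder values for the other blocks. The resulting vectors could then be passed to a gradient boosting classifier (for example scikit-learn's `GradientBoostingClassifier`), which is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF with raw term counts and idf = log(N / df), over a sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(len(tokenized) / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def combine_features(*feature_blocks):
    """Early fusion: concatenate each tweet's vectors from several extractors."""
    return [[x for block in blocks for x in block] for blocks in zip(*feature_blocks)]


tweets = ["you are stupid", "you are great"]
vocab, tfidf = tfidf_vectors(tweets)
# Hypothetical stand-ins for word2vec / paragraph2vec and LIWC features.
embedding_feats = [[0.1, -0.3], [0.2, 0.5]]
liwc_feats = [[0.8], [0.1]]
combined = combine_features(tfidf, embedding_feats, liwc_feats)
print(vocab)             # ['are', 'great', 'stupid', 'you']
print(len(combined[0]))  # 7
```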
  7. Classification of Reddit Content to DSM-5 for Web-based Intervention
     3 million posts from 270K Reddit users, collected from 2005 to 2015, classified with zero-shot learning
     Goal: provide clinicians with insights into their patients
     Components: patient, clinician, EMR, insight; DSM-5 & Drug Abuse Ontology; improved healthcare
     [1] Manas Gaur, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, and Jyotishman Pathak. "Let Me Tell You About Your Mental Health!": Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention. In Proceedings of the 27th ACM CIKM, 2018.
     Knoesis wiki for Modeling Social Behavior for Healthcare Utilization in Depression
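Slide 7 mentions zero-shot learning: assigning posts to categories without labeled training posts. One simple way to illustrate the idea, and only an illustration rather than the method of the cited paper, is to score a post against a textual description of each category and pick the most similar one. The category names and word lists below are hypothetical stand-ins for the DSM-5 and Drug Abuse Ontology knowledge the slide refers to.

```python
import math
from collections import Counter

# Hypothetical category descriptions; real work would draw on DSM-5 and the
# Drug Abuse Ontology rather than these short word lists.
CATEGORY_LEXICONS = {
    "depressive_disorder": "sad hopeless empty worthless tired sleep",
    "anxiety_disorder": "worry panic nervous fear restless",
    "substance_use_disorder": "drug craving withdrawal relapse dose",
}


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_label(post):
    """Label a post with the most similar category description.

    No labeled posts are used, only the category descriptions."""
    post_vec = bag_of_words(post)
    scores = {
        category: cosine(post_vec, bag_of_words(lexicon))
        for category, lexicon in CATEGORY_LEXICONS.items()
    }
    return max(scores, key=scores.get)


print(zero_shot_label("I feel hopeless and empty and I cannot sleep"))
# depressive_disorder
```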
  8. Outcomes & Insights
     By incorporating domain knowledge and the slang terms used in social media data, our methods reduced the false alarm rate to between 3% and 5%.
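Slide 8 reports a false alarm rate of 3% to 5%. Assuming "false alarm rate" means the usual false positive rate, FP / (FP + TN), it can be computed from binary labels as sketched below; the toy labels are invented for illustration.

```python
def false_alarm_rate(y_true, y_pred):
    """False positive rate FP / (FP + TN) for binary labels (1 = flagged)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if fp + tn else 0.0


# Toy labels: 20 negatives (one wrongly flagged) and 5 positives.
y_true = [0] * 20 + [1] * 5
y_pred = [1] + [0] * 19 + [1] * 5
print(false_alarm_rate(y_true, y_pred))  # 0.05
```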
  9. User Modeling in Marijuana-related Communications
     Views: People - Content - Network
     ◎ The information in a user's tweets reflects an intent that depends on the user type: personal accounts share opinions, retail accounts promote related products for sale, and media accounts disseminate information.
     ◎ Each view must be properly incorporated to better represent the characteristics of users:
        ○ People: user description, emoji, profile pictures
        ○ Content: text, emoji
        ○ Network: interactions with other users (retweets and mentions)
     ◎ Multimodality: information shared in different formats (text, images, emoji, interactions) contributes to the meaning. Images and emoji (e.g., 🏈 😉 🍔) are translated into textual representations using state-of-the-art tools such as EmojiNet.
     [1] Ugur Kursuncu, Manas Gaur, Usha Lokala, Anurag Illendula, Krishnaprasad Thirunarayan, Raminta Daniulaityte, Amit Sheth, and I. Budak Arpinar. "What's ur type?" Contextualized Classification of User Types in Marijuana-related Communications using Compositional Multiview Embedding. In Proceedings of the IEEE International Conference on Web Intelligence, 2018.
     Knoesis wiki for eDrugTrends
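Slide 9 describes translating emoji into text and splitting a user's data into people, content and network views. The sketch below assumes a hypothetical user record with `description`, `tweets`, `retweeted` and `mentioned` fields and uses a small hand-written emoji dictionary as a stand-in for a resource such as EmojiNet. It does not use EmojiNet's actual interface and does not cover image-to-text translation.

```python
# Hypothetical emoji-to-text dictionary standing in for a resource such as
# EmojiNet; EmojiNet's real interface is not used here.
EMOJI_TO_TEXT = {"🏈": "football", "😉": "winking face", "🍔": "hamburger"}


def emoji_to_text(text):
    """Replace each known emoji with its textual description."""
    for emoji, words in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {words} ")
    return " ".join(text.split())  # collapse the extra whitespace


def build_views(user):
    """Split a (hypothetical) user record into people, content and network views."""
    return {
        "people": emoji_to_text(user["description"]),
        "content": " ".join(emoji_to_text(t) for t in user["tweets"]),
        "network": " ".join(user["retweeted"] + user["mentioned"]),
    }


user = {
    "description": "Game day 🏈🍔",
    "tweets": ["Check out our new products 😉"],
    "retweeted": ["@dispensary_news"],
    "mentioned": ["@friend"],
}
print(build_views(user))
```

Each view's text could then be embedded separately before the views are composed.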
  10. Outcomes & Insights
      ◎ Incorporating multimodal data, specifically profile pictures and network interactions, contributes significantly to the classification of users.
      ◎ Multimodal features such as users' profile pictures significantly improve classification performance on the imbalanced dataset.
      ◎ Composing the embeddings of the views (people, content, network) provides a more coherent representation of users.
      F-scores for each user type:
         Features             Personal   Media   Retail
         1. Tweet + Desc        0.95      0.42    0.73
         2. w/ Composition      0.94      0.18    0.71
         3. w/ Metadata         0.94      0.17    0.72
         4. w/ Image            0.97      0.72    0.87
         5. w/ Network          0.98      0.73    0.91
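The table on slide 10 reports an F-score per user type. Assuming these are standard F1 scores computed one-vs-rest for each class, they can be computed from predictions as sketched below; the labels are toy values for illustration.

```python
def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 score for each label: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


y_true = ["personal", "personal", "media", "retail"]
y_pred = ["personal", "media", "media", "retail"]
print(per_class_f1(y_true, y_pred, ["personal", "media", "retail"]))
```

Scoring each class separately, rather than reporting one overall accuracy, is what exposes weak minority classes such as Media in an imbalanced dataset.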
  11. Fusing Visual, Textual and Connectivity Clues for Studying Mental Health
      Goal: develop a multimodal framework that uses statistical techniques to fuse heterogeneous sets of features, obtained by processing visual, textual and user-interaction data, in order to identify depressive behavior and infer demographics.
      ◎ How well does the content of posted images (colors, aesthetics and facial presentation) reflect depressive behavior?
      ◎ Does the choice of profile picture reveal psychological traits of a depressed online persona? Is it reliable enough to represent demographic information such as age and gender?
      ◎ Are there underlying common themes among depressed individuals, derived from multimodal content, that can be used to detect depression reliably?
      [1] Amir Hossein Yazdavar, Mohammad Saied Mahdavinejad, Goonmeet Bajaj, Krishnaprasad Thirunarayan, Jyotishman Pathak, and Amit Sheth. Fusing Visual, Textual and Connectivity Clues for Studying Mental Health in Population. 30th International Conference on World Wide Web (submitted, WWW 2019).
      Knoesis wiki for Modeling Social Behavior for Healthcare Utilization in Depression
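Slide 11 describes fusing heterogeneous visual, textual and interaction features. A minimal early-fusion sketch, assuming each modality is a list of numeric per-user feature vectors, standardizes every feature to zero mean and unit variance (so no modality dominates merely because of its scale) and then concatenates the modalities per user. The feature values are hypothetical, and the cited work's actual statistical techniques may differ.

```python
import statistics


def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit population std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has std 0; use 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fuse(*modalities):
    """Early fusion: standardize each modality separately, then concatenate per user."""
    standardized = [zscore_columns(m) for m in modalities]
    return [[x for block in user_blocks for x in block] for user_blocks in zip(*standardized)]


# Hypothetical per-user features for two users.
visual = [[0.2, 10.0], [0.4, 30.0]]  # e.g., mean brightness, % grayscale images
textual = [[3.0], [5.0]]             # e.g., count of depression-related terms
network = [[100.0], [300.0]]         # e.g., number of followers
fused = fuse(visual, textual, network)
print(len(fused), len(fused[0]))  # 2 4
```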
  12. Outcomes & Insights
      ◎ Characterizing linguistic patterns in two aspects: depressive behavior and age distribution
      ◎ Gender biases and depressive-behavior association (chi-square test; color code: blue = association, red = repulsion; size = each cell's contribution)
      ◎ The age distribution of depressed and control users in the ground-truth dataset
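Slide 12 uses a chi-square test to relate gender and depressive behavior, with blue and red marking association and repulsion and size showing each cell's contribution. Assuming the plot shows Pearson residuals, (observed - expected) / sqrt(expected), whose sign gives the direction and whose square is the cell's contribution to the chi-square statistic, these quantities can be computed as below. The counts are invented for illustration.

```python
import math


def chi_square_residuals(table):
    """Chi-square statistic and Pearson residuals for a 2-D table of counts.

    Residual (O - E) / sqrt(E): positive = association, negative = repulsion;
    its square is the cell's contribution to the statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            r = (observed - expected) / math.sqrt(expected)
            res_row.append(r)
            chi2 += r * r
        residuals.append(res_row)
    return chi2, residuals


# Hypothetical counts: rows = female, male; columns = depressed, control.
counts = [[30, 20], [20, 30]]
chi2, residuals = chi_square_residuals(counts)
print(chi2)       # 4.0
print(residuals)  # [[1.0, -1.0], [-1.0, 1.0]]
```

A p-value additionally needs the chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom; `scipy.stats.chi2_contingency` provides one, but note that it applies Yates' continuity correction to 2×2 tables by default.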
  13. Outcomes & Insights
      ◎ Explanation of the log-odds prediction of the outcome (0.31) for a sample user: the y-axis shows the outcome probability (depressed or control), and the bar labels indicate the log-odds impact of each feature.
      ◎ Ranking of features obtained from different modalities with the Boruta algorithm.
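Slide 13 explains a prediction in terms of the log-odds impact of each feature. For a logistic-regression model (an assumption, since the slide does not name the model), the log-odds is the intercept plus the sum of weight × value over the features, so each product is that feature's additive contribution. The feature names and weights below are hypothetical.

```python
import math


def explain_log_odds(intercept, weights, features):
    """Split a logistic-regression prediction into additive log-odds terms.

    log-odds = intercept + sum(weight * value); probability = sigmoid(log-odds).
    Returns (probability, {feature: weight * value})."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


# Hypothetical model weights and user features.
weights = {"negative_emotion_words": 2.0, "faces_in_profile_picture": -1.0}
user = {"negative_emotion_words": 0.5, "faces_in_profile_picture": 1.0}
prob, contrib = explain_log_odds(0.0, weights, user)
print(prob)     # 0.5
print(contrib)  # {'negative_emotion_words': 1.0, 'faces_in_profile_picture': -1.0}
```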
  14. Create value from data that supports action
      Big Data & AI: What can we do that is unique?
      ◎ Derive insights: emotions, sentiments, intentions
      ◎ Scale to identify important and relevant issues for humankind: floods, earthquakes, wildfires, tsunamis
      ◎ Derive insights from data: do more exercise, reduce sugar intake, increase water intake
      More at:
  15. Thank You!