
Introduction to Mining Social Media Data

Tutorial at the Alberto Mendelzon Workshop, Colombia May 2018

Alberto Mendelzon Workshop, 21st May 2018
Introduction to Mining Social Media Data
Miriam Fernandez
Knowledge Media Institute
Open University, UK
@miriam_fs
@miriamfs
Credit to all these fantastic people!!
Who are we?
Before we start…
•  1.- This is an after lunch session…
–  Hope you took the necessary precautions!
•  2.- It is an introductory tutorial
–  If you were expecting something very complex, this is not for you; go out and enjoy the sun :)
•  3.- I hate talking alone for long periods of time
–  Please ask or discuss anything you want at any point!
•  4.- Hands-on exercises available
–  Fantastic tutorial @TheWebConf by some of my colleagues! :)
https://github.com/evhart/smasac-tutorial/blob/master/README.md (Jupyter notebooks)
Understanding Social Media
Most Used Social Media Platforms
Source: https://techcrunch.com/2017/06/27/facebook-2-billion-users/
Not the Only Ones
Smaller and less famous (open and closed) communities addressing particular
geographic regions, specific user groups or niche interests thrive on the Web!
A World-wide Phenomenon
Number of social network users worldwide in billions
Source: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/
Number of social network users in selected countries (in millions)
Source: https://www.statista.com/statistics/278341/number-of-social-network-users-in-selected-countries/
Full of Challenges
Mining Social Media Data, for What?
Trivalent: http://trivalent-project.eu/
COMRADES: https://www.comrades-project.eu/
DecarboNet: https://www.decarbonet.eu/
Sense4us: http://www.sense4us.eu/
ROBUST: http://www.robust-project.eu/
OUSocial: http://oro.open.ac.uk/40883/1/ousocial2-demo.pdf
Some of the next slides from: https://www.slideshare.net/halani
Studying social phenomena at scale!
Social
Semantic
Statistical
Analysis
Businesses
•  Many businesses provide online
communities to:
–  Increase customer loyalty
–  Raise brand awareness
–  Spread word-of-mouth
–  Facilitate idea generation
•  Online communities incur significant
investment in terms of:
–  Money spent on hosting and
bandwidth
–  Time and effort for maintenance
•  Community managers monitor
community ‘health’ to:
–  Ensure longevity
–  Enable value generation
•  However, the notion of ‘health’ is
hard to pin down
http://www.robust-project.eu/
Businesses
Monitoring the evolution of community activities and the level of contributions in the SAP Community Network (SCN)
Reputation Fish Tank
https://www.youtube.com/watch?time_continue=57&v=KXRzdrDDt_8
Active OU communities on Facebook
Education
•  How active and engaged is the course group?
•  How is sentiment towards the course evolving?
•  Are the leaders of the group providing positive/negative comments?
•  What topics are emerging?
•  Is the group flourishing or diminishing?
•  Do students get the answers and support they need or not?
DEMO
OUAnalyse
•  Social media data vs. VLE data to increase retention
Names
https://analyse.kmi.open.ac.uk/
Automatic Categorisation of Social Media Accounts
•  Objective:
–  Provide automatic identification of the main actors talking
about policy in social media
–  Allow policy researchers to concentrate on the opinions of
citizens vs. commercial organizations
•  Approach
Twitter Data → Data Collection → Feature Engineering → User Classification
Classes: Person, Company, NGO, MP, News & Media
Policing
Olson’s psychological theory of luring communication (LCT)
Grooming data
•  Classification results:
–  Trust development: 79% P, 82% R, 81% F1
–  Grooming stage: 88% P, 89% R, 88% F1
–  Physical approach: 87% P, 89% R, 88% F1
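As a quick sanity check on the figures above, F1 is the harmonic mean of precision (P) and recall (R). The sketch below recomputes it from the rounded percentages on the slide; the function name `f1_score` is ours, and because the inputs are rounded the recomputed values may differ from the reported F1 by about a point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both expressed in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall pairs as reported on the slide.
reported = {
    "Trust development": (0.79, 0.82),
    "Grooming stage": (0.88, 0.89),
    "Physical approach": (0.87, 0.89),
}

for stage, (p, r) in reported.items():
    print(f"{stage}: F1 = {f1_score(p, r):.3f}")
```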
Energy
Disaster Management
177 million tweets were posted in a single day during the 2011 Japan earthquake
The Boston Marathon bombing broke on Twitter. On the news, 3 hours later!
Ushahidi
Crisis-based Event Detection Tasks
•  Crisis-related event detection is often divided into three main tasks, at different levels of granularity [Olteanu et al. 2015]:
–  Task 1. Crisis vs. non-Crisis Related Messages: differentiate posts that are related to a crisis situation from those that are not
–  Task 2. Type of Crisis: identify the type of crisis the message is related to (Shooting, Explosion, Building Collapse, Fires, Floods, Meteorite Fall, etc.)
–  Task 3. Type of Information: identify the type of information the message conveys (Affected Individuals, Infrastructures and Utilities, Donations and Volunteer, Caution and Advice, etc.)
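The three tasks can be chained so that each step only runs on posts accepted by the previous one. The sketch below illustrates that cascade with plain keyword matching standing in for trained classifiers; the keyword lists and the function names (`classify`, `_first_match`) are hypothetical and only the label names come from the slide.

```python
from typing import Dict, Optional, Set

# Hypothetical keyword lists, for illustration only; a real system would use
# classifiers trained on annotated crisis data.
CRISIS_TYPES: Dict[str, Set[str]] = {
    "Floods": {"flood", "flooding", "flooded"},
    "Fires": {"fire", "wildfire", "burning"},
    "Explosion": {"explosion", "blast"},
}
INFO_TYPES: Dict[str, Set[str]] = {
    "Donations and Volunteer": {"donate", "donation", "volunteer", "volunteers"},
    "Caution and Advice": {"avoid", "warning", "evacuate"},
    "Affected Individuals": {"injured", "missing", "trapped"},
}


def _first_match(tokens: Set[str], table: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first label whose keyword set overlaps the post tokens."""
    for label, keywords in table.items():
        if tokens & keywords:
            return label
    return None


def classify(post: str) -> Dict[str, object]:
    tokens = {t.strip(".,!?#").lower() for t in post.split()}
    all_crisis_words = set().union(*CRISIS_TYPES.values())
    # Task 1: crisis-related vs. not crisis-related.
    related = bool(tokens & all_crisis_words)
    # Task 2: type of crisis (only for crisis-related posts).
    crisis_type = _first_match(tokens, CRISIS_TYPES) if related else None
    # Task 3: type of information (only for crisis-related posts).
    info_type = _first_match(tokens, INFO_TYPES) if related else None
    return {"crisis_related": related, "crisis_type": crisis_type, "info_type": info_type}


print(classify("Please donate, the flood left many people trapped"))
print(classify("Lovely sunny day in the park"))
```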
Disaster Management
https://evhart.github.io/crees/
Be aware of the problems!
Fernandez, M., and Alani, H. "Online Misinformation: Challenges and Future Directions." Companion of The Web Conference 2018.
http://oro.open.ac.uk/53734/
https://kmitd.github.io/recoding-black-mirror/
http://www.aolteanu.com/SocialDataLimitsTutorial/
Strong Need for Ethics!
Re-coding Black Mirrors
Bias on the Web at all levels!
http://www.aolteanu.com/SocialDataLimitsTutorial/
Some considerations when collecting data
•  Automatic access to social media data can be
restricted in different ways:
–  Public / Non-public data: Most social media websites do not
allow access to the information posted unless reading access is
given explicitly by the information creator.
–  Query restrictions: Data access can be limited by API restrictions
(e.g., rate limiting, query allowance).
–  Data Sampling: High velocity data is sometimes sampled by social media companies. As a result, it is only possible to retrieve a portion of the relevant information.
–  Query Filtering: Often data is retrieved using query parameters (e.g., keywords, geolocation, etc.), which can lead to missing or biased information.
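Query restrictions such as rate limiting are usually handled on the collector side by waiting and retrying. The sketch below shows one common pattern, exponential backoff. It does not target any real platform API: `RateLimitError`, `fetch_page` and `fetch_with_backoff` are hypothetical names, and the delays are arbitrary defaults.

```python
import time
from typing import Callable, List


class RateLimitError(Exception):
    """Raised by the (hypothetical) client when the API refuses a request."""


def fetch_with_backoff(
    fetch_page: Callable[[int], List[str]],
    page: int,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch_page(page), doubling the wait after each rate-limit error."""
    for attempt in range(max_retries):
        try:
            return fetch_page(page)
        except RateLimitError:
            # Wait 1s, 2s, 4s, ... before trying again.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"page {page}: still rate limited after {max_retries} retries")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Note that backoff only addresses rate limits; sampling and query filtering still bias what is retrieved.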
Some considerations when analysing data
–  User type may vary (e.g., news organisations, journalists, companies, government, NGOs, etc.)
–  Populations may be biased (e.g., not all ages, genders, political views, etc. are equally represented)
–  Type of information shared may vary: (e.g., during a
disaster you may have messages about: affected
individuals, caution and advice, donation or volunteering,
message of support, etc.)
–  Type of content shared may vary (e.g., text, images,
videos, links).
–  Target audience may vary (e.g., general public, other
organisation, followers, friends/family).
–  Social media platforms to communicate the message
may vary, or more than one may be in use (e.g., Facebook,
Twitter, etc.)
https://shorensteincenter.org/information-disorder-framework-for-research-and-policymaking/
Types of Misinformation and Disinformation
7 Types of Mis- and Dis-information (Credit: Claire Wardle, First Draft)
Affecting decision-making processes in many domains
Dimensions of Combating Online Misinformation
•  Misinformation content detection
–  Are misinformation content and sources automatically identified? Are streams of
information automatically monitored? Is relevant corrective information identified
as well?
•  Misinformation dynamics
–  Are patterns of misinformation flow identified and predicted? Is demographic and
behavioural information considered to understand and predict misinformation
dynamics?
•  Content Validation
–  Is misinformation validated and fact checked? Are the users involved in the
content validation process?
•  Misinformation management
–  Are citizens’ perceptions and behaviour with regards to processing and sharing
misinformation studied and monitored? Are intervention strategies put in place to
handle the effects of misinformation?
Misinformation Content Detection
Signals for deciding whether a piece of content is misinformation:
•  Information source: lists of misleading sites (http://www.opensources.co/)
•  Content: text/images/videos
•  Context: specific features (hashtags, mentions)
•  Network & propagation patterns
Misinformation Dynamics
Low content diversity and strong social reinforcement: homophily, polarisation, algorithmic ranking/personalisation, social bubbles
•  Misinformation spreads faster and more widely across the network
•  Misinformation can be attributed to / spread by bots & crowdturfing
•  Users that use more social words and affection are more susceptible to interacting with bots
•  Extroverts are more prone to share misinformation
•  Users tend to select and share content based on homogeneity (echo chambers), an effect exacerbated by ranking and personalisation algorithms
•  In social media environments, where users are influenced by high information load and finite attention, low quality information is likely to go viral
•  Different types of misinformation spread differently. Scientific news has a higher level of diffusion but decays faster. Conspiracy theories spread more slowly over longer time periods
•  Even when denied, rumour cascades continue to propagate
Content Validation
•  Full Fact, UK
•  Snopes and Root Claim, US
•  FactCheckNI, Northern Ireland
•  Pagella Politica, Italy
Computational fact checker: automatically extracts claims and validates them against a variety of information sources
•  Knowledge bases
•  DBs of facts manually assessed by experts
•  Crowdsourcing for annotation and/or verification
Truth Teller
Whether a claim is accepted by an individual is strongly influenced by the individual’s belief system (confirmation bias / motivated reasoning)
Misinformation Management
Simply presenting people with corrective information is likely to fail in changing their salient beliefs and opinions, or may even reinforce them
Combatting misinformation:
•  Provide an explanation rather than a simple refutation
•  Expose the user to related but disconfirming stories
•  Reveal the demographic similarity of the opposing group
•  Expose the users to “small doses” of misinformation
•  Facts
•  Early detection of malicious accounts
•  Use of ranking and selection strategies based on corrective information
Comparison of Relevant Platforms
Limitations
•  Misinformation content detection
–  Do not provide rationale or explanation of their decisions
–  Disengage users by regarding them as passive consumers rather than as active co-creators
and detectors of misinformation
•  Misinformation dynamics
–  Do not consider the typology and topology of the different networks
–  Do not take into account how the misinformation-handling behaviour of users influences the
spread of misinformation
•  Content Validation
–  Not able to cope with the high volume of misinformation generated online
–  Often disconnected from where the users tend to read, debate and share misinformation.
•  Misinformation management
–  Tend to focus on the technical and not on the human aspects of the problem (i.e., motivations
and behaviours of the users when generating and spreading misinformation)
Research Directions
•  User Involvement
–  Participation of all stakeholders, including end users, social scientists, computer
scientists, educators, etc., in the co-design of their functions, user interfaces, and
delivery methods
•  Misinformation Dynamics
–  Study how platform-specific and network-specific features influence the dynamics of
misinformation
•  Content Validation
–  Embed fact checkers into the environments where users tend to read, debate, and
share misinformation (plugins)
•  Misinformation Management
–  Understanding user behaviour towards misinformation, what opinions users form about it, and how these opinions evolve over time is key to successfully managing the impact of misinformation.
–  Technology can be used to test the effectiveness of various misinformation
management policies and techniques, as well as to deploy them at scale.
Modeling Social Media Data
SIOC: http://sioc-project.org/
M Fernandez, A Scharl, K Bontcheva, H Alani. User Profile Modelling
in Online Communities. SWCS’14 Third International Workshop on
Semantic Web Collaborative Spaces. ISWC 2014
http://oro.open.ac.uk/41395/
Data Integration
•  Social Networking Sites are like data silos
–  Many isolated communities of users with their data
•  The same user can participate in different social networks
–  Miriam.fs / miriamfs / mfs
•  The same topic can be discussed in different social networks
–  Need ways to connect them
•  To develop portable analysis models
•  To allow users to access their data uniformly across SNS
•  To allow automatic data portability from one SNS to another one
Source: J.Breslin: The Social Semantic Web: An Introduction http://www.slideshare.net/Cloud/the-social-semantic-web-an-introduction
Users / Content / Collaborative Environment
[Figure: the user at the centre, described by demographic characteristics (birthday, location, sex), preferences, needs, behaviour, personality, social network, content, and collaborative environment / domain of discussion; each aspect is annotated with the vocabularies that cover it: SUM, MESH, OUBO, SIOC, FOAF, Schema.org, Microformats, SemSNA, OPO, PAO]
Using SIOC to Model Twitter Data
[Figure: RDF graph mapping Twitter data to SIOC]
•  Classes and what they represent: sioct:MicroblogPost (tweet URL), sioct:Microblog, sioc:UserAccount (user Twitter homepage, user ID), sioc:Site (Twitter homepage), sioc:Container (Twitter list ID), sioct:Tag, geo:Point, gn:Feature
•  Properties and their Twitter values: sioc:content (tweet text), dcterms:created (tweet creation time; account creation time), sioc:name (screen name; extracted hashtag), dcterms:title (user name), sioc:note (account description), sioc:avatar (avatar URL), sioc:links_to (extracted link), geo:long / geo:lat (tweet longitude / latitude)
•  Relations between resources: sioc:reply_of / sioc:has_reply, sioc:has_container / sioc:container_of, sioc:has_creator / sioc:creator_of, sioc:has_space / sioc:space_of, sioc:has_owner / sioc:owner_of, sioc:subscriber_of / sioc:has_subscriber, sioc:isPartOf / sioc:hasPart, sioc:topic, sioc:mentions, sioc:follows, sioc:about, sioc:addressed_to, sioc:forwarded_by, geo:location
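To make the mapping concrete, the sketch below serialises one tweet as N-Triples using a handful of the properties named in the figure (sioc:content, dcterms:created, sioc:has_creator, sioc:name). It is a minimal illustration that avoids any RDF library; the input dictionary keys and the example.org URLs are hypothetical, and hashtags, links and geo data are left out.

```python
SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape backslashes, quotes and newlines for N-Triples string literals.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def tweet_to_ntriples(tweet: dict) -> list:
    """Map a tweet (hypothetical dict layout) to a list of N-Triples lines."""
    post = iri(tweet["url"])
    user = iri(tweet["user_url"])
    triples = [
        (post, iri(RDF_TYPE), iri(SIOCT + "MicroblogPost")),
        (post, iri(SIOC + "content"), literal(tweet["text"])),
        (post, iri(DCTERMS + "created"), literal(tweet["created"])),
        (post, iri(SIOC + "has_creator"), user),
        (user, iri(RDF_TYPE), iri(SIOC + "UserAccount")),
        (user, iri(SIOC + "name"), literal(tweet["screen_name"])),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


example = {
    "url": "https://example.org/status/1",       # hypothetical tweet URL
    "user_url": "https://example.org/user/ann",  # hypothetical account URL
    "text": "Hello world",
    "created": "2018-05-21T14:00:00Z",
    "screen_name": "ann",
}
print("\n".join(tweet_to_ntriples(example)))
```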
Mining Social Media Data, How?
Analysis
•  Behaviour Analysis
•  Sentiment Analysis
Behaviour Analysis (in a climate change context)
Fernandez, M., Piccolo, L., Alani, H., Maynard, D., Meili, C., & Wippoo, M. (2017). Pro-Environmental Campaigns via Social Media: Analysing Awareness and Behaviour Patterns. The Journal of Web Science, 3(1).
http://www.webscience-journal.net/webscience/article/view/44/30
Fernández, M., Burel, G., Alani, H., Piccolo, L. S. G., Meili, C., & Hess, R. (2015). Analysing engagement towards the 2014 earth hour campaign in Twitter.
http://oro.open.ac.uk/43621/1/ENVINFO2015_v12.pdf
Problem
•  Individual behaviour change is a central strategy to mitigate
climate change
•  However, public engagement is still limited
Problem
•  Pro-environmental campaigns, particularly via social media
•  Unclear how existing theories and studies of behaviour change can be applied to practical settings, particularly social media campaigns, to better target and inform users
Research Questions
•  RQ1: How can we translate theories of behaviour change into
computational methods to enable the automatic identification of
behaviour?
•  RQ2: How can the combination of theoretical perspectives and the
automatic identification of behaviour help us to develop effective
social media communication strategies for enabling behaviour
change?
Literature Review (I)
•  Behaviour Change
–  Socio-psychological models of behaviour (mainly at individual level)
–  Theories of change (5 Doors Theory [Robinson])
Literature Review (II)
•  Intervention Strategies
–  Information
–  Discussions
–  Public Commitment
–  Feedback
–  Social Feedback
–  Goal Setting
–  Collaboration
–  Competition
–  Rewards
–  Incentives
–  Personalisation
Behavioural Stage: Interventions
•  Desirability: Information
•  Enabling Context: Information, Rewards, Incentives
•  Can Do: Goal Setting, Public Commitment, Feedback
•  Buzz: Feedback, Social Feedback
•  Invitation: Promoting Collaboration
Capturing and Categorising Behaviour
•  Goal
–  Automatic categorisation of users into behavioural stages following the
5 doors theory of behaviour change
•  Analysis Methodology
•  Based on questionnaire findings (212 participants)
–  “There is a moderate relationship between the type of user-generated
content and behaviour change stage”
1.  Manual inspection of the patterns describing each behavioural stage
2.  Feature engineering based on the identified patterns
3.  Supervised classification
Example posts by behavioural stage:
•  Desirability: “I don’t understand why my energy bill is soooo expensive!”
•  Enabling Context: “I am considering walking or using public transport at least once a week”
Manual Inspection of Linguistic Patterns
•  Desirability
–  Negative sentiment (expressing personal frustration – anger / sadness)
–  URLs (generally associated with facts)
–  Questions (how can I? / what should I?)
•  Enabling Context
–  Neutral
–  Conditional sentences (if you do [..] then […])
–  Numeric facts [consumption/pollution] + URL
•  Can do
–  Neutral sentiment
–  Orders and suggestions (I/you should/must…)
•  Buzz
–  Positive sentiment (happiness / joy)
–  (I/we + present tense) I am doing / we are doing
•  Invitation
–  Positive sentiment (happy / cute)
–  [vocative] Friends, guys
–  Join me / tell us / with me
Feature Engineering
•  Using an extension of the GATE NLP tools
–  Polarity (positive/negative/neutral)
–  Emotions
•  Positive (joy/surprise/good/happy/cheeky/cute)
•  Negative (anger/disgust/fear/sadness/bad/swearing)
–  Directives
•  Obligate (you must do) / imperative (do) / prohibitive (don’t do)
•  Jussive or imperative in the 3rd person (go me!)
•  Deliberative (shall / should we) / indirect deliberative (I wonder if)
•  Conditionals (if / then)
•  Questions (direct / indirect)
–  URLs (yes / no)
•  Indicates if the message points to external information
https://gate.ac.uk/
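Several of the features above are surface patterns that can be approximated without a full NLP pipeline. The sketch below uses regular expressions as a rough stand-in for the GATE-based extraction; it is not the authors' implementation, the patterns and feature names are our own simplification, and polarity and emotion detection are left out because they need a lexicon or a trained model.

```python
import re

URL_RE = re.compile(r"https?://\S+")
CONDITIONAL_RE = re.compile(r"\bif\b", re.IGNORECASE)
OBLIGATIVE_RE = re.compile(r"\b(?:I|you|we)\s+(?:should|must)\b", re.IGNORECASE)
PROHIBITIVE_RE = re.compile(r"\b(?:don['’]t|do not)\b", re.IGNORECASE)
INVITATION_RE = re.compile(r"\b(?:join me|join us|tell us|with me)\b", re.IGNORECASE)


def extract_features(post: str) -> dict:
    """Return boolean surface features for one post."""
    # Remove URLs first so that a '?' inside a query string is not
    # mistaken for a question.
    text = URL_RE.sub(" ", post)
    return {
        "has_url": bool(URL_RE.search(post)),
        "question": "?" in text,
        "conditional": bool(CONDITIONAL_RE.search(text)),
        "obligative": bool(OBLIGATIVE_RE.search(text)),
        "prohibitive": bool(PROHIBITIVE_RE.search(text)),
        "invitation": bool(INVITATION_RE.search(text)),
    }


print(extract_features("How can I save energy? http://example.org/tips?id=1"))
print(extract_features("You should join me, friends!"))
```

Feature dictionaries like these could then be fed to any supervised classifier, which is the role the decision tree plays in the study.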
Behaviour Classification Model
•  Multiple classifiers tested based on the sample of 2,610 annotated posts
•  Best performing classifier J48 decision tree (71.2% accuracy)
Experiments
•  Analyse the behaviour of participants EH15 & COP21
•  Data Collection
–  Participants of EH15 & COP21. Up to 3,200 posts per user
•  Data Filtering
–  Identify for each user her posts related to climate change/sustainability
•  Use the term extraction tool ClimaTerm (GATE service)
–  Based on Gemet / Reegle / DBPedia
Collected data:
Movement   Posts        Users
EH15       56,531,349   20,847
COP21      48,751,220   17,127
After filtering (posts related to climate change/sustainability):
Movement   Posts        Users
EH15       750,538      20,847
COP21      422,211      17,127
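The two tables give a sense of how selective the filtering step is: only about 1.3% of the EH15 posts and 0.9% of the COP21 posts are kept. A small sketch that derives these shares, and the average number of relevant posts per user, from the figures on the slide (the variable and function names are ours):

```python
# Post and user counts copied from the slide.
collected = {"EH15": (56_531_349, 20_847), "COP21": (48_751_220, 17_127)}
filtered = {"EH15": 750_538, "COP21": 422_211}


def retention(movement: str) -> float:
    """Share of collected posts that survive the topical filter."""
    total_posts, _ = collected[movement]
    return filtered[movement] / total_posts


def relevant_posts_per_user(movement: str) -> float:
    _, users = collected[movement]
    return filtered[movement] / users


for movement in collected:
    print(
        f"{movement}: {retention(movement):.2%} of posts kept, "
        f"{relevant_posts_per_user(movement):.1f} relevant posts per user"
    )
```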
Analysis of EH2015 and COP21
•  Categorise user behaviour in the months before/after
Recommendations
•  A big part of a campaign’s effort should be concentrated on
providing messages with very concrete suggestions on climate
change actions
–  Most users are in the desirability stage: they want to change but they don’t
know how
•  There is a need to identify really engaged individuals and
community leaders and involve them more closely in the
campaigns
–  Few users in the invitation stage and most of them are organisations
–  For an invitation to be effective it is vital who issues the invitation
•  Efforts should be dedicated towards engaging in discussions and
providing direct feedback to users
–  Communication in these campaigns generally functions as broadcasting, or
one-way communication, from the organisations to the public
–  Frequent and focused feedback is an intervention strategy that can help
build self-efficacy and nudge the users in the direction of change
Behaviour Analysis (in an Enterprise Context)
Rowe, M., Fernandez, M., Angeletou, S., & Alani, H. (2013). Community analysis through
semantic rules and role composition derivation. Web Semantics: Science, Services and
Agents on the World Wide Web, 18(1), 31-47.
Rowe, Matthew, and Harith Alani. "What makes communities tick? Community health analysis using role compositions." Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom). IEEE, 2012.
Rowe, M., Fernandez, M., Alani, H., Ronen, I., Hayes, C., & Karnstedt, M. (2012, June).
Behaviour analysis across different types of Enterprise Online Communities. In
Proceedings of the 4th Annual ACM Web Science Conference (pp. 255-264). ACM.
Some of the next slides from: https://www.slideshare.net/mattroweshow
The Need for Interpretation
•  Online communities are dynamic behavioural ecosystems
–  Users in communities can be defined by their roles
•  i.e. Exhibiting similar collective behaviour
–  Prevalent behaviour can impact upon community members and health
•  Management of communities is helped by:
–  Understanding the relation between behaviour and health
•  How user behaviour changes are associated with health
•  Encouraging users to modify behaviour, in turn affecting health
–  e.g. content recommendation to specific users
–  Predicting health changes
•  Enables early decision making on community policy
•  Can we accurately and effectively detect positive and negative
changes in community health from its composition of behavioural
roles?
SAP Community Network
•  Collection of SAP forums in which users discuss:
–  Software development
–  SAP Products
–  Usage of SAP tools
•  Points system for awarding best answers
–  Enables development of user reputation
•  Provided with a dataset covering 33 communities:
–  Spanning 2004 - 2011
–  95,200 threads
–  421,098 messages
•  78,690 were allocated points
–  32,942 users
[Figure: PostCount over time, 2004–2011]
Community Health Indicators
•  From the literature there is no single agreed measure of ‘community
health’
–  Multi-faceted nature: loyalty, participation, activity, social capital
–  Different communities and platforms look at different indicators
•  Indicator 1: Churn Rate (loyalty)
–  The proportion of users who participate in a community for the final time
•  Indicator 2: User Count (participation)
–  The number of participating users in the community
•  Indicator 3: Seeds-to-Non-Seeds Posts Proportion (activity)
–  The proportion of seed posts (i.e. thread starters that receive a reply) to non-seeds (i.e. no reply)
•  Indicator 4: Clustering Coefficient (social capital)
–  The average of users’ clustering coefficients within the largest strongly connected
component
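Two of these indicators are simple to compute once activity is split into time windows. The sketch below is one possible reading of the definitions above, not the authors' code: it assumes each window is a set of active user ids and each thread starter is represented by its reply count. User Count is then just the size of a window; the clustering coefficient needs the reply graph and is not shown.

```python
from typing import List, Set


def churn_rate(windows: List[Set[str]], t: int) -> float:
    """Share of users active in window t who never appear in a later window.

    For the last window every user trivially counts as churned, so that
    value is not meaningful.
    """
    current = windows[t]
    if not current:
        return 0.0
    later = set().union(*windows[t + 1:])
    return len(current - later) / len(current)


def seeds_to_non_seeds(reply_counts: List[int]) -> float:
    """Ratio of thread starters with at least one reply to those with none."""
    seeds = sum(1 for n in reply_counts if n > 0)
    non_seeds = sum(1 for n in reply_counts if n == 0)
    if non_seeds == 0:
        return float("inf")
    return seeds / non_seeds


windows = [{"a", "b", "c"}, {"a", "c"}, {"c", "d"}]
print(churn_rate(windows, 0))               # only "b" never returns
print(len(windows[1]))                      # user count for window 1
print(seeds_to_non_seeds([2, 0, 1, 0, 0]))  # 2 seeds, 3 non-seeds
```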
Measuring Role Compositions I:
Modelling and Measuring User Behaviour
•  According to existing literature, user behaviour can be defined
using 6 dimensions:
–  (Hautz et al., 2010), (Nolker and Zhou, 2005), (Zhu et al., 2009), (Zhu et
al., 2011)
–  Focus Dispersion
•  Measure: Forum entropy of the user
–  Engagement
•  Measure: Out-degree proportioned by potential maximal out-degree
–  Popularity
•  Measure: In-degree proportioned by potential maximal in-degree
–  Contribution
•  Measure: Proportion of thread replies created by the user
–  Initiation
•  Measure: Proportion of threads that were initiated by the user
–  Content Quality
•  Measure: Average points per post awarded to the user
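A few of these measures can be sketched directly from per-user activity logs. The functions below are an illustration under our own assumptions: entropy is taken in base 2 (the slide does not state a base), the potential maximal degree is assumed to be the number of other users, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def focus_dispersion(forums_posted_in: List[str]) -> float:
    """Forum entropy: 0 when all posts are in one forum, higher when spread out."""
    counts = Counter(forums_posted_in)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def degree_proportion(degree: int, n_users: int) -> float:
    """In- or out-degree relative to the assumed maximum of n_users - 1."""
    return degree / (n_users - 1) if n_users > 1 else 0.0


def initiation(threads_started: int, total_threads: int) -> float:
    """Proportion of threads initiated by the user."""
    return threads_started / total_threads if total_threads else 0.0


def content_quality(points_per_post: List[int]) -> float:
    """Average points awarded per post."""
    return sum(points_per_post) / len(points_per_post) if points_per_post else 0.0


print(focus_dispersion(["abap", "abap", "portal", "portal"]))
print(degree_proportion(5, 11))
print(initiation(3, 12))
print(content_quality([0, 2, 10]))
```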
Measuring Role Compositions II:
Inferring Roles
•  1. Construct features for community users at a given time step
•  2. Derive bins using equal frequency binning
–  Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4
•  3. Use skeleton rule base to construct rules using bin levels
–  Popularity = low, Initiation = high -> roleA
–  Popularity < 0.5, Initiation > 0.4 -> roleA
•  4. Apply rules to infer user roles and community composition
•  5. Repeat 1-4 for following time steps
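Steps 2 to 4 can be sketched as follows: derive cutoffs so that each bin holds roughly the same number of users, convert each feature value to a level, and match the levels against the skeleton rules. This is a minimal sketch, not the original implementation; the three-level scheme, the function names and the single rule (`roleA`, taken from the slide's example) are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def equal_frequency_cutoffs(values: Sequence[float], n_bins: int = 3) -> List[float]:
    """Cut points that split the sorted values into n_bins similar-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins] for i in range(1, n_bins)]


def to_level(value: float, cutoffs: Sequence[float],
             labels: Sequence[str] = ("low", "mid", "high")) -> str:
    """Map a continuous value to the label of the bin it falls into."""
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[-1]


# Skeleton rule base: level conditions per dimension -> role label.
SKELETON_RULES: List[Tuple[Dict[str, str], str]] = [
    ({"popularity": "low", "initiation": "high"}, "roleA"),
]


def infer_role(levels: Dict[str, str],
               rules: List[Tuple[Dict[str, str], str]] = SKELETON_RULES) -> Optional[str]:
    for conditions, role in rules:
        if all(levels.get(dim) == lvl for dim, lvl in conditions.items()):
            return role
    return None


popularity_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cutoffs = equal_frequency_cutoffs(popularity_values)
print(cutoffs)
print(infer_role({"popularity": to_level(2, cutoffs), "initiation": "high"}))
```

Because the cutoffs are recomputed at every time step, the same rule base can be reused while the numeric thresholds follow the community's current distribution.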
Measuring Role Compositions III:
Mining Roles (Skeleton rule base compilation)
•  1. Select the tuning segment
•  2. Discover correlated behaviour dimensions
–  Removed Engagement and Contribution, kept Popularity (Pearson r > 0.75)
•  3. Cluster users into behavioural groups
•  4. Derive role labels for clusters
… method and number of clusters: we measure the cohesion and separation of a given clustering as follows. For each clustering algorithm (Ψ) we iteratively increase the number of clusters (k) to use, where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ; this is defined for a given element (i) in a given cluster as:

s_i = \frac{b_i - a_i}{\max(a_i, b_i)}    (3)

where a_i denotes the average distance to all other items in the same cluster and b_i is given by calculating the average distance with all other items in each other distinct cluster and then taking the minimum distance. The value of s_i ranges between −1 and 1, where the former indicates a poor clustering in which distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient s(Ψ(k)) for the entire clustering we take the average silhouette coefficient of all items. We find that the best clustering model and number of clusters to use is K-means with 11 clusters. We found that for smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance; however, as we begin to increase the cluster numbers K-means improves while the two remaining algorithms produce worse cohesion and separation.

Deriving Role Labels: Provided with the most cohesive and separated clustering of users we then derive role labels for each cluster. Role label derivation first involves inspecting the dimension distribution in each cluster and aligning the distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into discrete values which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the …

Fig. 2. Boxplots of the feature distributions (Dispersion, Initiation, Quality, Popularity) in each of the 11 clusters. Feature distributions are matched against the feature levels derived from equal-frequency binning.

TABLE II. Mapping of cluster dimensions to levels. The clusters are ordered from low patterns to high patterns to aid legibility.
Cluster   Dispersion   Initiation   Quality   Popularity
1         L            L            L         L
0         L            M            H         L
6         L            H            M         M
10        L            H            M         H
4         L            H            H         M
2,5       M            H            L         H
8,9       M            H            H         H
7         H            H            L         H
3         H            H            H         H

… decision node, we measure the entropy of the dimensions and their levels across the clusters; we then choose the dimension with the largest entropy. This is defined formally as:

H(dim) = -\sum_{level}^{|levels|} p(level \mid dim) \log p(level \mid dim)    (4)
•  1 - Focussed Novice
•  2,5 - Mixed Novice
•  7 - Distributed Novice
•  3 - Distributed Expert
•  8,9 - Mixed Expert
•  0 - Focussed Expert Participant
•  4 - Focussed Expert Initiator
•  6 - Knowledgeable Member
•  10 - Knowledgeable Sink
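The silhouette coefficient used to pick the clustering method and the number of clusters can be computed with a few lines of standard-library Python. The sketch below follows the definition s_i = (b_i − a_i) / max(a_i, b_i) with Euclidean distance; the choice of distance, the handling of singleton clusters (score 0) and the function name are our assumptions, and at least two clusters are required.

```python
import math
from statistics import mean
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Average silhouette coefficient of a clustering (needs >= 2 clusters)."""
    clusters: Dict[Hashable, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: average distance to the other items in the same cluster
        # (the distance of p to itself is 0, hence the len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest average distance to the items of any other cluster.
        b = min(
            mean(math.dist(p, q) for q in other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette(points, [0, 0, 1, 1]))  # well separated: close to 1
print(silhouette(points, [0, 1, 0, 1]))  # mixed clusters: negative
```

Model selection then amounts to running each clustering algorithm for every candidate k and keeping the configuration with the highest average score.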
Health Indicator Regression
•  Managing online communities is helped by understanding
the relation between behaviour and health
[Figure: four panels (Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient), each plotting the SCN communities, labelled by community ID, against PC1 and PC2]
No global composition pattern for the entirety of SCN
•  Identified key differences as to ‘What makes Communities tick’
•  Decrease in Focussed Experts correlated with an increase in Seeds-to-Non-Seeds
Sentiment Analysis
Saif, H., Fernandez, M., Kastler, L., & Alani, H. (2017). Sentiment lexicon adaptation with
context and semantics for the social web. Semantic Web, 8(5), 643-665.
Saif, H., He, Y., Fernandez, M., & Alani, H. (2016). Contextual semantics for sentiment
analysis of Twitter. Information Processing & Management, 52(1), 5-19.
http://oro.open.ac.uk/42471/
Saif, H., Ortega, F. J., Fernández, M., & Cantador, I. (2016). Sentiment analysis in social
streams. In Emotions and Personality in Personalized Services (pp. 119-140). Springer,
Cham.
Saif, H., Fernandez, M., He, Y., & Alani, H. (2014, May). Senticircles for contextual and
conceptual semantic sentiment analysis of twitter. In European Semantic Web
Conference (pp. 83-98). Springer, Cham.
Saif, Hassan, Miriam Fernández, Yulan He, and Harith Alani. "On stopwords, filtering and
data sparsity for sentiment analysis of twitter." (2014): 810-817.
Some of the next slides from: https://www.slideshare.net/Staano/
Outline
o Definitions
o Brief History
o  Traditional Sentiment Analysis
o  Applications
o Sentiment Analysis on Social Media
o  Significance
o  Challenges
o Semantic Sentiment Analysis
o  Contextual Semantics
o  Conceptual Semantics
o Discussion
Sentiment Analysis
•  Recent field of study that analyzes people’s attitudes towards
entities – individuals, organizations, products, services,
events - topics, and their attributes (Liu, 2012)
•  Often used interchangeably with Opinion Mining,
–  although they are technically different tasks
–  Opinion Mining: Extract the piece of text which represents the opinion
•  I have recently upgraded to iPhone 5. I am not happy with the screen size, but the
camera is absolutely amazing
–  Sentiment Analysis: Extract the polarity of the opinion
•  I am not happy with the screen size (negative)
•  The camera is absolutely amazing (positive)
Why?
Because Opinions Matter!
What Does the Public Think?
Source: http://www.datameer.com/blog/
Sentiment Analysis
Tasks
•  Subjectivity Detection
•  Polarity Detection
•  Sentiment Strength Detection
•  Emotions Detection
•  Sentiment Summarization
Levels
•  Word/Entity/Aspect Level
•  Phrase Level
•  Sentence Level
•  Document Level
Data Types
•  Conventional Data
•  Microblogging Data
Approaches
•  Machine Learning
•  Lexicon-based
•  Hybrid
Sentiment Analysis Tasks
•  Subjectivity Detection
–  Detect whether the text is objective or subjective
•  Polarity Detection
–  Detect whether the text is positive or negative
•  Sentiment Strength Detection
–  Detect the strength of the sentiment expressed in the text
•  Emotions Detection
–  Detect the human emotions and feelings expressed in text (e.g.,
“happiness”, “sadness”, “anger”)
Sentiment Analysis Levels
Word/Entity/Aspect Level
•  Given a word w in a sentence s, decide whether this word is
opinionated (i.e., expresses sentiment) or not
Phrase-level (expression-level)
•  Given a multi-word expression e in a sentence s, detect the
sentiment orientation of e (e.g., “I’m very happy”)
Sentence-level
•  Given a sentence s of multiple words and phrases, decide on the
sentiment orientation of s
Document-level
•  Given a document d, decide on the overall sentiment of d
Sentiment Analysis Approaches
•  Lexicon-Based Approach
•  Machine Learning Approach
Machine Learning Approaches
•  Supervised Classifiers: Naïve Bayes, MaxEnt, SVM, J48, etc.
•  Unsupervised Classifiers: k-means, hierarchical clustering, HMM, SOM
•  Semi-Supervised Classifiers: Label propagation and graph-based models
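To make the supervised family concrete, here is a minimal sketch of a Naïve Bayes polarity classifier written with the Python standard library only. The class name, the whitespace tokenizer and the four training sentences are assumptions made for illustration; they are not taken from the tutorial, and a real system would use a labelled corpus and a proper tokenizer.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Toy multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter(labels)      # label -> number of documents
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                smoothed = self.word_counts[label][token] + 1
                score += math.log(smoothed / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-made training data (illustration only).
train_texts = [
    "i love this phone",
    "what a great camera",
    "i hate the small screen",
    "this is a horrible mistake",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great phone"))
print(model.predict("horrible screen"))
```

The sketch also hints at the challenges discussed later: the classifier only knows words seen in its labelled training data, so it needs re-training for new domains and suffers when most words are rare.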
Lexicon-based Approaches
Example tweet: “I had nightmares all night long last night :(”
Text Processing Algorithm + Sentiment Lexicon → Negative
Sentiment Lexicon: great, sad, down, wrong, horrible, mistake, love, good
(e.g., MPQA, SentiWordNet, LIWC)
Lexicon generation approaches
•  Manual
•  Dictionary-based
•  Corpus-based
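The lexicon-based pipeline can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm behind any of the lexicons named on the slide: the polarity values, the extra entries for “nightmares” and the emoticons, the negation list and the tokenizer pattern are all assumptions chosen so that the example tweet can be scored.

```python
import re

# Words shown on the slide, with assumed polarities (+1 / -1).
SENTIMENT_LEXICON = {
    "great": 1, "love": 1, "good": 1,
    "sad": -1, "down": -1, "wrong": -1, "horrible": -1, "mistake": -1,
    # Hypothetical additions so that the example tweet can be scored.
    "nightmares": -1, ":(": -1, ":)": 1,
}
NEGATIONS = {"not", "no", "never"}  # assumed, minimal negation list
TOKEN_PATTERN = re.compile(r"[:;]-?[()]|[a-z']+")  # simple emoticons or words


def classify(text):
    """Sum the prior polarities of the tokens, flipping after a negation."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    score = 0
    for index, token in enumerate(tokens):
        polarity = SENTIMENT_LEXICON.get(token, 0)
        if index > 0 and tokens[index - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I had nightmares all night long last night :("))
```

Because every word carries a fixed prior polarity, a phrase such as “great pain” would be scored as positive by this sketch, which is the kind of limitation the later slides on contextual semantics address.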
Data
Existing SA methods are designed to function on formal text, that is:
1.  Long enough
2.  Well-structured
3.  Made of formal sentences
Social media text is often:
•  Short!
•  Noisy and messy
•  Made of informal, ill-structured sentences
Challenges to Traditional Approaches
Machine Learning Approaches
o  Classifier Training
o  Labelled Corpora
o  Labor Intensive Task
o  Domain-Specific
o  Re-Training with new domains
o  Data Sparsity
Challenges to Traditional Approaches
•  Machine Learning Approaches
o  Data Sparsity
o  Twitter data are more sparse than conventional data (Saif et al., 2012)
o  Singleton words constitute two-thirds of the words in tweets!
[Chart: percentage of words with TF=1 vs. TF>1 (0% to 100%) in the OMD, HCR, STS-Gold, SemEval, WAB and GASP datasets]
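The TF=1 share plotted in the chart can be measured on any tweet collection. The following is a small sketch assuming whitespace tokenization and a made-up two-tweet corpus; the function name is hypothetical.

```python
from collections import Counter


def singleton_ratio(tweets):
    """Fraction of distinct words that occur exactly once (TF = 1)."""
    counts = Counter(token for tweet in tweets for token in tweet.lower().split())
    if not counts:
        return 0.0
    singletons = sum(1 for frequency in counts.values() if frequency == 1)
    return singletons / len(counts)


# Hypothetical toy corpus: "good" occurs twice, the other three words once.
toy_tweets = ["good morning", "good night lol"]
print(singleton_ratio(toy_tweets))
```

A high ratio means that most features are seen only once, which gives a classifier very little evidence per word.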
Challenges to Traditional Approaches
Lexicon-based Approaches
o  Sentiment Lexicons (e.g., MPQA, SentiWordNet)
o  Not tailored to Twitter noisy data:
o  Fixed Number of words
Sentiment Lexicon: great, sad, down, wrong, horrible, mistake, love, good
Twitter terms with unknown sentiment (?): grt8, lol, :), :P
Need Lexicon Adaptation!
“I had a great pain in my lower back this morning :(”
•  Great Pain -> Negative
“Ebola is spreading in Africa and ISIS in Middle East!”
•  Ebola -> Virus/Disease -> Negative
•  ISIS -> Militant Group -> Negative
Sentiment in practice is usually conveyed through the latent semantics or
meaning of words in texts!
Sentiment is Dynamic, domain-dependent, and…
Semantic Sentiment Analysis (SentiCircles)
SentiCircles
•  Semantic Representation of words that captures their contextual sentiment
orientation and strength in tweets (Saif et al., 2014)
•  Captures Contextual & Conceptual Semantics of words
•  Does not rely on the structure of tweets
•  Provides lexicon-based sentiment analysis:
–  Tweet-level
–  Entity-level
Semantic sentiment analysis aims at extracting and using the underlying
semantics of words/aspects to identify their sentiment orientation with
regard to their context in the text
Distributional Semantic Hypothesis
Trojan Horse: Threat, Hack, Code, Malware, Program, Dangerous, Harm
Trojan Horse: Greek Tale, History, Class, Wooden, Troy
“Words that occur in similar context tend to have similar meaning”
Wittgenstein (1953)
Capturing Contextual Semantics
Term (m)
Context-Term Vector: C1, C2, …, Cn
Degree of Correlation
Prior Sentiment
Sentiment Lexicon
3 Capturing and Representing Semantics for Sentiment Analysis
In the following we explain the SentiCircle approach and its use of contextual and
conceptual semantics. The main idea behind our SentiCircle approach is that the sentiment
of a term is not static, as in traditional lexicon-based approaches, but rather depends on
the context in which the term is used, i.e., it depends on its contextual semantics. We
define context as a textual corpus or a set of tweets.
To capture the contextual semantics of a term we consider its co-occurrence patterns
with other terms, as inspired by [27]. Following this principle, we compute the semantics
of a term m by considering the relations of m with all its context words (i.e., words that
occur with m in the same context). To compute the individual relation between the term
m and a context term c_i we propose the use of the Term Degree of Correlation (TDOC)
metric. Inspired by the TF-IDF weighting scheme this metric is computed as:
\[ \mathrm{TDOC}(m, c_i) = f(c_i, m) \times \log \frac{N}{N_{c_i}} \qquad (1) \]
where f(c_i, m) is the number of times c_i occurs with m in tweets, N is the total number
of terms, and N_{c_i} is the total number of terms that occur with c_i. In addition to each
TDOC computed between m and each context term c_i, we also consider the Prior
Sentiment of c_i, extracted from a sentiment lexicon. As with common practice, if this
term c_i appears in the vicinity of a negation, its prior sentiment score is negated. The
negation words are collected from the General Inquirer under the NOTLW category.
Example: term “Trojan Horse” with context terms “threat” and “attack”
Outputs:
•  Contextual Sentiment Strength: [-1 (very negative), +1 (very positive)]
•  Contextual Sentiment Orientation: Positive, Negative, Neutral
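The TDOC metric can be sketched as follows. This is an illustrative reading of the formula rather than the authors' implementation: it assumes whitespace tokenization, counts co-occurrence once per tweet, and interprets N as the number of distinct terms in the corpus and N_ci as the number of distinct terms co-occurring with c_i. The function name and the toy tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tdoc_scores(tweets, m):
    """Return {context term c: TDOC(m, c)} for every term co-occurring with m."""
    tokenized = [set(tweet.lower().split()) for tweet in tweets]
    vocabulary = set().union(*tokenized)

    # For every term, the set of distinct terms it co-occurs with (N_c).
    cooccurring = defaultdict(set)
    for tokens in tokenized:
        for term in tokens:
            cooccurring[term] |= tokens - {term}

    # f(c, m): number of tweets in which c appears together with m.
    f = Counter()
    for tokens in tokenized:
        if m in tokens:
            f.update(tokens - {m})

    n = len(vocabulary)  # assumed meaning of N
    return {c: f[c] * math.log(n / len(cooccurring[c])) for c in f}


# Hypothetical toy corpus.
toy_tweets = ["trojan threat", "trojan threat attack", "attack fix"]
scores = tdoc_scores(toy_tweets, "trojan")
print(scores)
```

As with TF-IDF, a context term scores highly when it appears often with m but with few other terms; here “threat” outweighs “attack”, which also co-occurs with “fix”.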
SentiCircles
The SentiCircle Approach
[Figure: SentiCircle of the term m = “Trojan Horse”. Each context term Ci (Dangerous, threat, destroy, Malicious, attack, easily, discover, useful, fix) is plotted at a point (xi, yi) with radius ri and angle θi, on axes ranging from -1 to +1. Regions: Very Positive, Positive, Very Negative, Negative, and a Neutral Region.]
ri = TDOC(Ci)
θi = Prior_Sentiment(Ci) * π
xi = ri * cos(θi), yi = ri * sin(θi)
Overall Contextual Sentiment (Senti-Median)
where the geometric median is a point g = (xk, yk) in which its Euclidean distance
to all the points pi is minimum. We call the geometric median g the Senti-Median as it
captures the sentiment (y-coordinate) and the sentiment strength (x-coordinate) of the
SentiCircle of a given term m.
Following the representation provided in Figure 1, the sentiment of the term m is
dependent on whether the Senti-Median g lies inside the neutral region, the positive
quadrants, or the negative quadrants. Formally, given a Senti-Median g_m of a term m,
the term-sentiment function L works as:
\[ L(g_m) = \begin{cases} \text{negative} & \text{if } y_g < -\lambda \\ \text{positive} & \text{if } y_g > +\lambda \\ \text{neutral} & \text{if } |y_g| \le \lambda \;\&\; x_g \le 0 \end{cases} \]
where λ is the threshold that defines the Y-axis boundary of the neutral region. […]
illustrates how this threshold is computed.
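The construction of a SentiCircle and its Senti-Median can be sketched as below. The geometric median has no closed form, so this sketch swaps in Weiszfeld's iterative algorithm, which is a standard choice but is not stated in the slides. Other assumptions: radii are already normalized, the threshold value 0.05 is a placeholder, and the labelling function only checks the y-coordinate, ignoring the x-coordinate condition of the neutral case.

```python
import math


def senticircle_points(context):
    """context: list of (r, prior) pairs with r = TDOC and prior in [-1, 1]."""
    points = []
    for r, prior in context:
        theta = prior * math.pi
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def geometric_median(points, iterations=200, eps=1e-9):
    """Weiszfeld's algorithm, started from the centroid of the points."""
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            distance = math.hypot(px - x, py - y)
            if distance < eps:
                continue  # simplification: skip a point equal to the estimate
            weight = 1.0 / distance
            num_x += weight * px
            num_y += weight * py
            denom += weight
        if denom == 0.0:
            break
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < eps:
            break
    return x, y


def term_sentiment(median, lam=0.05):
    """Simplified labelling by y-coordinate; lam is a hypothetical threshold."""
    _, y = median
    if y < -lam:
        return "negative"
    if y > lam:
        return "positive"
    return "neutral"


# Hypothetical context terms: (normalized TDOC, prior sentiment).
context_terms = [(0.9, -0.6), (0.8, -0.5), (0.3, 0.2)]
senti_median = geometric_median(senticircle_points(context_terms))
print(senti_median, term_sentiment(senti_median))
```

Using the median rather than the mean makes the overall sentiment of the term less sensitive to a few outlying context terms.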
Examples
Tweet-Level Contextual Sentiment (I)
(1) The Median Method
Cycling under a heavy rain.. what a
#luck!
S-Median S-Median S-Median S-Median S-Median S-Median
The Median of Senti-Medians
Tweet-Level Contextual Sentiment (II)
(2) The Pivot Method
Example tweet tk: “I like my new iPad”
[Figure: SentiCircle of the pivot term pj (“iPad”), with the other tweet terms (like, new, …, Wn) placed at (r1, θ1), (r2, θ2), … and contributing the sentiment impacts Sj1, Sj2, …; quadrants: Very Positive, Positive, Very Negative, Negative]
Median Method: This method takes the median of all Senti-Medians, and in doing so
treats all tweet terms as equal. Each tweet ti ∈ T is turned into a vector of
Senti-Medians g = (g1, g2, ..., gn) of size n, where n is the number of terms that compose
the tweet and gj is the Senti-Median of the SentiCircle associated with term mj.
Equation […] is used to calculate the median point q of g, which we use to determine the
overall sentiment of tweet ti using Function 6.
Pivot Method: This method favours some terms in a tweet over others, based on
the assumption that sentiment is often expressed towards one or more specific targets,
which we refer to as “Pivot” terms. In the tweet example above, there are two pivot
terms, “iPhone” and “iPad”, since the sentiment word “amazing” is used to describe
both of them. Hence, the method works by (1) extracting all pivot terms in a tweet and
(2) accumulating, for each sentiment label, the sentiment impact that each pivot term
receives from other terms. The overall sentiment of a tweet corresponds to the sentiment
label with the highest sentiment impact. Opinion target identification is a challenging
task and is beyond the scope of our current study. For simplicity, we assume that the
pivot terms are those having the POS tags: {Common Noun, Proper Noun, Pronoun} in
tweets. For each candidate pivot term, we build a SentiCircle from which the sentiment
impact that a pivot term receives from all the other terms in a tweet can be computed.
Formally, the Pivot-Method seeks to find the sentiment ŝ that receives the maximum
sentiment impact within a tweet as:
\[ \hat{s} = \arg\max_{s \in S} H_s(p) = \arg\max_{s \in S} \sum_{i}^{N_p} \sum_{j}^{N_w} H_s(p_i, w_j) \qquad (7) \]
where s ∈ S = {Positive, Negative, Neutral} is the sentiment label, p is a vector of […]
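The double sum and arg max of the Pivot Method can be sketched as follows. The slides do not define how the impact H_s(p_i, w_j) is computed, so this sketch assumes, purely for illustration, that the impacts have already been derived from each pivot's SentiCircle and are supplied as a lookup table. The function name and all numbers are hypothetical.

```python
LABELS = ("Positive", "Negative", "Neutral")


def pivot_sentiment(pivots, words, impact):
    """Pick the label with the largest accumulated impact over all pivots.

    impact maps (pivot, word) to {label: value}; missing entries count as 0.
    """
    totals = {label: 0.0 for label in LABELS}
    for pivot in pivots:
        for word in words:
            if word == pivot:
                continue  # a pivot does not contribute impact to itself
            for label in LABELS:
                totals[label] += impact.get((pivot, word), {}).get(label, 0.0)
    best = max(LABELS, key=lambda label: totals[label])
    return best, totals


# Hypothetical impacts for the tweet "I like my new iPad" with pivot "ipad".
tweet_words = ["i", "like", "my", "new", "ipad"]
impact_table = {
    ("ipad", "like"): {"Positive": 0.8},
    ("ipad", "new"): {"Positive": 0.3, "Neutral": 0.2},
}
label, totals = pivot_sentiment(["ipad"], tweet_words, impact_table)
print(label, totals)
```

In a fuller pipeline the pivot list would come from a POS tagger (nouns and pronouns) and the table from the SentiCircles of those pivots.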
Performance
{Tweet-level sentiment analysis}
[Chart: Polarity Detection, Accuracy and F-Measure for MPQA-Lex, SentiWNet-Lex and SentiCircle]
[Chart: Polarity Detection, Accuracy and F1 for SentiStrength and SentiCircle]
{Entity-level sentiment analysis}
[Chart: Subjectivity Detection, Accuracy and F1 for MPQA-Lex, SentiWNet-Lex, SentiStrength and SentiCircle]
[Chart: Polarity Detection, Accuracy and F1 for MPQA, SentiWordNet, SentiStrength and SentiCircle]
Annotated differences on the charts: +30-40%, +2-15%, +20%, +1/-1%
Enriching SentiCircles with Conceptual Semantics
•  Semantics extracted from external knowledge sources (e.g.,
ontologies and semantic networks).
“ISIS is spreading in the Middle East like Cancer!” (ISIS -> Jihadist Militant)
“What a sad day, 4 doctors were lost to Ebola today!” (Ebola -> Virus)
“Finally, I got my iPhone 6s, What a product!!” (iPhone 6s -> Apple-Product)
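A minimal sketch of conceptual enrichment is shown below: the concept of each recognized entity is appended to the tweet's tokens so that downstream steps can use it as an extra feature. The lookup table is a hypothetical stand-in for an external knowledge source such as an ontology, and the function name and tokenizer are assumptions.

```python
import re

# Hypothetical entity-to-concept table (stand-in for an external knowledge source).
ENTITY_CONCEPTS = {
    "isis": "Jihadist Militant",
    "ebola": "Virus",
    "iphone": "Apple-Product",
}


def enrich_with_concepts(tweet):
    """Return the tweet's tokens followed by the concepts of known entities."""
    tokens = re.findall(r"\w+", tweet.lower())
    concepts = [ENTITY_CONCEPTS[token] for token in tokens if token in ENTITY_CONCEPTS]
    return tokens + concepts


features = enrich_with_concepts("What a sad day, 4 doctors were lost to Ebola today!")
print(features)
```

Entities that rarely occur in the corpus can then share evidence with other entities of the same concept.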
Enriching SentiCircles with
Conceptual Semantics
Cycling under a heavy rain.. What a #luck!
rain -> Weather Condition (as are Wind, Snow, Humidity)
[Chart: {Tweet-level sentiment analysis} Precision, Recall and F1 for Unigrams, POS and Semantics features (axis from 68% to 78%); annotated difference: +4%]
Sentiment Lexicon Adaptation
•  Typical Sentiment Lexicons:
–  Context-insensitive sentiment
–  Fixed set of words
•  Lexicon Adaptation
–  Update the sentiment of words in a given lexicon with respect to their
context in the text.
•  Cold beer -> Positive
•  Great Pain -> Negative
Lexicon Adaptation with SentiCircles
Tweets -> Extract Contextual Sentiment -> Rule-based Lexicon Adaptation (using the Sentiment Lexicon) -> Adapted Lexicon
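The adaptation step can be sketched with a few simple rules. The actual rules are not given in the slides, so the ones below (flip on opposite sign, update on a large strength difference, add unseen opinionated words) are assumptions for illustration, as are the tolerance value, the scores and the function name.

```python
def adapt_lexicon(lexicon, contextual, tolerance=0.1):
    """Update prior scores in [-1, 1] with contextual scores in [-1, 1].

    Returns the adapted lexicon and a {word: change type} report.
    """
    adapted = dict(lexicon)
    changes = {}
    for word, context_score in contextual.items():
        prior = lexicon.get(word)
        if prior is None:
            if context_score != 0:
                adapted[word] = context_score
                changes[word] = "new"
        elif prior * context_score < 0:
            adapted[word] = context_score
            changes[word] = "flipped"
        elif abs(prior - context_score) > tolerance:
            adapted[word] = context_score
            changes[word] = "strength changed"
        else:
            changes[word] = "unchanged"
    return adapted, changes


# Hypothetical prior lexicon and contextual scores (e.g., from Senti-Medians).
prior_lexicon = {"great": 0.8, "cold": -0.3, "sad": -0.6}
contextual_scores = {"great": -0.5, "cold": 0.4, "sad": -0.62, "lol": 0.3}
adapted_lexicon, change_report = adapt_lexicon(prior_lexicon, contextual_scores)
print(change_report)
```

The change report mirrors the categories used to describe adaptation impact: words that flip orientation, words that change strength, unchanged words and new opinionated words.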
Adaptation Impact on Thelwall-Lexicon
Words in Thelwall-Lexicon were adapted based on their context in three
different datasets: OMD, HCR, STS (Saif et al., 2013)
•  Words found in the lexicon: 9.6%
•  Words that flipped their sentiment orientation: 33.82%
•  Words that changed their sentiment strength: 62.94%
•  Words that remained unchanged: 3.24%
•  New opinionated words: 21.37%
Adaptation Impact!
[Chart: Accuracy and F1 for Original Lexicon vs. Adapted Lexicon; data labels: 66.29, 61.4, 69.29, 66.03]
Strengths and Limitations
Strengths
•  SentiCircles can effectively capture the contextual semantics and
sentiment at the corpus level
•  Provides lexicon-based (unsupervised) Sentiment Analysis
•  Provides domain-specific Sentiment Analysis
•  Low complexity
–  Does not rely on sentence structure
Limitations
•  Not tailored to tweet-level / sentence-level context
•  Sensitive to imbalanced sentiment class distribution
•  Not very effective with small Twitter datasets
Take-Away Message
•  Social Media data can be mined for multiple applications
•  It’s a great way to understand social phenomena at scale!
•  This research must be interdisciplinary
•  When using and studying social media we need to be very
aware of the problems (ethics / biases / misinformation)
•  A “pinch” of semantics goes a long way ☺
THX A LOT FOR LISTENING! ☺
Let’s Download some
Twitter Data ☺
Time to Play!
•  Automatic data collection generally relies on JSON APIs and
OAuth credentials. For example, for Twitter, you need to:
1.  Create a Twitter account (https://twitter.com).
2.  Obtain OAuth access credentials (i.e., access token, access secret,
consumer key and consumer secret) (https://apps.twitter.com/app/new).
3.  Use the Search API for collecting tweets (https://developer.twitter.com).
4.  Save tweets in JSON or another format for later analysis.
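The final step, saving collected tweets for later analysis, can be sketched with the Python standard library. The collection step itself is left out because it needs real credentials; the two sample records, their field names and the file name are hypothetical, and real API payloads contain many more fields. The sketch stores one JSON object per line (JSON Lines), which keeps large collections easy to append to and to read back.

```python
import json
import os
import tempfile


def save_tweets(tweets, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for tweet in tweets:
            handle.write(json.dumps(tweet, ensure_ascii=False) + "\n")


def load_tweets(path):
    """Read the tweets back, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical, simplified records standing in for collected tweets.
sample_tweets = [
    {"id": 1, "text": "I like my new iPad", "lang": "en"},
    {"id": 2, "text": "Cycling under a heavy rain.. what a #luck!", "lang": "en"},
]

path = os.path.join(tempfile.gettempdir(), "tweets_sample.jsonl")
save_tweets(sample_tweets, path)
restored = load_tweets(path)
print(len(restored), "tweets restored")
```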

Recommended

Social media mining hicss 46 part 1
Social media mining   hicss 46 part 1Social media mining   hicss 46 part 1
Social media mining hicss 46 part 1Dave King
 
Dialogue-Earth:-Mining-Social-Media
Dialogue-Earth:-Mining-Social-MediaDialogue-Earth:-Mining-Social-Media
Dialogue-Earth:-Mining-Social-MediaTom Masterman
 
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...Paul Gilbreath
 
The art and science of data-driven journalism
The art and science of data-driven journalism The art and science of data-driven journalism
The art and science of data-driven journalism Alexander Howard
 
Data Journalism and the Remaking of Data Infrastructures
Data Journalism and the Remaking of Data InfrastructuresData Journalism and the Remaking of Data Infrastructures
Data Journalism and the Remaking of Data InfrastructuresLiliana Bounegru
 
Co-Creating Misinformation Resilient Societies
Co-Creating Misinformation Resilient Societies Co-Creating Misinformation Resilient Societies
Co-Creating Misinformation Resilient Societies The Open University
 
SASIG Workshop on “Improving the digital landscape for our children”
SASIG Workshop on “Improving the digital landscape for our children”SASIG Workshop on “Improving the digital landscape for our children”
SASIG Workshop on “Improving the digital landscape for our children”The Open University
 
Teaching data journalism (Abraji 2021)
Teaching data journalism (Abraji 2021)Teaching data journalism (Abraji 2021)
Teaching data journalism (Abraji 2021)Paul Bradshaw
 

More Related Content

What's hot

Data! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 yearsData! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 yearsPaul Bradshaw
 
New Media Activism Presentation
New Media Activism Presentation New Media Activism Presentation
New Media Activism Presentation NewMediaActivism
 
Social Media for the Government
Social Media for the GovernmentSocial Media for the Government
Social Media for the GovernmentKady Chiu
 
Social media - enterprise2.0 - course 2010 2011
Social media - enterprise2.0 - course 2010   2011Social media - enterprise2.0 - course 2010   2011
Social media - enterprise2.0 - course 2010 2011guillaume ereteo
 
How does news infomediation operate: the examples of Google and Facebook
How does news infomediation operate: the examples of Google and FacebookHow does news infomediation operate: the examples of Google and Facebook
How does news infomediation operate: the examples of Google and Facebooksmyrnaios
 
Social Media Analysis: Present and Future
Social Media Analysis: Present and FutureSocial Media Analysis: Present and Future
Social Media Analysis: Present and Futurematthewhurst
 
How does fakenews spread understanding pathways of disinformation spread thro...
How does fakenews spread understanding pathways of disinformation spread thro...How does fakenews spread understanding pathways of disinformation spread thro...
How does fakenews spread understanding pathways of disinformation spread thro...Araz Taeihagh
 
A multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methodsA multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methodssmyrnaios
 
Doing Social and Political Research in a Digital Age: An Introduction to Digi...
Doing Social and Political Research in a Digital Age: An Introduction to Digi...Doing Social and Political Research in a Digital Age: An Introduction to Digi...
Doing Social and Political Research in a Digital Age: An Introduction to Digi...Liliana Bounegru
 
Data Journalism - Introduction
Data Journalism - IntroductionData Journalism - Introduction
Data Journalism - IntroductionBahareh Heravi
 
Emerging Trends in Crisis Informatics
Emerging Trends in Crisis InformaticsEmerging Trends in Crisis Informatics
Emerging Trends in Crisis InformaticsAdam Papendieck
 
Redistributing journalism: Journalism as a data public and the politics of qu...
Redistributing journalism: Journalism as a data public and the politics of qu...Redistributing journalism: Journalism as a data public and the politics of qu...
Redistributing journalism: Journalism as a data public and the politics of qu...Liliana Bounegru
 
Social Justice & Black Twitter
Social Justice & Black TwitterSocial Justice & Black Twitter
Social Justice & Black TwitterAyodele Odubela
 
Dan Trottier
Dan TrottierDan Trottier
Dan Trottiercitasa
 
Mapping Issues with the Web: An Introduction to Digital Methods
Mapping Issues with the Web: An Introduction to Digital MethodsMapping Issues with the Web: An Introduction to Digital Methods
Mapping Issues with the Web: An Introduction to Digital MethodsJonathan Gray
 
Doing Digital Methods: Some Recent Highlights from Winter and Summer Schools
Doing Digital Methods: Some Recent Highlights from Winter and Summer SchoolsDoing Digital Methods: Some Recent Highlights from Winter and Summer Schools
Doing Digital Methods: Some Recent Highlights from Winter and Summer SchoolsLiliana Bounegru
 
Fake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sitesFake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sitesPetter Bae Brandtzæg
 

What's hot (20)

Data! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 yearsData! Action! Data journalism issues to watch in the next 10 years
Data! Action! Data journalism issues to watch in the next 10 years
 
New Media Activism Presentation
New Media Activism Presentation New Media Activism Presentation
New Media Activism Presentation
 
Social Media for the Government
Social Media for the GovernmentSocial Media for the Government
Social Media for the Government
 
How to get started with Data Journalism
How to get started with Data JournalismHow to get started with Data Journalism
How to get started with Data Journalism
 
Social media - enterprise2.0 - course 2010 2011
Social media - enterprise2.0 - course 2010   2011Social media - enterprise2.0 - course 2010   2011
Social media - enterprise2.0 - course 2010 2011
 
Data journalism Overview
Data journalism OverviewData journalism Overview
Data journalism Overview
 
How does news infomediation operate: the examples of Google and Facebook
How does news infomediation operate: the examples of Google and FacebookHow does news infomediation operate: the examples of Google and Facebook
How does news infomediation operate: the examples of Google and Facebook
 
Social Media Analysis: Present and Future
Social Media Analysis: Present and FutureSocial Media Analysis: Present and Future
Social Media Analysis: Present and Future
 
How does fakenews spread understanding pathways of disinformation spread thro...
How does fakenews spread understanding pathways of disinformation spread thro...How does fakenews spread understanding pathways of disinformation spread thro...
How does fakenews spread understanding pathways of disinformation spread thro...
 
A multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methodsA multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methods
 
Doing Social and Political Research in a Digital Age: An Introduction to Digi...
Doing Social and Political Research in a Digital Age: An Introduction to Digi...Doing Social and Political Research in a Digital Age: An Introduction to Digi...
Doing Social and Political Research in a Digital Age: An Introduction to Digi...
 
Data journalism
Data journalism Data journalism
Data journalism
 
Data Journalism - Introduction
Data Journalism - IntroductionData Journalism - Introduction
Data Journalism - Introduction
 
Emerging Trends in Crisis Informatics
Emerging Trends in Crisis InformaticsEmerging Trends in Crisis Informatics
Emerging Trends in Crisis Informatics
 
Redistributing journalism: Journalism as a data public and the politics of qu...
Redistributing journalism: Journalism as a data public and the politics of qu...Redistributing journalism: Journalism as a data public and the politics of qu...
Redistributing journalism: Journalism as a data public and the politics of qu...
 
Social Justice & Black Twitter
Social Justice & Black TwitterSocial Justice & Black Twitter
Social Justice & Black Twitter
 
Dan Trottier
Dan TrottierDan Trottier
Dan Trottier
 
Mapping Issues with the Web: An Introduction to Digital Methods
Mapping Issues with the Web: An Introduction to Digital MethodsMapping Issues with the Web: An Introduction to Digital Methods
Mapping Issues with the Web: An Introduction to Digital Methods
 
Doing Digital Methods: Some Recent Highlights from Winter and Summer Schools
Doing Digital Methods: Some Recent Highlights from Winter and Summer SchoolsDoing Digital Methods: Some Recent Highlights from Winter and Summer Schools
Doing Digital Methods: Some Recent Highlights from Winter and Summer Schools
 
Fake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sitesFake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sites
 

Similar to Introduction to Mining Social Media Data

CLICKNL DRIVE 2018 | 24 OCT | Design for Systemic Change
CLICKNL DRIVE 2018 | 24 OCT | Design for Systemic ChangeCLICKNL DRIVE 2018 | 24 OCT | Design for Systemic Change
CLICKNL DRIVE 2018 | 24 OCT | Design for Systemic ChangeCLICKNL
 
Altmetrics Day Workshop - Internet Librarian International 2014
Altmetrics Day Workshop - Internet Librarian International 2014Altmetrics Day Workshop - Internet Librarian International 2014
Altmetrics Day Workshop - Internet Librarian International 2014Andy Tattersall
 
What’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media WorkshopWhat’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media WorkshopWiLS
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Diana Maynard
 
International Business and Culture 6
International Business and Culture 6International Business and Culture 6
International Business and Culture 6René Ceipek
 
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA TalkSocial Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA TalkHemant Purohit
 
How to maximize the impact of your research through kick-ass social media skills
How to maximize the impact of your research through kick-ass social media skillsHow to maximize the impact of your research through kick-ass social media skills
How to maximize the impact of your research through kick-ass social media skillsEsther De Smet
 
UCLA X469.21 - SPRING '18 WEEK 1
UCLA X469.21 - SPRING '18 WEEK 1UCLA X469.21 - SPRING '18 WEEK 1
UCLA X469.21 - SPRING '18 WEEK 1SocialMediaUCLA
 
Biases in Social Media Research (NoBias EU project)
Biases in Social Media Research (NoBias EU project)Biases in Social Media Research (NoBias EU project)
Biases in Social Media Research (NoBias EU project)Miriam Fernandez
 
A social media revolution: Using social media to enhance teaching, student le...
A social media revolution: Using social media to enhance teaching, student le...A social media revolution: Using social media to enhance teaching, student le...
A social media revolution: Using social media to enhance teaching, student le...Sue Beckingham
 
Effective Whole Community Digital Communications Planning
Effective Whole Community Digital Communications PlanningEffective Whole Community Digital Communications Planning
Effective Whole Community Digital Communications PlanningCarol Spencer
 
VCCI social media guidelines and policies
VCCI social media guidelines and policiesVCCI social media guidelines and policies
VCCI social media guidelines and policiescatkenyon65
 
Strategies for Thriving in Social Media
Strategies for Thriving in Social MediaStrategies for Thriving in Social Media
Strategies for Thriving in Social MediaCharter School Capital
 
Social Media Is Evolving: Are You?
Social Media Is Evolving: Are You?Social Media Is Evolving: Are You?
Social Media Is Evolving: Are You?Sarah Best Strategy
 

Similar to Introduction to Mining Social Media Data (20)

CLICKNL DRIVE 2018 | 24 OCT | Design for Systemic Change
CLICKNL DRIVE 2018 | 24 OCT | Design for Systemic ChangeCLICKNL DRIVE 2018 | 24 OCT | Design for Systemic Change
CLICKNL DRIVE 2018 | 24 OCT | Design for Systemic Change
 
Altmetrics Day Workshop - Internet Librarian International 2014
Altmetrics Day Workshop - Internet Librarian International 2014Altmetrics Day Workshop - Internet Librarian International 2014
Altmetrics Day Workshop - Internet Librarian International 2014
 
What’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media WorkshopWhat’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media Workshop
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...
 
International Business and Culture 6
International Business and Culture 6International Business and Culture 6
International Business and Culture 6
 
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA TalkSocial Media & Web Mining for Public Services of Smart Cities - SSA Talk
Social Media & Web Mining for Public Services of Smart Cities - SSA Talk
 
Workshop for Cambridgeshire Police
Workshop for Cambridgeshire Police Workshop for Cambridgeshire Police
Workshop for Cambridgeshire Police
 
How to maximize the impact of your research through kick-ass social media skills
How to maximize the impact of your research through kick-ass social media skillsHow to maximize the impact of your research through kick-ass social media skills
How to maximize the impact of your research through kick-ass social media skills
 
UCLA X469.21 - SPRING '18 WEEK 1
UCLA X469.21 - SPRING '18 WEEK 1UCLA X469.21 - SPRING '18 WEEK 1
UCLA X469.21 - SPRING '18 WEEK 1
 
Biases in Social Media Research (NoBias EU project)
Biases in Social Media Research (NoBias EU project)Biases in Social Media Research (NoBias EU project)
Biases in Social Media Research (NoBias EU project)
 
Cambridge Arts Network 30 July 2014
Cambridge Arts Network 30 July 2014Cambridge Arts Network 30 July 2014
Cambridge Arts Network 30 July 2014
 
A social media revolution: Using social media to enhance teaching, student le...
A social media revolution: Using social media to enhance teaching, student le...A social media revolution: Using social media to enhance teaching, student le...
A social media revolution: Using social media to enhance teaching, student le...
 
Effective Whole Community Digital Communications Planning
Effective Whole Community Digital Communications PlanningEffective Whole Community Digital Communications Planning
Effective Whole Community Digital Communications Planning
 
VCCI social media guidelines and policies
VCCI social media guidelines and policiesVCCI social media guidelines and policies
VCCI social media guidelines and policies
 
Social Media
Social MediaSocial Media
Social Media
 
Social Media in Learning
Social Media in LearningSocial Media in Learning
Social Media in Learning
 
Social Media Mining and Analytics
Social Media Mining and AnalyticsSocial Media Mining and Analytics
Social Media Mining and Analytics
 
Strategies for Thriving in Social Media
Strategies for Thriving in Social MediaStrategies for Thriving in Social Media
Strategies for Thriving in Social Media
 
Social Media Is Evolving: Are You?
Social Media Is Evolving: Are You?Social Media Is Evolving: Are You?
Social Media Is Evolving: Are You?
 
ESWC 2014 Tutorial part 1
ESWC 2014 Tutorial part 1ESWC 2014 Tutorial part 1
ESWC 2014 Tutorial part 1
 


Introduction to Mining Social Media Data

  • 1. Alberto Mendelzon Workshop, 21st May 2018. Introduction to Mining Social Media Data. Miriam Fernandez, Knowledge Media Institute, Open University, UK. @miriam_fs @miriamfs. Credit to all these fantastic people!
  • 2. Who are we?
  • 3. Before we start…
    • 1. This is an after-lunch session…
      – Hope you took the necessary precautions!
    • 2. It is an introductory tutorial
      – If you were expecting something very complex, this is not for you; go out and enjoy the sun :)
    • 3. I hate talking alone for long periods of time
      – Please ask or discuss anything you want at any point!
    • 4. Hands-on exercises are available
      – Fantastic tutorial @TheWebConf by some of my colleagues! :) https://github.com/evhart/smasac-tutorial/blob/master/README.md (Jupyter notebooks)
  • 4. Understanding Social Media
  • 5. Most Used Social Media Platforms. Source: https://techcrunch.com/2017/06/27/facebook-2-billion-users/
  • 6. Not the Only Ones: smaller and less famous (open and closed) communities addressing particular geographic regions, specific user groups or niche interests thrive on the Web.
  • 7. A World-wide Phenomenon: number of social network users worldwide, in billions. Source: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/
  • 8. Number of social network users in selected countries (in millions). Source: https://www.statista.com/statistics/278341/number-of-social-network-users-in-selected-countries/
  • 9. Full of Challenges
  • 10. Mining Social Media Data, for What?
    • Trivalent: http://trivalent-project.eu/
    • COMRADES: https://www.comrades-project.eu/
    • DecarboNet: https://www.decarbonet.eu/
    • Sense4us: http://www.sense4us.eu/
    • ROBUST: http://www.robust-project.eu/
    • OUSocial: http://oro.open.ac.uk/40883/1/ousocial2-demo.pdf
    • Some of the next slides are from: https://www.slideshare.net/halani
  • 11. Studying social phenomena at scale!
  • 12. Social, Semantic and Statistical Analysis
  • 13. Businesses (http://www.robust-project.eu/)
    • Many businesses provide online communities to:
      – Increase customer loyalty
      – Raise brand awareness
      – Spread word-of-mouth
      – Facilitate idea generation
    • Online communities require significant investment in terms of:
      – Money spent on hosting and bandwidth
      – Time and effort for maintenance
    • Community managers monitor community ‘health’ to:
      – Ensure longevity
      – Enable value generation
    • However, the notion of ‘health’ is hard to pin down
  • 14. Businesses: monitoring the evolution of community activities and the level of contributions in the SAP Community Network (SCN)
  • 15. Reputation Fish Tank: https://www.youtube.com/watch?time_continue=57&v=KXRzdrDDt_8
  • 16. Active OU communities on Facebook
  • 17. Education (DEMO)
    • How active and engaged is the course group?
    • How is sentiment towards the course evolving?
    • Are the leaders of the group providing positive/negative comments?
    • What topics are emerging?
    • Is the group flourishing or diminishing?
    • Do students get the answers and support they need or not?
  • 18. OUAnalyse (https://analyse.kmi.open.ac.uk/)
    • Social media data vs. VLE data to increase retention
  • 21. Automatic Categorisation of Social Media Accounts
    • Objective:
      – Provide automatic identification of the main actors talking about policy in social media
      – Allow policy researchers to concentrate on the opinions of citizens vs. commercial organizations
    • Approach: Twitter data → data collection → feature engineering → user classification (Person, Company, NGO, MP, News & Media)
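The approach on slide 21 (feature engineering followed by user classification) can be illustrated with a small sketch. This is not the classifier used in the project: it assumes, purely for illustration, that the only feature source is the account's profile description, and it swaps in a plain multinomial Naive Bayes model written with the standard library. The training examples are invented.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(description):
    """Feature engineering (assumed): lowercase word tokens from the profile description."""
    return re.findall(r"[a-z]+", description.lower())


def train(labelled_accounts):
    """Fit a multinomial Naive Bayes model from (description, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for description, label in labelled_accounts:
        tokens = extract_features(description)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(label_counts.values())
    priors = {label: count / total for label, count in label_counts.items()}
    return priors, word_counts, vocab


def classify(description, model):
    """Return the label with the highest log-probability for this description."""
    priors, word_counts, vocab = model
    tokens = extract_features(description)
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(prior)
        for token in tokens:
            # Laplace smoothing so unseen words do not zero out the probability.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data, one example per category named on the slide.
TRAINING = [
    ("Mum of two, runner, tea lover. Views my own", "Person"),
    ("Official account of Acme Ltd. Customer support and offers", "Company"),
    ("Charity campaigning to end homelessness. Donate today", "NGO"),
    ("Member of Parliament for Exampleton. Constituency surgery Fridays", "MP"),
    ("Breaking news, headlines and analysis from our newsroom", "News & Media"),
]

model = train(TRAINING)
print(classify("Breaking headlines from the newsroom", model))
```

A real system would need far more labelled accounts and richer features (for example posting behaviour or network statistics); the sketch only shows how the two stages fit together.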
  • 22. Policing: Olson’s psychological theory of luring communication (LCT). Grooming data.
    • Classification results (P = precision, R = recall):
      – Trust development: 79% P, 82% R, 81% F1
      – Grooming stage: 88% P, 89% R, 88% F1
      – Physical approach: 87% P, 89% R, 88% F1
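For readers unfamiliar with the metrics on slide 22, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages quoted on the slide; because those inputs are already rounded, a recomputed value can differ from the reported one by about a percentage point.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted on the slide, expressed as fractions.
REPORTED = {
    "Trust development": (0.79, 0.82, 0.81),
    "Grooming stage": (0.88, 0.89, 0.88),
    "Physical approach": (0.87, 0.89, 0.88),
}

for stage, (precision, recall, reported_f1) in REPORTED.items():
    recomputed = f1_score(precision, recall)
    print(f"{stage}: recomputed F1 = {recomputed:.3f}, reported F1 = {reported_f1:.2f}")
```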
  • 23. Energy
  • 24. Disaster Management
    • 177 million tweets were posted in a single day during the 2011 Japan earthquake
    • The Boston Marathon bombing broke on Twitter; it was on the news 3 hours later
  • 25. Ushahidi
  • 26. Crisis-based Event Detection Tasks
    • Crisis-related event detection is often divided into three main tasks of increasing granularity [Olteanu et al. 2015]:
      – Task 1. Crisis vs. non-crisis related messages: differentiate posts that are related to a crisis situation from those that are not
      – Task 2. Type of crisis: identify the type of crisis the message relates to (shooting, explosion, building collapse, fires, floods, meteorite fall, etc.)
      – Task 3. Type of information: identify the type of information the message carries (affected individuals, infrastructures and utilities, donations and volunteers, caution and advice, etc.)
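The three tasks on slide 26 form a cascade: a message is only typed further once it has been judged crisis-related. The sketch below shows that control flow only. It is not the method of Olteanu et al. or of any system mentioned in the slides: the keyword lists are invented, and for simplicity a matching crisis-type keyword doubles as the Task 1 decision, whereas real systems use trained classifiers at each step.

```python
import re

# Hypothetical keyword-to-label maps, invented for illustration.
CRISIS_TYPES = {
    "flood": "Floods",
    "fire": "Fires",
    "explosion": "Explosion",
    "shooting": "Shooting",
}
INFO_TYPES = {
    "donate": "Donations and Volunteer",
    "volunteer": "Donations and Volunteer",
    "injured": "Affected Individuals",
    "trapped": "Affected Individuals",
    "avoid": "Caution and Advice",
    "evacuate": "Caution and Advice",
    "power": "Infrastructures and Utilities",
    "bridge": "Infrastructures and Utilities",
}


def first_match(text, keyword_map):
    """Return the label of the first keyword that prefixes any token, else None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for keyword, label in keyword_map.items():
        if any(token.startswith(keyword) for token in tokens):
            return label
    return None


def annotate(text):
    """Run the three tasks in order of increasing granularity."""
    crisis_type = first_match(text, CRISIS_TYPES)  # Tasks 1 and 2 combined here
    if crisis_type is None:
        return {"crisis_related": False, "crisis_type": None, "info_type": None}
    return {
        "crisis_related": True,
        "crisis_type": crisis_type,
        "info_type": first_match(text, INFO_TYPES),  # Task 3
    }


print(annotate("Flooding in the city centre, please evacuate now"))
print(annotate("Lovely lunch today"))
```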
  • 27. Disaster Management: https://evhart.github.io/crees/
  • 28. Be aware of the problems!
    • Fernandez, M., and Alani, H. "Online Misinformation: Challenges and Future Directions." Companion of The Web Conference 2018. http://oro.open.ac.uk/53734/
    • https://kmitd.github.io/recoding-black-mirror/
    • http://www.aolteanu.com/SocialDataLimitsTutorial/
  • 29. Strong need for ethics!
  • 30. Re-coding Black Mirrors
  • 31. Bias on the Web at all levels! http://www.aolteanu.com/SocialDataLimitsTutorial/
  • 32. Some considerations when collecting data
    • Automatic access to social media data can be restricted in different ways, leading to missing or biased information:
      – Public / non-public data: most social media websites do not allow access to posted information unless read access is given explicitly by the information creator.
      – Query restrictions: data access can be limited by API restrictions (e.g., rate limiting, query allowance).
      – Data sampling: high-velocity data is sometimes sampled by social media companies. As a result, it is only possible to retrieve a portion of the relevant information.
      – Query filtering: data is often retrieved using query parameters (e.g., keywords, geolocation, etc.).
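The "query restrictions" point on slide 32 is usually handled on the client side by throttling requests. The sketch below is one way to do that under stated assumptions: `fetch_page` is a hypothetical callable standing in for a real API client (it takes a cursor and returns a list of posts plus the next cursor), and the limits of 15 calls per 900 seconds are example numbers, not any platform's documented quota.

```python
import time


class RateLimiter:
    """Allow at most `max_calls` calls within any `window` seconds (sliding window)."""

    def __init__(self, max_calls, window, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = []     # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.window]
            if len(self.calls) < self.max_calls:
                break
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
        self.calls.append(now)


def collect(fetch_page, limiter, max_pages):
    """Follow cursors page by page, pausing whenever the limiter requires it."""
    posts, cursor = [], None
    for _ in range(max_pages):
        limiter.wait()
        page, cursor = fetch_page(cursor)
        posts.extend(page)
        if cursor is None:  # no further pages
            break
    return posts


# Hypothetical stand-in for a real API: maps a cursor to (posts, next_cursor).
FAKE_PAGES = {None: (["post 1", "post 2"], "c1"), "c1": (["post 3"], None)}


def fake_fetch_page(cursor):
    return FAKE_PAGES[cursor]


limiter = RateLimiter(max_calls=15, window=900.0)
print(collect(fake_fetch_page, limiter, max_pages=10))
```

Throttling only addresses rate limits; sampling and query filtering still mean the collected data is a partial, possibly biased view of what was posted.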
  • 33. Some considerations when analysing data
    • User type may vary (e.g., news organisations, journalists, companies, government, NGOs, etc.)
    • Populations may be biased (e.g., not all ages / genders / political views / etc. are equally represented)
    • Type of information shared may vary (e.g., during a disaster you may have messages about affected individuals, caution and advice, donations or volunteering, messages of support, etc.)
    • Type of content shared may vary (e.g., text, images, videos, links)
    • Target audience may vary (e.g., general public, other organisations, followers, friends/family)
    • The social media platform used to communicate the message may vary, or more than one may be in use (e.g., Facebook, Twitter, etc.)
  • 34. https://shorensteincenter.org/information-disorder-framework-for-research-and-policymaking/
  • 35. Types of Misinformation and Disinformation: 7 Types of Mis- and Dis-information (Credit: Claire Wardle, First Draft)
  • 36. Affecting the decision-making processes in many domains
  • 37. Dimensions of Combating Online Misinformation
    • Misinformation content detection
      – Are misinformation content and sources automatically identified? Are streams of information automatically monitored? Is relevant corrective information identified as well?
    • Misinformation dynamics
      – Are patterns of misinformation flow identified and predicted? Is demographic and behavioural information considered to understand and predict misinformation dynamics?
    • Content validation
      – Is misinformation validated and fact-checked? Are the users involved in the content validation process?
    • Misinformation management
      – Are citizens’ perceptions and behaviour with regard to processing and sharing misinformation studied and monitored? Are intervention strategies put in place to handle the effects of misinformation?
  • 38. Misinformation Content Detection. Diagram labels: network & propagation patterns; information source; content (text/images/videos); context; lists of misleading sites (http://www.opensources.co/); specific features (hashtags, mentions); misinformation?
  • 39. Misinformation Dynamics. Diagram labels: low content diversity and strong social reinforcement; homophily; polarisation; algorithmic ranking/personalisation; social bubbles.
    • Misinformation spreads faster and more widely across the network
    • Misinformation can be attributed to / spread by bots & crowdturfing
    • Users who use more social words and express more affection are more susceptible to interacting with bots
    • Extroverts are more prone to share misinformation
    • Users tend to select and share content based on homogeneity (echo chambers), an effect exacerbated by ranking and personalisation algorithms
    • In social media environments, where users are influenced by high information load and finite attention, low-quality information is likely to go viral
    • Different types of misinformation spread differently. Scientific news has a higher level of diffusion but decays faster. Conspiracy theories spread more slowly over longer time periods
    • Even when denied, rumour cascades continue to propagate
  • 40. Content Validation
    • Fact-checking organisations: Full Fact, UK; Snopes and Root Claim, US; FactCheckNI, Northern Ireland; Pagella Politica, Italy
    • Computational fact checker: automatically extracts claims and validates them against a variety of information sources
    • Knowledge bases: DBs of facts manually assessed by experts
    • Crowdsourcing for annotation and/or verification
    • Truth Teller
    • Whether a claim is accepted by an individual is strongly influenced by the individual’s belief system (confirmation bias / motivated reasoning)
  • 41. Misinformation Management
    • Simply presenting people with corrective information is likely to fail in changing their salient beliefs and opinions, or may even reinforce them
    • Combatting misinformation:
      – Provide an explanation rather than a simple refutation
      – Expose the user to related but disconfirming stories
      – Reveal the demographic similarity of the opposing group
      – Expose the users to “small doses” of misinformation
      – Facts
      – Early detection of malicious accounts
      – Use of ranking and selection strategies based on corrective information
  • 42. Comparison of Relevant Platforms
  • 43. Limitations
    • Misinformation content detection
      – Do not provide a rationale or explanation for their decisions
      – Disengage users by regarding them as passive consumers rather than as active co-creators and detectors of misinformation
    • Misinformation dynamics
      – Do not consider the typology and topology of the different networks
      – Do not take into account how the misinformation-handling behaviour of users influences the spread of misinformation
    • Content validation
      – Not able to cope with the high volume of misinformation generated online
      – Often disconnected from where users tend to read, debate and share misinformation
    • Misinformation management
      – Tend to focus on the technical and not on the human aspects of the problem (i.e., motivations and behaviours of the users when generating and spreading misinformation)
  • 44. Research Directions
    • User involvement
      – Participation of all stakeholders, including end users, social scientists, computer scientists, educators, etc., in the co-design of their functions, user interfaces, and delivery methods
    • Misinformation dynamics
      – Study how platform-specific and network-specific features influence the dynamics of misinformation
    • Content validation
      – Embed fact checkers into the environments where users tend to read, debate, and share misinformation (plugins)
    • Misinformation management
      – Understanding user behaviour towards misinformation, what opinions users form about it, and how these opinions evolve over time is key to successfully managing the impact of misinformation
      – Technology can be used to test the effectiveness of various misinformation management policies and techniques, as well as to deploy them at scale
  • 45. Modeling Social Media Data
    • SIOC: http://sioc-project.org/
    • M. Fernandez, A. Scharl, K. Bontcheva, H. Alani. User Profile Modelling in Online Communities. SWCS’14, Third International Workshop on Semantic Web Collaborative Spaces, ISWC 2014. http://oro.open.ac.uk/41395/
  • 46. Data Integration
    • Social networking sites (SNS) are like data silos
      – Many isolated communities of users with their data
    • The same user can participate in different social networks
      – Miriam.fs / miriamfs / mfs
    • The same topic can be discussed in different social networks
      – Need ways to connect them
    • Goals:
      – To develop portable analysis models
      – To allow users to access their data uniformly across SNS
      – To allow automatic data portability from one SNS to another
    • Source: J. Breslin, The Social Semantic Web: An Introduction. http://www.slideshare.net/Cloud/the-social-semantic-web-an-introduction
  • 47. Users / Content / Collaborative Environment. Diagram of the user model: demographic characteristics (birthday, location, sex), preferences, social network, behaviour, personality, needs, content, collaborative environment and domain of discussion, annotated with the vocabularies SUM, MESH, OUBO, SIOC, FOAF, Schema.org, Microformats, SemSNA, OPO and PAO.
  • 48. Using SIOC to Model Twitter Data (diagram labels)
    • Classes: sioct:MicroblogPost (tweet URL); sioct:Microblog; sioc:UserAccount (user Twitter homepage, user ID); sioc:Site (Twitter homepage); sioc:Container (Twitter list ID); sioct:Tag; geo:Point; gn:Feature
    • Properties with values: sioc:content (tweet text); dcterms:created (tweet creation time; account creation time); sioc:name (screen name; extracted hashtag); dcterms:title (user name); sioc:note (account description); sioc:avatar (avatar URL); sioc:links_to (extracted link); geo:long (tweet longitude); geo:lat (tweet latitude)
    • Relations: sioc:reply_of / sioc:has_reply; sioc:has_container / sioc:container_of; sioc:has_creator / sioc:creator_of; sioc:has_space / sioc:space_of; sioc:subscriber_of / sioc:has_subscriber; sioc:isPartOf / sioc:hasPart; sioc:has_owner / sioc:owner_of; sioc:topic; sioc:mentions; sioc:follows; sioc:forwarded_by; sioc:addressed_to; geo:location; sioc:about ...
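To make the mapping on slide 48 concrete, the sketch below serialises one tweet as Turtle using a handful of the properties named on the slide. It is a minimal illustration, not the tutorial's own code: the input dictionary layout, the example tweet, and the use of blank nodes such as `_:tag_EarthHour` for hashtags are assumptions, and only a subset of the diagram is covered.

```python
import re

# Namespace prefixes for the vocabularies used below.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "sioct": "http://rdfs.org/sioc/types#",
    "dcterms": "http://purl.org/dc/terms/",
}


def literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def tweet_to_triples(tweet):
    """Map a tweet dict (hypothetical layout) to (subject, predicate, object) triples."""
    post = f"<{tweet['url']}>"
    user = f"<{tweet['user_url']}>"
    triples = [
        (post, "rdf:type", "sioct:MicroblogPost"),
        (post, "sioc:content", literal(tweet["text"])),
        (post, "dcterms:created", literal(tweet["created_at"])),
        (post, "sioc:has_creator", user),
        (user, "rdf:type", "sioc:UserAccount"),
        (user, "sioc:name", literal(tweet["screen_name"])),
    ]
    # Each extracted hashtag becomes a sioct:Tag node linked via sioc:topic.
    for tag in re.findall(r"#(\w+)", tweet["text"]):
        node = f"_:tag_{tag}"
        triples.append((post, "sioc:topic", node))
        triples.append((node, "rdf:type", "sioct:Tag"))
        triples.append((node, "sioc:name", literal(tag)))
    # Each extracted link is attached with sioc:links_to.
    for link in re.findall(r"https?://\S+", tweet["text"]):
        triples.append((post, "sioc:links_to", f"<{link}>"))
    return triples


def to_turtle(triples):
    """Serialise triples as Turtle, one statement per line."""
    header = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


# Invented example tweet.
EXAMPLE_TWEET = {
    "url": "https://twitter.com/example_user/status/1",
    "user_url": "https://twitter.com/example_user",
    "screen_name": "example_user",
    "created_at": "2018-03-24T20:30:00Z",
    "text": "Switching the lights off for #EarthHour https://example.org/pledge",
}

print(to_turtle(tweet_to_triples(EXAMPLE_TWEET)))
```

In practice an RDF library would handle serialisation and datatypes; building the strings by hand here simply keeps the example dependency-free.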
  • 49. 49! Alberto Mendelzon Workshop 21th May 2018 49 Mining Social Media Data, How?
  • 50. 50! Alberto Mendelzon Workshop 21th May 2018 Analysis •  Behaviour Analysis •  Sentiment Analysis
  • 51. 51! Alberto Mendelzon Workshop 21th May 2018 Behaviour Analysis (in a climate change context) Fernandez, M., Piccolo, L., Alani, H., Maynard, D., Meili, C., & Wippoo, M. (2017). Pro- Environmental Campaigns via Social Media: Analysing Awareness and Behaviour Patterns. The Journal of Web Science, 3(1). http://www.webscience-journal.net/webscience/article/view/44/30 Fernández, M., Burel, G., Alani, H., Piccolo, L. S. G., Meili, C., & Hess, R. (2015). Analysing engagement towards the 2014 earth hour campaign in Twitter. http://oro.open.ac.uk/43621/1/ENVINFO2015_v12.pdf
  • 52. Problem •  Individual behaviour change is a central strategy to mitigate climate change •  However, public engagement is still limited
  • 53. Problem •  Pro-environmental campaigns, particularly via social media •  Unclear how existing theories and studies of behaviour change can be applied to practical settings, particularly social media campaigns, to better target and inform users
  • 54. Research Questions •  RQ1: How can we translate theories of behaviour change into computational methods to enable the automatic identification of behaviour? •  RQ2: How can the combination of theoretical perspectives and the automatic identification of behaviour help us to develop effective social media communication strategies for enabling behaviour change?
  • 55. Literature Review (I) •  Behaviour Change –  Socio-psychological models of behaviour (mainly at individual level) –  Theories of change (5 Doors Theory [Robinson])
  • 56. Literature Review (II) •  Intervention Strategies –  Information –  Discussions –  Public Commitment –  Feedback –  Social Feedback –  Goal Setting –  Collaboration –  Competition –  Rewards –  Incentives –  Personalisation
      Behavioural Stage | Interventions
      Desirability | Information
      Enabling Context | Information, Rewards, Incentives
      Can Do | Goal Setting, Public Commitment, Feedback
      Buzz | Feedback, Social Feedback
      Invitation | Promoting Collaboration
  • 57. Capturing and Categorising Behaviour •  Goal –  Automatic categorisation of users into behavioural stages following the 5 Doors theory of behaviour change •  Analysis Methodology –  Based on questionnaire findings (212 participants): “There is a moderate relationship between the type of user-generated content and behaviour change stage” –  1. Manual inspection of the patterns describing each behavioural stage –  2. Feature engineering based on the identified patterns –  3. Supervised classification
      Behavioural Stage | Posts
      Desirability | I don’t understand why my energy bill is soooo expensive!
      Enabling Context | I am considering walking or using public transport at least once a week
  • 58. Manual Inspection of Linguistic Patterns •  Desirability –  Negative sentiment (expressing personal frustration – anger / sadness) –  URLs (generally associated with facts) –  Questions (how can I? / what should I?) •  Enabling Context –  Neutral sentiment –  Conditional sentences (if you do [..] then […]) –  Numeric facts [consumption/pollution] + URL •  Can Do –  Neutral sentiment –  Orders and suggestions (I/you should/must…) •  Buzz –  Positive sentiment (happiness / joy) –  (I/we + present tense) I am doing / we are doing •  Invitation –  Positive sentiment (happy / cute) –  [vocative] Friends, guys –  Join me / tell us / with me
  • 59. Feature Engineering •  Using an extension of the GATE NLP tools (https://gate.ac.uk/) –  Polarity (positive/negative/neutral) –  Emotions •  Positive (joy/surprise/good/happy/cheeky/cute) •  Negative (anger/disgust/fear/sadness/bad/swearing) –  Directives •  Obligative (you must do) / imperative (do) / prohibitive (don’t do) •  Jussive or imperative in the 3rd person (go me!) •  Deliberative (shall / should we) / indirect deliberative (I wonder if) •  Conditionals (if / then) •  Questions (direct / indirect) –  URLs (yes / no) •  Indicates if the message points to external information
  • 60. Behaviour Classification Model •  Multiple classifiers tested on the sample of 2,610 annotated posts •  Best-performing classifier: J48 decision tree (71.2% accuracy)
  • 61. Experiments •  Analyse the behaviour of EH15 & COP21 participants •  Data Collection –  Participants of EH15 & COP21. Up to 3,200 posts per user
      Movement | Posts | Users
      EH15 | 56,531,349 | 20,847
      COP21 | 48,751,220 | 17,127
      •  Data Filtering –  For each user, identify the posts related to climate change/sustainability •  Use the term extraction tool ClimaTerm (GATE service) –  Based on GEMET / Reegle / DBpedia
      Movement | Posts | Users
      EH15 | 750,538 | 20,847
      COP21 | 422,211 | 17,127
  • 62. Analysis of EH2015 and COP21 •  Categorise user behaviour in the months before/after
  • 63. Recommendations •  A big part of a campaign’s effort should be concentrated on providing messages with very concrete suggestions on climate change actions –  Most users are in the desirability stage: they want to change but they don’t know how •  There is a need to identify really engaged individuals and community leaders and involve them more closely in the campaigns –  Few users in the invitation stage and most of them are organisations –  For an invitation to be effective it is vital who issues the invitation •  Efforts should be dedicated towards engaging in discussions and providing direct feedback to users –  Communication in these campaigns generally functions as broadcasting, or one-way communication, from the organisations to the public –  Frequent and focused feedback is an intervention strategy that can help build self-efficacy and nudge the users in the direction of change
  • 64. Behaviour Analysis (in an Enterprise Context) Rowe, M., Fernandez, M., Angeletou, S., & Alani, H. (2013). Community analysis through semantic rules and role composition derivation. Web Semantics: Science, Services and Agents on the World Wide Web, 18(1), 31-47. Rowe, M., & Alani, H. (2012). What makes communities tick? Community health analysis using role compositions. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom). IEEE. Rowe, M., Fernandez, M., Alani, H., Ronen, I., Hayes, C., & Karnstedt, M. (2012, June). Behaviour analysis across different types of Enterprise Online Communities. In Proceedings of the 4th Annual ACM Web Science Conference (pp. 255-264). ACM. Some of the next slides from: https://www.slideshare.net/mattroweshow
  • 65. The Need for Interpretation •  Online communities are dynamic behavioural ecosystems –  Users in communities can be defined by their roles •  i.e. exhibiting similar collective behaviour –  Prevalent behaviour can impact upon community members and health •  Management of communities is helped by: –  Understanding the relation between behaviour and health •  How user behaviour changes are associated with health •  Encouraging users to modify behaviour, in turn affecting health –  e.g. content recommendation to specific users –  Predicting health changes •  Enables early decision making on community policy •  Can we accurately and effectively detect positive and negative changes in community health from its composition of behavioural roles?
  • 66. SAP Community Network •  Collection of SAP forums in which users discuss: –  Software development –  SAP Products –  Usage of SAP tools •  Points system for awarding best answers –  Enables development of user reputation •  Provided with a dataset covering 33 communities: –  Spanning 2004 - 2011 –  95,200 threads –  421,098 messages •  78,690 were allocated points –  32,942 users [Figure: post count over time, 2004 to 2011]
  • 67. Community Health Indicators •  The literature offers no single agreed measure of ‘community health’ –  Multi-faceted nature: loyalty, participation, activity, social capital –  Different communities and platforms look at different indicators •  Indicator 1: Churn Rate (loyalty) –  The proportion of users who participate in a community for the final time •  Indicator 2: User Count (participation) –  The number of participating users in the community •  Indicator 3: Seeds-to-Non-Seeds Posts Proportion (activity) –  The proportion of seed posts (i.e. thread starters that receive a reply) to non-seeds (i.e. thread starters with no reply) •  Indicator 4: Clustering Coefficient (social capital) –  The average of users’ clustering coefficients within the largest strongly connected component
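Two of the indicators above (churn rate and the seeds-to-non-seeds proportion) can be sketched in a few lines. This is only an illustrative reading of the slide's definitions, not the authors' implementation: the input shapes (`(user, timestamp)` pairs, a `thread_id -> reply count` mapping), the function names and the time-window convention are all assumptions.

```python
def churn_rate(posts, window_start, window_end):
    """Share of users active in [window_start, window_end) whose last-ever
    post falls inside that window (assumed reading of 'final time')."""
    last_seen = {}
    for user, ts in posts:
        last_seen[user] = max(ts, last_seen.get(user, ts))
    active = {user for user, ts in posts if window_start <= ts < window_end}
    if not active:
        return 0.0
    churned = {user for user in active if last_seen[user] < window_end}
    return len(churned) / len(active)


def seeds_to_non_seeds(reply_counts):
    """Ratio of thread starters with at least one reply (seeds) to thread
    starters with no reply (non-seeds). reply_counts: thread_id -> replies."""
    seeds = sum(1 for replies in reply_counts.values() if replies > 0)
    non_seeds = len(reply_counts) - seeds
    return seeds / non_seeds if non_seeds else float("inf")


# Hypothetical toy data: user "b" posts for the last time in the window [0, 4).
posts = [("a", 1), ("a", 5), ("b", 2), ("c", 2), ("c", 9)]
print(churn_rate(posts, 0, 4))
print(seeds_to_non_seeds({"t1": 2, "t2": 0, "t3": 1}))
```

The window-based definition is a design choice: churn can only be judged against a later observation period, so the most recent window of any dataset will overestimate it.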
  • 68. Measuring Role Compositions I: Modelling and Measuring User Behaviour •  According to existing literature, user behaviour can be defined using 6 dimensions: –  (Hautz et al., 2010), (Nolker and Zhou, 2005), (Zhu et al., 2009), (Zhu et al., 2011) –  Focus Dispersion •  Measure: Forum entropy of the user –  Engagement •  Measure: Out-degree proportioned by potential maximal out-degree –  Popularity •  Measure: In-degree proportioned by potential maximal in-degree –  Contribution •  Measure: Proportion of thread replies created by the user –  Initiation •  Measure: Proportion of threads that were initiated by the user –  Content Quality •  Measure: Average points per post awarded to the user
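As a rough illustration of two of these measures, the sketch below computes focus dispersion as the Shannon entropy of a user's posts over forums, and popularity as in-degree divided by the maximum possible in-degree. The log base (2), the `n_users - 1` normaliser and the function names are assumptions made for the example; the slide does not specify them.

```python
import math
from collections import Counter


def forum_entropy(forum_ids):
    """Shannon entropy (base 2, assumed) of one user's posts across forums.
    0 means all posts are in a single forum; higher means more dispersed."""
    counts = Counter(forum_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def popularity(in_degree, n_users):
    """In-degree proportioned by the potential maximal in-degree, assumed
    here to be every other user in the community replying to this user."""
    return in_degree / (n_users - 1) if n_users > 1 else 0.0


print(forum_entropy(["abap", "abap", "abap", "abap"]))  # focussed user
print(forum_entropy(["abap", "hana"]))                  # evenly split user
print(popularity(in_degree=5, n_users=11))
```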
  • 69. Measuring Role Compositions II: Inferring Roles •  1. Construct features for community users at a given time step •  2. Derive bins using equal frequency binning –  Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4 •  3. Use skeleton rule base to construct rules using bin levels –  Popularity = low, Initiation = high -> roleA –  Popularity < 0.5, Initiation > 0.4 -> roleA •  4. Apply rules to infer user roles and community composition •  5. Repeat 1-4 for following time steps
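Steps 2 to 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original rule engine: the three-level split, the handling of values that fall exactly on a cutoff, the rule representation and all names (`equal_frequency_cutoffs`, `SKELETON_RULES`, `infer_role`) are hypothetical.

```python
def equal_frequency_cutoffs(values, n_bins=3):
    """Cutoffs that split the sorted values into n_bins bins of (roughly)
    equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def level(value, cutoffs, labels=("low", "mid", "high")):
    """Map a continuous value to a bin label; a value equal to a cutoff is
    placed in the higher bin (an assumption)."""
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[-1]


# Hypothetical skeleton rule base: dimension levels -> role label.
SKELETON_RULES = [({"popularity": "low", "initiation": "high"}, "roleA")]


def infer_role(user_features, cutoffs_by_dim, rules=SKELETON_RULES):
    """Return the first role whose level conditions all hold, else None."""
    levels = {dim: level(v, cutoffs_by_dim[dim]) for dim, v in user_features.items()}
    for conditions, role in rules:
        if all(levels.get(dim) == lvl for dim, lvl in conditions.items()):
            return role
    return None


cutoffs = {"popularity": [0.5, 0.8], "initiation": [0.2, 0.4]}
print(equal_frequency_cutoffs([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]))
print(infer_role({"popularity": 0.3, "initiation": 0.6}, cutoffs))
```

Recomputing the cutoffs at every time step (step 5) keeps "low" and "high" relative to the community's current behaviour rather than to fixed absolute values.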
  • 70. Measuring Role Compositions III: Mining Roles (Skeleton rule base compilation) •  1. Select the tuning segment •  2. Discover correlated behaviour dimensions –  Removed Engagement and Contribution, kept Popularity (Pearson r > 0.75) •  3. Cluster users into behavioural groups •  4. Derive role labels for clusters
      [Excerpt from the paper] […] method and number of clusters - we measure the cohesion and separation of a given clustering as follows: For each clustering algorithm (Ψ) we iteratively increase the number of clusters to use, where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ; this is defined for a given element (i) in a given cluster as:
      $$ s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (3) $$
      where $a_i$ denotes the average distance to all other items in the same cluster and $b_i$ is given by calculating the average distance with all other items in each other distinct cluster and then taking the minimum distance. The value of $s_i$ ranges between −1 and 1, where the former indicates a poor clustering where distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient $s(\Psi(k))$ for the entire clustering we take the average silhouette coefficient of all items. We find that the best clustering model and number of clusters to use is K-means with 11 clusters. We found that for smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance, however as we begin to increase the cluster numbers K-means improves while the two remaining algorithms produce worse cohesion and separation.
      Deriving Role Labels: Provided with the most cohesive and separated clustering of users we then derive role labels for each cluster. Role label derivation first involves inspecting the dimension distribution in each cluster and aligning the distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into discrete values which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the […] decision node, we measure the entropy of the dimensions and their levels across the clusters, we then choose the dimension with the largest entropy. This is defined formally as:
      $$ H(dim) = -\sum_{level}^{|levels|} p(level \mid dim) \log p(level \mid dim) \qquad (4) $$
      Table II. Mapping of cluster dimensions to levels. The clusters are ordered from low patterns to high patterns to aid legibility.
      Cluster | Dispersion | Initiation | Quality | Popularity
      1 | L | L | L | L
      0 | L | M | H | L
      6 | L | H | M | M
      10 | L | H | M | H
      4 | L | H | H | M
      2,5 | M | H | L | H
      8,9 | M | H | H | H
      7 | H | H | L | H
      3 | H | H | H | H
      [Fig. 2: Boxplots of the feature distributions (Dispersion, Initiation, Quality, Popularity) in each of the 11 clusters. Feature distributions are matched against the feature levels derived from equal-frequency binning.]
      Role labels: •  1 - Focussed Novice •  2,5 - Mixed Novice •  7 - Distributed Novice •  3 - Distributed Expert •  8,9 - Mixed Expert •  0 - Focussed Expert Participant •  4 - Focussed Expert Initiator •  6 - Knowledgeable Member •  10 - Knowledgeable Sink
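The silhouette coefficient in equation (3) is simple enough to compute directly. The sketch below is a plain-Python illustration, assuming Euclidean distance and assigning a score of 0 to items in singleton clusters (a common convention that the excerpt does not state); the function name and toy points are hypothetical.

```python
import math


def silhouette(points, labels):
    """Average silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from item i to every other item, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # well separated clusters
print(silhouette(points, [0, 1, 0, 1]))  # poor clustering
```

Running this for each candidate k and each clustering algorithm, then keeping the configuration with the highest average, mirrors the model selection loop described in the excerpt.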
  • 71. Health Indicator Regression •  Managing online communities is helped by understanding the relation between behaviour and health [Figure: four PC1 vs PC2 scatter plots of the communities, one per health indicator: Churn Rate, User Count, Seeds / Non-seeds Proportion, Clustering Coefficient] •  No global composition pattern for the entirety of SCN •  Identified key differences as to ‘What makes Communities tick’ •  Decrease in Focussed Experts correlated with an increase in Seeds-to-Non-Seeds
  • 72. Sentiment Analysis Saif, H., Fernandez, M., Kastler, L., & Alani, H. (2017). Sentiment lexicon adaptation with context and semantics for the social web. Semantic Web, 8(5), 643-665. Saif, H., He, Y., Fernandez, M., & Alani, H. (2016). Contextual semantics for sentiment analysis of Twitter. Information Processing & Management, 52(1), 5-19. http://oro.open.ac.uk/42471/ Saif, H., Ortega, F. J., Fernández, M., & Cantador, I. (2016). Sentiment analysis in social streams. In Emotions and Personality in Personalized Services (pp. 119-140). Springer, Cham. Saif, H., Fernandez, M., He, Y., & Alani, H. (2014, May). SentiCircles for contextual and conceptual semantic sentiment analysis of Twitter. In European Semantic Web Conference (pp. 83-98). Springer, Cham. Saif, H., Fernández, M., He, Y., & Alani, H. (2014). On stopwords, filtering and data sparsity for sentiment analysis of Twitter (pp. 810-817). Some of the next slides from: https://www.slideshare.net/Staano/
  • 73. Outline o Definitions o Brief History o  Traditional Sentiment Analysis o  Applications o Sentiment Analysis on Social Media o  Significance o  Challenges o Semantic Sentiment Analysis o  Contextual Semantics o  Conceptual Semantics o Discussion
  • 74. Sentiment Analysis •  Recent field of study that analyzes people’s attitudes towards entities (individuals, organizations, products, services, events, topics) and their attributes (Liu, 2012) •  Used interchangeably with Opinion Mining, although they are technically different tasks –  Opinion Mining: Extract the piece of text which represents the opinion •  I have recently upgraded to iPhone 5. I am not happy with the screen size, but the camera is absolutely amazing –  Sentiment Analysis: Extract the polarity of the opinion •  I am not happy with the screen size •  The camera is absolutely amazing
  • 75. Why? Because Opinions Matter! What Does the Public Think?
  • 77. http://www.datameer.com/blog/
  • 79. Sentiment Analysis •  Tasks: Subjectivity Detection, Polarity Detection, Sentiment Strength Detection, Emotions Detection, Sentiment Summarization •  Levels: Subjectivity Detection, Polarity Detection, Sentiment Strength Detection, Emotions Detection •  Data Types: Conventional Data, Microblogging Data •  Approaches: Machine Learning, Lexicon-based, Hybrid
  • 80. Sentiment Analysis Tasks •  Subjectivity Detection –  Detect whether the text is objective or subjective •  Polarity Detection –  Detect whether the text is positive or negative •  Sentiment Strength Detection –  Detect the strength of the subjective text •  Emotions Detection –  Detect the human emotions and feelings expressed in text (e.g., “happiness”, “sadness”, “anger”)
  • 81. Sentiment Analysis Levels •  Word/Entity/Aspect-level –  Given a word w in a sentence s, decide whether this word is opinionated (i.e., expresses sentiment) or not •  Phrase-level (expression-level) –  Given a multi-word expression e in a sentence s, detect the sentiment orientation of e (e.g., “I’m very happy”) •  Sentence-level –  Given a sentence s of multiple words and phrases, decide on the sentiment orientation of s •  Document-level –  Given a document d, decide on the overall sentiment of d
  • 82. Sentiment Analysis Approaches •  Lexicon-Based Approach •  Machine Learning Approach
  • 83. Machine Learning Approaches •  Supervised Classifiers: Naïve Bayes, MaxEnt, SVM, J48, etc. •  Unsupervised Classifiers: k-means, hierarchical clustering, HMM, SOM •  Semi-Supervised Classifiers: Label propagation and graph-based models
  • 84. Lexicon-based Approaches [Diagram: the text “I had nightmares all night long last night :(” is fed to a text processing algorithm that consults a sentiment lexicon (e.g. great, sad, down, wrong, horrible, mistake, love, good; MPQA, SentiWordNet, LIWC, etc.) and outputs Negative] •  Lexicon generation approaches –  Manual –  Dictionary-based –  Corpus-based
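The lexicon-based pipeline on this slide can be sketched as a word lookup followed by a sum. The lexicon below is a toy one: the words come from the slide, but the ±1 scores, the entries for "nightmares" and ":(", and the whitespace tokeniser are assumptions added so that the slide's example resolves; real lexicons such as MPQA or SentiWordNet have their own formats and scores.

```python
# Toy lexicon (hypothetical scores); not taken from MPQA, SentiWordNet or LIWC.
TOY_LEXICON = {
    "great": 1, "love": 1, "good": 1,
    "sad": -1, "down": -1, "wrong": -1, "horrible": -1, "mistake": -1,
    "nightmares": -1, ":(": -1,
}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Sum the prior scores of the known tokens and map the total to a label."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("I had nightmares all night long last night :("))
print(lexicon_polarity("I had a great pain in my lower back this morning"))
```

The second call labels "great pain" as positive, because the word "great" carries a fixed positive prior; this is the kind of context-insensitivity that motivates the contextual approaches discussed later in the deck.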
  • 85. Data •  Existing SA methods are designed to function on formal text, that is: 1.  Long enough 2.  Well-structured 3.  Made of formal sentences •  Social media text is often: –  Short! –  Noisy and messy –  Made of informal, ill-structured sentences
  • 86. Challenges to Traditional Approaches •  Machine Learning Approaches o  Classifier Training o  Labelled Corpora o  Labor-Intensive Task o  Domain-Specific o  Re-Training with new domains o  Data Sparsity
  • 87. Challenges to Traditional Approaches •  Machine Learning Approaches o  Data Sparsity o  Twitter data are more sparse than conventional data (Saif et al., 2012) o  Singleton words constitute two-thirds of the words in tweets! [Figure: share of words with TF=1 vs TF>1 in the OMD, HCR, STS-Gold, SemEval, WAB and GASP datasets]
  • 88. Challenges to Traditional Approaches •  Lexicon-based Approaches o  Sentiment Lexicons (e.g., MPQA, SentiWordNet) o  Not tailored to noisy Twitter data o  Fixed number of words [Diagram: a sentiment lexicon containing great, sad, down, wrong, horrible, mistake, love, good, but not grt8, lol, :) or :P] •  Need Lexicon Adaptation!
  • 89. Sentiment in practice is usually conveyed through the latent semantics or meaning of words in texts! •  “I had a great pain in my lower back this morning :(” –  Great Pain -> Negative •  “Ebola is spreading in Africa and ISIS in Middle East!” –  ISIS -> Militant Group -> Negative –  Ebola -> Virus/Disease -> Negative •  Sentiment is dynamic, domain-dependent, and…
  • 90. Semantic Sentiment Analysis (SentiCircles) •  Semantic sentiment analysis aims at extracting and using the underlying semantics of words/aspects in identifying their sentiment orientation with regards to their context in the text •  SentiCircles –  Semantic representation of words that captures their contextual sentiment orientation and strength in tweets (Saif et al., 2014) –  Captures contextual & conceptual semantics of words –  Does not rely on the structure of tweets –  Provides lexicon-based sentiment analysis: •  Tweet-level •  Entity-level
  • 91. Distributional Semantic Hypothesis •  “Words that occur in similar context tend to have similar meaning” Wittgenstein (1953) [Diagram: “Trojan Horse” in two contexts: (Threat, Hack, Code, Malware, Program, Dangerous, Harm) and (Greek, Tale, History, Class, Wooden, Troy)]
  • 92. Capturing Contextual Semantics [Diagram: a term m (“Trojan Horse”) and its context terms C1, C2, …, Cn (“threat”, “attack”) form a context-term vector; labels on the diagram: Degree of Correlation, Prior Sentiment (from a sentiment lexicon), Contextual Sentiment Strength [-1 (very negative), +1 (very positive)], Contextual Sentiment Orientation (positive, negative, neutral)]
      [Excerpt from the paper, Section 3: Capturing and Representing Semantics for Sentiment Analysis] In the following we explain the SentiCircle approach and its use of contextual and conceptual semantics. The main idea behind our SentiCircle approach is that the sentiment of a term is not static, as in traditional lexicon-based approaches, but rather depends on the context in which the term is used, i.e., it depends on its contextual semantics. We define context as a textual corpus or a set of tweets. To capture the contextual semantics of a term we consider its co-occurrence patterns with other terms, as inspired by [27]. Following this principle, we compute the semantics of a term m by considering the relations of m with all its context words (i.e., words that occur with m in the same context). To compute the individual relation between the term m and a context term $c_i$ we propose the use of the Term Degree of Correlation (TDOC) metric. Inspired by the TF-IDF weighting scheme this metric is computed as:
      $$ TDOC(m, c_i) = f(c_i, m) \times \log \frac{N}{N_{c_i}} \qquad (1) $$
      where $f(c_i, m)$ is the number of times $c_i$ occurs with m in tweets, N is the total number of terms, and $N_{c_i}$ is the total number of terms that occur with $c_i$. In addition to each TDOC computed between m and each context term $c_i$, we also consider the Prior Sentiment of $c_i$, extracted from a sentiment lexicon. As with common practice, if this term $c_i$ appears in the vicinity of a negation, its prior sentiment score is negated. The negation words are collected from the General Inquirer under the NOTLW category.
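The TDOC formula can be sketched on a toy corpus as below. Several choices here are assumptions rather than details given in the excerpt: tweets are pre-tokenised lists, "Trojan Horse" is treated as the single token `trojan`, N counts distinct terms in the corpus, N_c counts distinct terms that co-occur with c in at least one tweet, and f(c, m) counts occurrences of c in tweets that contain m. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def tdoc_scores(tweets, m):
    """TDOC(m, c) = f(c, m) * log(N / N_c) for every context term c of m."""
    vocab = set()
    cooccurring = defaultdict(set)  # term -> distinct terms seen alongside it
    f = Counter()                   # context term -> occurrences alongside m
    for tokens in tweets:
        unique = set(tokens)
        vocab |= unique
        for term in unique:
            cooccurring[term] |= unique - {term}
        if m in unique:
            f.update(t for t in tokens if t != m)
    n = len(vocab)
    return {c: count * math.log(n / len(cooccurring[c])) for c, count in f.items()}


tweets = [
    ["trojan", "horse", "threat"],
    ["trojan", "horse", "attack"],
    ["greek", "horse", "tale"],
]
scores = tdoc_scores(tweets, "trojan")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

As with TF-IDF, the log factor discounts context terms that co-occur with almost everything ("horse" here), so rarer, more specific context terms such as "threat" receive a higher weight even with a lower raw count.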
  • 93. The SentiCircle Approach [Diagram: SentiCircle of the term m (“Trojan Horse”). Each context term $C_i$ (e.g. dangerous, threat, destroy, malicious, attack, easily, discover, useful, fix) is placed at a point $(x_i, y_i)$ with radius $r_i$ and angle $\theta_i$. Both axes range from -1 to +1; the quadrants are labelled Positive, Very Positive, Very Negative and Negative, with a Neutral Region around the X-axis.]
      $$ x_i = r_i \cos(\theta_i), \qquad y_i = r_i \sin(\theta_i) $$
      $$ r_i = TDOC(C_i), \qquad \theta_i = Prior\_Sentiment(C_i) \cdot \pi $$
      Overall Contextual Sentiment (Senti-Median). [Excerpt from the paper] […] where the geometric median is a point $g = (x_k, y_k)$ in which its Euclidean distance to all the points $p_i$ is minimum. We call the geometric median g the Senti-Median as it captures the sentiment (y-coordinate) and the sentiment strength (x-coordinate) of the SentiCircle of a given term m. Following the representation provided in Figure 1, the sentiment of the term m is dependent on whether the Senti-Median g lies inside the neutral region, the positive quadrants, or the negative quadrants. Formally, given a Senti-Median $g_m$ of a term m, the term-sentiment function L works as:
      $$ L(g_m) = \begin{cases} \text{negative} & \text{if } y_g < -\lambda \\ \text{positive} & \text{if } y_g > +\lambda \\ \text{neutral} & \text{if } |y_g| \le \lambda \text{ and } x_g \,[\ldots]\, 0 \end{cases} $$
      where $\lambda$ is the threshold that defines the Y-axis boundary of the neutral region. […] illustrates how this threshold is computed.
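The geometry above can be sketched end to end: place each context term on the circle, take the geometric median of the points, and threshold its y-coordinate. This is an illustrative sketch only. The geometric median is approximated with Weiszfeld's iteration (the excerpt does not say how it is computed), the threshold value is arbitrary, the x-coordinate condition of the neutral case is ignored, and all function names are hypothetical.

```python
import math


def senticircle_point(tdoc, prior):
    """Place a context term at radius r = TDOC and angle theta = prior * pi,
    with prior sentiment assumed to lie in [-1, 1]."""
    theta = prior * math.pi
    return (tdoc * math.cos(theta), tdoc * math.sin(theta))


def geometric_median(points, iterations=100, eps=1e-9):
    """Approximate the point minimising the summed Euclidean distance to all
    points, using Weiszfeld's iteration started from the centroid."""
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:
                continue  # skip points that coincide with the estimate
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break
        x, y = num_x / denom, num_y / denom
    return (x, y)


def term_sentiment(senti_median, threshold):
    """Simplified term-sentiment function: only the y-coordinate is used."""
    _, y = senti_median
    if y < -threshold:
        return "negative"
    if y > threshold:
        return "positive"
    return "neutral"


# Hypothetical (TDOC, prior sentiment) pairs for three context terms.
context = [(1.0, -0.5), (2.0, -0.5), (0.5, 0.5)]
points = [senticircle_point(r, prior) for r, prior in context]
print(term_sentiment(geometric_median(points), threshold=0.1))
```

Using a median rather than a mean keeps a single high-TDOC context term from dominating the term's overall sentiment.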
  • 94. Examples
  • 95. Tweet-Level Contextual Sentiment (I) •  (1) The Median Method –  Example: “Cycling under a heavy rain.. what a #luck!” –  One Senti-Median (S-Median) per term in the tweet –  Tweet sentiment: the median of the Senti-Medians
  • 96. Tweet-Level Contextual Sentiment (II) •  (2) The Pivot Method [Diagram: for the tweet $t_k$ “I like my new iPad”, the SentiCircle of the pivot term $p_j$ (“iPad”) contains the other tweet terms “like” ($r_1$, $\theta_1$) and “new” ($r_2$, $\theta_2$); quadrants are labelled Positive, Very Positive, Very Negative and Negative]
      [Excerpt from the paper] Median Method: This method takes the median of all Senti-Medians, and this treats all tweet terms to be equal. Each tweet $t_i \in T$ is turned into a vector of Senti-Medians $g = (g_1, g_2, ..., g_n)$ of size n, where n is the number of terms that compose the tweet and $g_j$ is the Senti-Median of the SentiCircle associated with term $m_j$. Equation […] is used to calculate the median point q of g, which we use to determine the overall sentiment of tweet $t_i$ using Function 6.
      Pivot Method: This method favours some terms in a tweet over others, based on the assumption that sentiment is often expressed towards one or more specific targets, which we refer to as “Pivot” terms. In the tweet example above, there are two pivot terms, “iPhone” and “iPad”, since the sentiment word “amazing” is used to describe both of them. Hence, the method works by (1) extracting all pivot terms in a tweet and; (2) accumulating, for each sentiment label, the sentiment impact that each pivot term receives from other terms. The overall sentiment of a tweet corresponds to the sentiment label with the highest sentiment impact. Opinion target identification is a challenging task and is beyond the scope of our current study. For simplicity, we assume that the pivot terms are those having the POS tags: {Common Noun, Proper Noun, Pronoun} in a tweet. For each candidate pivot term, we build a SentiCircle from which the sentiment impact that a pivot term receives from all the other terms in a tweet can be computed. Formally, the Pivot-Method seeks to find the sentiment $\hat{s}$ that receives the maximum sentiment impact within a tweet as:
      $$ \hat{s} = \arg\max_{s \in S} H_s(p) = \arg\max_{s \in S} \sum_{i}^{N_p} \sum_{j}^{N_w} H_s(p_i, w_j) \qquad (7) $$
      where $s \in S = \{Positive, Negative, Neutral\}$ is the sentiment label, p is a vector of […]
  • 97. Performance [Figure, tweet-level sentiment analysis: polarity detection accuracy and F-measure for MPQA-Lex, SentiWNet-Lex and SentiCircle, and for SentiStrength vs SentiCircle] [Figure, entity-level sentiment analysis: subjectivity detection and polarity detection accuracy and F1 for MPQA-Lex, SentiWNet-Lex, SentiStrength and SentiCircle] Annotated differences on the charts: +30-40%, +2-15%, +20%, +1/-1%
  • 98. Enriching SentiCircles with Conceptual Semantics •  Semantics extracted from external knowledge sources (e.g., ontologies and semantic networks) –  “ISIS is spreading in the Middle East like Cancer!” (ISIS -> Jihadist Militant) –  “What a sad day, 4 doctors were lost to Ebola today!” (Ebola -> Virus) –  “Finally, I got my iPhone 6s, What a product!!” (iPhone 6s -> Apple Product)
  • 99. Enriching SentiCircles with Conceptual Semantics •  “Cycling under a heavy rain.. What a #luck!” (rain -> Weather Condition; related concepts: Wind, Snow, Humidity) [Figure, tweet-level sentiment analysis: Precision, Recall and F1 for Unigrams, POS and Semantics features; annotated difference: +4%]
  • 100. Sentiment Lexicon Adaptation •  Typical Sentiment Lexicons: –  Context-insensitive sentiment –  Fixed set of words •  Lexicon Adaptation –  Update the sentiment of words in a given lexicon with respect to their context in the text •  Cold beer -> Positive •  Great Pain -> Negative •  Lexicon Adaptation with SentiCircles [Diagram: Tweets -> Extract Contextual Sentiment -> Rule-based Lexicon Adaptation (applied to the Sentiment Lexicon) -> Adapted Lexicon]
  • 101. Adaptation Impact on Thelwall-Lexicon •  Words in Thelwall-Lexicon were adapted based on their context in three different datasets: OMD, HCR, STS (Saif et al., 2013)
      Words found in the lexicon | 9.6%
      Words that flipped their sentiment orientation | 33.82
      Words that changed their sentiment strength | 62.94
      Words that remained unchanged | 3.24
      New opinionated words | 21.37
      [Figure: Accuracy and F1 of the original lexicon vs the adapted lexicon; values shown on the chart: 66.29, 61.4, 69.29, 66.03]
  • 102. Strengths and Limitations •  SentiCircles can effectively capture the contextual semantics and sentiment at the corpus level •  Provides lexicon-based (unsupervised) sentiment analysis •  Provides domain-specific sentiment analysis •  Low complexity –  Does not rely on sentence structure •  Not tailored to tweet-level / sentence-level context •  Sensitive to imbalanced sentiment class distribution •  Not very effective with small Twitter datasets
  • 103. Take-Home Message
  • 104. Take-Home Message •  Social media data can be mined for multiple applications •  It’s a great way to understand social phenomena at scale! •  This research must be interdisciplinary •  When using and studying social media we need to be very aware of the problems (ethics / biases / misinformation) •  A “pinch” of semantics goes a long way ☺ THX A LOT FOR LISTENING! ☺
  • 105. Let’s Download some Twitter Data ☺
  • 106. Time to Play! •  Automatic data collection generally relies on JSON APIs and OAuth credentials. For example, for Twitter, you need to: 1.  Create a Twitter account (https://twitter.com). 2.  Obtain OAuth access credentials (i.e., access token, access secret, consumer key and consumer secret) (https://apps.twitter.com/app/new). 3.  Use the Search API to collect tweets (https://developer.twitter.com). 4.  Save the tweets in JSON or another format for later analysis.
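Step 4 can be sketched with the standard library alone. The example below stores one JSON object per line (the "JSON Lines" layout), which makes it easy to append tweets as they arrive. It deliberately does not call the Twitter API: the tweet dictionaries, their `id` and `text` fields, the file name and the helper names are hypothetical stand-ins for whatever the Search API returns.

```python
import json


def save_tweets(tweets, path):
    """Write one JSON object per line so the file can be appended to and
    streamed later without loading everything into memory."""
    with open(path, "w", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")


def load_tweets(path):
    """Read the tweets back as a list of dictionaries."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical tweets, shaped like simplified API results.
sample = [
    {"id": 1, "text": "Cycling under a heavy rain.. what a #luck!"},
    {"id": 2, "text": "I like my new iPad"},
]
save_tweets(sample, "tweets.jsonl")
print(len(load_tweets("tweets.jsonl")))
```

Keeping the raw JSON rather than only the extracted text preserves fields such as timestamps and user identifiers that later analyses may need.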