UMAP 2013 - Link, Like, Follow, Friend: The Social Element in User Modeling and Adaptation
1. Delft
University of
Technology
Link, Like, Follow, Friend:
The Social Element in User Modeling and
Adaptation
UMAP, Rome, June, 2013
Geert-Jan Houben
Web Information Systems, TU Delft
3. 3
Social Web & UMAP
We observe, reflect, speculate, and raise discussion
about evolutions and opportunities
for UMAP to make a difference.
Triggered by
the social element in UMAP and other conferences
& our own experience in the field.
4. 4
Social Web in UMAP:
a number of mentions of ‘social’, and
a small number of ‘social web’ in the papers.
New U (in UMAP), new users
And we see more.
We see how the Social Web mirrors people, mirrors users.
What we learn at the Social Web,
learn (more) about users
and for user modeling
and adaptation.
5. 5
UMAP in the new Web world
What we learn at the Social Web
allows us to reconsider UMAP in the Web.
It brings new opportunities for us as researchers.
Perhaps it brings new needs.
Surely, these are opportunities that we can position within our
UMAP research agenda
and UMAP application portfolio.
6. 6
SWUMAP: 1 + 1 = 3
Experience shows the value of combining:
Understanding & Creating
UM & AP
Machines & Humans
Arrive at a body of knowledge
for turning insights about
users and usage into added
value in society and economy.
7. 7
UMAP systems are Web systems
Lessons tell us to reconsider our system concept.
On the Web systems are ‘in vivo’: open and dynamic.
• Users & data are no longer ‘inside the system’.
• Users & data change, move (more) quickly.
This impacts understanding and creating of systems.
This also impacts the systems’ architecture.
With the (Social) Web as our laboratory,
this also impacts our research discipline.
11. 11
Domain: Incidents and emergencies
In the literature we see a fair amount of attention
for the domain of incidents and emergencies.
Our own experience from several years
is situated in that domain.
It has given us a good feeling
for what is needed and
how UMAP research can be part of a bigger effort
to solve real-world problems.
12. 12
Domain: Incidents and emergencies
In the literature, most attention is directed towards
understanding and detecting.
Sometimes we see further objectives in
responding,
creating situational awareness (especially in massive
incidents), and
prevention.
Most used in these studies is Twitter.
15. 15
Twitter
With Twitter, we have a whole new reflection of
what is happening in the world.
A whole new source of digital data
that reflects the (real) world.
We need to understand that reflection
to understand the world
and help the world.
Two challenges:
1. Understand the world, and
2. Understand its reflection in the Social Web.
17. 17
400+ million tweets per day
• Netherlands ranks #1 in Twitter penetration
Twitter users publish about “anything”
• Work/private life
• Interesting events
• Etc.
Twitter tells us a lot about the world.
And its users can be seen to act as social sensors and
citizen journalists.
Monitoring Twitter
24. 24
A new source of knowledge
An example of the speed
and the nature
of knowledge that Twitter provides
and what it does
to provide knowledge about what really happened.
Also, it shows what we need to know and understand
to use and interpret this effectively.
25. 25
1. Early warning
• Twitter users publish early signals that might indicate an increased
risk or potential incident.
2. Crisis management
• (Eye-witness) Twitter users disseminate information about incidents
which can support operational emergency services.
3. Post evaluation
• Post analyzing incident data (in retrospect) to measure the
effectiveness of emergency services.
Twitcident goals
26. 26
• Emergency services
• Law enforcement, fire fighters, governments
• Big event organizers
• Festival security companies
• Utility organizations
• Public transport, energy supply, other vital infrastructures
Stakeholders
33. 33
Example festival disaster
The research into this example
created a lot of knowledge
about what is possible
and what is desired.
It was also a good example to follow
and approach new use cases
to build more general understanding and theory.
36. 36
Twitcident processes 100k tweets/day
The social weather map provides ProRail with a timely
and accurate overview of citizen observations.
In addition to other sources of knowledge.
Value
37. 37
Big Events
New Year’s Eve
Serious Request
Elections
Lowlands
Summer Carnaval
Fantasy Island
Queen’s Day
53. 53
Recommendations by Cohen
• Clear communication strategy
• Planning & organizing in advance
• Social media monitoring
• Clear intervention policy
56. 56
Recommendation from experience
Let us go and find the needle
that tells us what appears to be happening out there
But let us also think about how to support the action
to make the world out there a better one.
58. 58
“Polling meaningful information”
“Sifting thousands of tweets during hurricane Irene”
“Getting situational awareness”
“Finding the eyes on the ground”
“Finding actionable information”
“Providing timely reaction”
“Volunteers are great”
“But we need hybrid approaches to
monitor social media”
Patrick Meier
Today’s challenges
59. 59
Hybrid approach
Twitcident has also shown us how
these problems ask for a hybrid approach
with humans in the loop
that handle and interpret the knowledge
derived from the Social Web.
Big Data is available from the Social Web,
but Small Interpretations are needed, to get it right!
60. 60
Human interpretation inside
The nature of these problems means
that solutions are not fully automatic.
They involve users of systems
that help the interpretation and decision making.
It is a special kind of user
that we (as UMAP) can consider
and that is fast growing
and in urgent need of support.
61. 61
Take home from experience
Learn from concrete cases:
• Case-based experimental approaches bring specific understanding and
experience necessary for general understanding and theory.
• Cases can have great value for stakeholders.
It is all about correct and actionable interpretation:
• Make information meaningful and actionable in the context.
• Employ hybrid, human-enhanced approaches for the context.
65. 65
Challenge: Making sense of Twitter
Inspired by different applications and domains,
researchers have given attention
to underlying technology
for making sense of Twitter.
‘Finding the needle’
as the research challenge.
66. 66
Technology for making sense
The sense-making usually relies on
application and domain specific knowledge and
researchers investigate how to do it effectively.
Semantics and interactivity
prove to be important ingredients.
In fact, it turns out that
sense-making, i.e. finding the needle,
is a combination of many things
that need to be coming together.
69. 69
Semantics for filtering and search
In [HT2012] we considered
what is needed as first steps in processing tweets,
before we can ‘analyze’ them.
70. 70
1. (Automatic) Filtering: Given an incident, how can one
automatically identify those tweets that are relevant to
the incident?
2. Search & Analytics: How can one improve search and
analytical capabilities so that users can explore
information in the streams of tweets?
Twitter streams
Challenges
Filtering
topic
Search &
Analytics
information need
71. 71
Dataset
• Twitter corpus (TREC Microblog Track 2011)
• 16 million tweets (Jan. 24th – Feb. 8th, 2011)
• 4,766,901 tweets classified as English
• 6.2 million entity-extractions
• News (Same time period)
• 62 RSS News Feeds
• 13,959 News Articles
• 357,559 entity-extractions
73. 73
Filtering evaluation
The semantic strategy is more robust and
achieves higher precisions for complex topics.
[Figure: two plots of Precision@30 and Recall (y-axis, 0 to 1). One panel plots them against the number of entities extracted from the initial topic description (1 to 4), the other against the number of words in the initial topic description (1 to 5).]
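The two measures reported in this evaluation, Precision@30 and recall, can be illustrated with a minimal sketch. The tweet identifiers and the relevance judgments below are hypothetical, and the functions are a generic formulation of the measures, not the evaluation code behind the slide.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 30) -> float:
    """Fraction of the top-k positions that hold a relevant tweet.

    Divides by k even if fewer than k tweets were returned.
    """
    if k <= 0:
        return 0.0
    hits = sum(1 for tweet_id in ranked[:k] if tweet_id in relevant)
    return hits / k


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of all relevant tweets that the filter returned at all."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


# Hypothetical filter output and relevance judgments for one incident topic.
ranked_ids = ["t1", "t7", "t3", "t9"]
relevant_ids = {"t1", "t3", "t4"}

print(precision_at_k(ranked_ids, relevant_ids, k=2))  # one of the top two is relevant
print(recall(ranked_ids, relevant_ids))               # two of three relevant tweets found
```

A filtering strategy that is robust for complex topics keeps both values high as the topic description grows.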
75. 75
Faceted search evaluation
Strategies with semantic enrichment outperform
those without in predicting appropriate facet-values.
[Figure, from ‘Adaptive Faceted Search on Twitter’: bar chart comparing strategies with semantic enrichment against strategies without semantic enrichment.]
76. 76
Lessons
The context: a (Twitcident-inspired) framework for
filtering, searching, and analyzing information
about incidents that people publish on Twitter.
We have seen how to obtain
• better filtering of Twitter messages for a given incident,
• better search for relevant information about an incident
within the filtered messages.
For these first steps in processing Twitter messages,
the semantic interpretation is the key element
that we need to understand for the given context.
78. 78
Semantics for enrichment and linkage
In [ESWC2011] we focused more on
the semantics for enrichment and linkage
to connect the tweets to background knowledge
and thus enhance what we can learn from them.
79. 79
SI Sportsman of the
year: Surprise French
Open champ
Francesca Schiavone
Thirty in women's tennis is primordially
old, …
news article
topic:Sports topic:Sports
topic:Tennis
person:Francesca_Schiavone
oc:SportsGame
event:FrenchOpen
francesca is becoming #sport
idol of the year!
microblog post
user
enrichment enrichment
user modeling
linkage
Profile
Topics of interest:
- topic:Tennis
- topic:Sports
People of interest:
- person:Francesca_Schiavone
Events of interest:
- event:FrenchOpen
Example: Semantic enrichment of Twitter posts
80. 80
SI Sportsman of the
year: Surprise French
Open champ
Francesca Schiavone
Thirty in women's tennis is primordially
old, …
news article
francesca is becoming #sport
idol of the year!
microblog post
user linkage
How?
Goal in this linkage discovery is to identify news resources that are related to a given Twitter message:
1. Web resource has to be related to the given tweet
2. Web resource has to be related to news
Linkage discovery
81. 81
Francesca Schiavone is
sportsman of the year
#sport #tennis
Content-based
SI Sportsman of the year:
Surprise French Open
champ
Francesca Schiavone
Thirty in women's tennis is
primordially old…
Francesca Schiavone is
sportsman of the year
#sport #tennis
Hashtag-based
Petkovic & Goerges
leading German tennis
revival
there are signs that German
tennis is…
Linkage discovery strategies
82. 82
nice! http://bit.ly/eiU33c URL-based
SI Sportsman of the year:
Surprise French Open
champ
Francesca Schiavone
Thirty in women's tennis is
primordially old…
news article URL
Entity-based
Olympic champion and world
number nine Elena
Dementieva announced her
retirement
The 29-year-old Russian delivered
the shock news after losing to
Francesca Schiavone in the group
stages of the season-ending
tournamen …
news article
Entity-based
Francesca Schiavone is
sportsman of the year
#sport #tennis temporal constraint
Old news
publish date
publish date
• URL-based (Strict): only consider content of the Twitter message
• URL-based (Lenient): also consider reply or re-tweet messages
Linkage discovery strategies
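The entity-based strategy with a temporal constraint can be sketched as follows. This is a minimal illustration, not the implementation from [ESWC2011]: the `Item` structure, the entity labels, the publish dates, and the two-day window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class Item:
    """A tweet or news article with its extracted entities (hypothetical structure)."""
    text: str
    entities: FrozenSet[str]
    published: datetime


def entity_based_links(tweet: Item, articles: List[Item],
                       window: timedelta = timedelta(days=2)) -> List[Item]:
    """Return articles that share an entity with the tweet and are close in time.

    The two-day window is an assumed value, not one taken from the slides.
    """
    linked = []
    for article in articles:
        shares_entity = bool(tweet.entities & article.entities)
        close_in_time = abs(tweet.published - article.published) <= window
        if shares_entity and close_in_time:
            linked.append(article)
    return linked


# Illustrative data; the dates are made up.
tweet = Item("Francesca Schiavone is sportsman of the year #sport #tennis",
             frozenset({"Francesca_Schiavone"}), datetime(2010, 12, 1))
fresh = Item("SI Sportsman of the year: Surprise French Open champ",
             frozenset({"Francesca_Schiavone", "French_Open"}), datetime(2010, 11, 30))
old = Item("Elena Dementieva announced her retirement",
           frozenset({"Elena_Dementieva", "Francesca_Schiavone"}), datetime(2010, 10, 29))

print([a.text for a in entity_based_links(tweet, [fresh, old])])
```

Without the temporal constraint, the older article would also be linked, because it mentions the same person.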
84. 84
Analysis on linkage discovery and
semantic enrichment
• URL-based strategies: more than 10 tweet-news relations for c.a. more than 1000
• Entity-based strategy: found
a far higher number of
tweet-news relations
• Hashtag-based strategy failed
for more than 79% of the users
because of the limited usage of
hashtags
• Combination of all strategies:
more than 10 tweet-news
relations found for more than 20%
of the users
Entity-based URL-based
Hashtag-based
Combination
Combined strategies perform better.
85. 85
Lessons
There is good background knowledge out there,
if we are able to understand how it connects
to the domain and context we are considering.
Many applications can share
the same enrichment and linking,
but not all.
With common descriptions of the problem,
we can share enrichment and linking (more) effectively.
87. 87
Challenge: Social web for profiles
An ambition often seen in conferences like this one is
to exploit the semantically enriched social web knowledge
for the purpose of creating or enhancing user profiles.
These profiles can then be used for
adaptation and personalization.
88. 88
Components for profiling
For applications such as
personalized news recommendation,
like in our [UMAP2011] work,
components for profiling
can be carefully selected and assembled.
It can also help the
development of the deeper understanding
and theory about how to
link the data to background knowledge
and thus make sense of the data.
89. 89
Library
GeniUS [JIST2011] is a topic and user modeling software
library that
• produces semantically meaningful profiles, to enhance
the interoperability of profiles between applications;
• provides functionality for aggregating relevant
information about a user from the Social Web;
• generates domain-specific user profiles according to the
information needs of different applications;
• is flexible and extensible to serve different applications.
90. 90
GeniUS: Generic Topic and User Modeling Library
for the Social Semantic Web
Item
Fetcher
Enrichment
Weighting
Function
RDF
Repository
Filter
Modeling
Configuration
RDF
Serialization
Social Web
Semantic Web
user data
items
enriched
items
semantic data
user profiles
interested in:
location product
92. 92
User modeling with rich semantics:
interested in:
people topics events …linkage
user profile construction
#sport
person:Francesca_Schiavone
topic:Sports
event:FrenchOpen
topic:Tennis
time
weekday weekend
Profile types
• hashtag-
based
• topic-based
• entity-based
enrichment
• tweet-only
• exploitation of
external news
resources
temporal
patterns
• specific time
period
• temporal pattern
• No constraints
User profile construction
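The design dimensions above (profile type, enrichment, temporal constraint) can be illustrated with a small sketch. It assumes that each post has already been mapped to concepts (hashtags, topics, or entities) and uses relative frequency as the weight; both are simplifying assumptions, and `build_profile` is a hypothetical helper, not part of any library named in the slides.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# (timestamp, concepts) pairs; concepts may be hashtags, topics, or entities.
Post = Tuple[datetime, List[str]]


def build_profile(posts: Iterable[Post], weekend: Optional[bool] = None) -> Dict[str, float]:
    """Build a weighted concept profile from a user's posts.

    weekend=None applies no temporal constraint; True or False keeps only
    weekend or weekday posts. Weights are relative frequencies (assumed weighting).
    """
    counts: Counter = Counter()
    for timestamp, concepts in posts:
        is_weekend = timestamp.weekday() >= 5  # Saturday is 5, Sunday is 6
        if weekend is not None and is_weekend != weekend:
            continue
        counts.update(concepts)
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()} if total else {}


posts = [
    (datetime(2011, 1, 26), ["topic:Sports", "topic:Tennis"]),                # a Wednesday
    (datetime(2011, 1, 29), ["topic:Tennis", "person:Francesca_Schiavone"]),  # a Saturday
]
print(build_profile(posts))
print(build_profile(posts, weekend=True))
```

Swapping the concept lists between hashtag-based, topic-based, and entity-based annotations changes the profile type without changing the construction step.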
95. 95
[Figure: two log-scale plots over user profiles (1 to 1000), each comparing News-based and Tweet-based profiles. Entity-based profiles: entities per user profile. Topic-based profiles: distinct topics per user profile. In both panels an annotation marks the profiles enriched with external news resources.]
By exploiting the linkage between tweets and news articles, we get
more distinct entities / topics (semantics)!
Richer semantics through linking strategies.
Analysis of profile characteristics
96. 96
Lessons
For profiles, we observed:
• Semantic enrichment allows for richer user profiles.
• Profiles change over time (hashtag-based more): fresh
profiles seem to better reflect current user demands.
• Temporal patterns: weekend profiles differ significantly
from weekday profiles (more than day/night).
For personalized news recommendation, we learned:
• Best user modeling strategy:
Entity-based > topic-based > hashtag-based.
• Semantic enrichment improves recommendation quality.
• Adapting to temporal context helps for topic-based
strategy.
98. 98
Augment with what is there
Systems can use technology to augment their knowledge
with data from the Social Web.
Lessons learned show that
for adaptive systems on the Social Web
there is a lot of knowledge (easily) available,
from other systems and other domains.
Understanding how to leverage it, even to a basic level,
can bring a lot.
100. 100
Cross-system profiles
An example to show the added value of
‘cross-system’ on the Social Web
is the work in [UMUAI 2013]
where interweaving of public profiles is studied.
101. 101
User data on the Social Web
Cross-system user modeling on
the Social Web
102. 102
Google Profile URI: http://google.com/profile/XY
1. Account Mapping: get other accounts of user (SocialGraph API)
2. Social Web Aggregator: aggregate public profile data (blog posts, bookmarks, other media, social networking profiles)
3. Profile Alignment: map profiles to target user model (FOAF, vCard)
4. Semantic Enhancement: enrich data with semantics (WordNet®)
5. Analysis and user modeling: generate user profiles
Result: aggregated, enriched profile (e.g., in RDF or vCard)
Interweaving public user data with Mypes
103. 103
1. Characteristics of distributed tag-based profiles:
• Overlap of tag-based profiles, which an individual user creates at
different services, is low
• Aggregated profiles reveal significantly more information
(regarding entropy) than service-specific profiles
2. Performance of cross-system user modeling for cold-
start recommendations:
• Cross-system UM leads to tremendous (and significant)
improvements of the tag and bookmark recommendation quality
• To optimize the performance one has to adapt the cross-system
strategies to the concrete application setting
http://persweb.org
Lessons
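The two profile characteristics named in the first lesson, overlap between service-specific profiles and the information (entropy) in an aggregated profile, can be made concrete with a short sketch. The tag counts are invented, Jaccard overlap is an assumed choice of overlap measure, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def entropy(profile: Dict[str, int]) -> float:
    """Shannon entropy (in bits) of a tag-frequency profile."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in profile.values() if n > 0)


def overlap(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Jaccard overlap of the tag sets of two profiles (assumed overlap measure)."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def aggregate(*profiles: Dict[str, int]) -> Dict[str, int]:
    """Sum the tag counts of several service-specific profiles."""
    total: Counter = Counter()
    for profile in profiles:
        total.update(profile)
    return dict(total)


# Hypothetical tag profiles of one user at two services.
service_a = {"delft": 2, "travel": 2}
service_b = {"semanticweb": 2, "travel": 2}
combined = aggregate(service_a, service_b)

print(overlap(service_a, service_b))          # only one of three distinct tags is shared
print(entropy(service_a), entropy(combined))  # the aggregated profile has higher entropy
```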
105. 105
Improved location estimation by
mixing Social Web streams
+ =
external data sources:
Enriching the image’s textual meta-data with the user’s
tweets improves the accuracy of the location estimation.
106. 106
Accuracy of social web metadata
This work has also raised attention
for the accuracy of Social Web metadata.
There are many reasons
why this data cannot be taken as the universal truth.
In application and domain specific contexts,
we need to understand the accuracy of social metadata.
Also, the work of [Rout et al. 2013] on location estimation
based on social ties, shows the feasibility
as well as the context-dependency.
108. 108
LOD and cross-system
With these results in hand,
in our [ICWE2012] work,
we considered cross-system modeling
with Linked Open Data.
With the aim to understand how
Linked Open Data background knowledge
can be leveraged for cross-system and cross-domain
augmentation.
110. 110
[Diagram: concepts (c1, c2, c3, …) that can be extracted from the user data on the Social Web are connected, through background knowledge (graph structures) from Linked Data, to further concepts (c4, c5, c6, c9, cx, cy). Weighting strategies turn this graph into a user profile for an application that demands a user interest profile regarding certain concepts.]
User Profile (concept: weight): c1: 0.4, c2: 0.1, c3: 0.2, …
LOD-based User Modeling
111. 111
tags: girl with
pearl earring
geo: The Hague
dbpedia:Girl_with_pearl_earring
A
Artifact
B
The
lacemaker
C
The
astronomer
…
rdf:type
Johannes Vermeer
foaf:maker
foaf:maker
Strategies for exploiting the RDF-based
background knowledge graph
dbpedia:The_Hague
dbpedia:Louvre
dbpprop:location locatedIn
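One possible strategy for exploiting such a background knowledge graph is to pass part of a concept's weight to its direct neighbours. The sketch below is a simplified assumption, not the method of [ICWE2012]: the one-hop propagation, the decay factor, and the exact resource identifiers are chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, str]  # (subject, property, object)


def expand_profile(direct: Dict[str, float], edges: Iterable[Edge],
                   decay: float = 0.5) -> Dict[str, float]:
    """Pass a decayed share of each concept's weight to its graph neighbours.

    One-hop propagation with a fixed decay is an assumed, simplified strategy.
    """
    neighbours = defaultdict(set)
    for subject, _prop, obj in edges:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)
    profile = defaultdict(float, direct)
    for concept, weight in direct.items():
        for other in neighbours[concept]:
            profile[other] += decay * weight
    return dict(profile)


# Illustrative triples in the spirit of the slide; identifiers are assumptions.
edges = [
    ("dbpedia:Girl_with_pearl_earring", "foaf:maker", "dbpedia:Johannes_Vermeer"),
    ("dbpedia:The_Lacemaker", "foaf:maker", "dbpedia:Johannes_Vermeer"),
]
direct = {"dbpedia:Girl_with_pearl_earring": 1.0}
print(expand_profile(direct, edges))
```

With a single hop the painter receives weight, but the other painting does not; reaching it would need a second propagation step.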
112. 112
Lessons
With LOD-based user modeling on the Social Web,
different strategies for exploiting RDF-based
background knowledge are possible.
Findings:
• Combination of different user data sources (Flickr &
Twitter) is beneficial for the user modeling performance.
• User modeling quality increases the more background
knowledge one considers.
• Combination of strategies achieves the best performance.
To investigate further: dependency of strategies of
entities and relationships, and temporal effects (e.g.
temporal relationships or upcoming trends).
113. 113
Interlinked online society
If you take a semantic technology perspective,
then strong interlinking could be the direction to go.
[Passant et al. 2009] studies applying semantic
technologies to social media, creating a Web where data is
socially created and maintained through end-user
interactions, but is also machine-readable and therefore
open towards sophisticated queries and large-scale
information integration.
“Social Semantic Information Spaces”, where any social
data is a component in a worldwide collective intelligence
ecosystem.
114. 114
Origin of semantics
These social semantic spaces can trigger us in UMAP
to articulate where we see the role and origin of
semantics.
Making all social data available ‘with semantics’
or
observing that a lot of semantics
is (only) effective in a specific domain or application?
Experience showing the fine-grained nature of effects
suggests the latter.
116. 116
Humans & adaptive faceted search
An important element in the process of sense-making
is its hybrid nature:
humans involved in the sense-making.
The control rooms have shown us that the
human aspect in search is crucial,
for judgment and interpretation.
In our [ISWC2011] work,
we looked at adaptive faceted search.
117. 117
Adaptive faceted search framework
Adaptive Faceted Search
Twitter posts
Semantic Enrichment
User and Context Modeling
user
How to adapt the
facet-value pair
ranking to the
current demands
of the user?
How to represent
the content of a
tweet?
facet extraction
118. 118
Facet extraction and semantic enrichment
@bob: Julian Assange got
arrested
Julian Assange
Julian Assange Tweet-based
enrichment
Julian Assange arrested
Julian Assange, the founder of
WikiLeaks, is under arrest in
London…
Link-based
enrichment
Julian Assange
London
WikiLeaks
Julian Assange
Julian Assange
London
WikiLeaks
powered by
119. 119
Impact of Link-based enrichment
Representation of
tweets:
significantly more
facets per tweet
with link-based
enrichment
120. 120
Faceted search strategies
Goal: most relevant facet-value pair should appear at the top
of the ranking
Faceted Search Strategies:
1. Occurrence frequency: count occurrence frequencies of FVP
2. Personalization: adapt ranking to user profile (e.g. user tweeting history)
3. Diversification: increase variety among the top-ranked FVPs
4. Time-sensitivity: adapt FVP ranking to temporal context
Semantic enrichment: (i) tweet-based and (ii) link-based enrichment
Locations
1. Aachen
2. Aalborg
3. Aalesund
4. Aarhus
…
2145. Eindhoven
Locations
1. Eindhoven
2. Delft
3. Amsterdam
4. Rotterdam
5. London
…
Link-based enrichment and occurrence-based and
personalized rankings have large effect.
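The first two strategies, occurrence frequency and personalization, can be sketched as ranking functions over facet-value pairs (FVPs). The observed pairs and the user profile weights are hypothetical, and the tie-breaking rules are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

FVP = Tuple[str, str]  # (facet, value), e.g. ("location", "Delft")


def rank_by_frequency(fvps: Iterable[FVP]) -> List[FVP]:
    """Strategy 1: order facet-value pairs by how often they occur in the tweets."""
    counts = Counter(fvps)
    return sorted(counts, key=lambda pair: (-counts[pair], pair))


def rank_personalized(fvps: Iterable[FVP], user_profile: Dict[FVP, float]) -> List[FVP]:
    """Strategy 2: order by the user's profile weight, breaking ties by frequency."""
    counts = Counter(fvps)
    return sorted(counts, key=lambda pair: (-user_profile.get(pair, 0.0), -counts[pair], pair))


# Hypothetical facet-value pairs extracted from a set of tweets.
observed = ([("location", "Aachen")] * 3
            + [("location", "Delft")] * 2
            + [("location", "Eindhoven")])
# Hypothetical weights derived from the user's tweeting history.
profile = {("location", "Eindhoven"): 0.6, ("location", "Delft"): 0.3}

print(rank_by_frequency(observed))
print(rank_personalized(observed, profile))
```

The personalized ranking moves the locations that matter to this user to the top, which mirrors the contrast between the two location lists on the slide.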
122. 122
Lessons
Semantic enrichment allows for structured
representation of the content of tweets:
a good basis for faceted search.
Faceted search performs significantly better than
hashtag-based keyword search.
Different building blocks for making faceted search on
Twitter adaptive improve the search quality:
• Link-based enrichment: more discoverable tweets, better search
performance.
• Personalization leads to significant improvements.
• Time-sensitivity improves performance as well.
124. 124
Duplicate detection
Important for reducing the volume of social data,
is to categorize the social chatter
and reduce redundancy in information.
In our [WWW2013] work we have considered
duplicate detection.
125. 125
Twitter is more like a news medium.
How do people search on Twitter?
[Teevan et al. 2011] has shown how this is characterized by
repeated queries & monitoring for new content.
Problems:
• Short tweets → lots of similar information.
• Few people produce content → many retweets, copied content.
Search and retrieval on Twitter
128. 128
Lessons
Analyzing duplicate content in Twitter, we inferred a model
for categorizing different levels of duplication.
We developed a near-duplicate detection framework
for microposts and for categorizing the duplication level of tweet pairs.
Given the duplicate detection framework, we perform
extensive evaluations and analyses of different duplicate
detection strategies.
Our approach enables search result diversification,
also good to avoid ‘bubble effects’, and analyzes the
impact of the diversification on the search quality.
Follow Twinder progress: http://wis.ewi.tudelft.nl/twinder/
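To give a feel for what near-duplicate detection on tweet pairs involves, here is a deliberately simple sketch based on Jaccard similarity of word sets. This is a swapped-in baseline technique, not the framework from [WWW2013]; the tokenization rules and the 0.8 threshold are assumptions.

```python
import re
from typing import Set


def tokens(tweet: str) -> Set[str]:
    """Lowercase word tokens, dropping URLs and a leading retweet marker."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    text = re.sub(r"^rt\s+@\w+:?", " ", text.strip())
    return set(re.findall(r"[a-z0-9#@']+", text))


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens that two tweets have in common."""
    tokens_a, tokens_b = tokens(a), tokens(b)
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag a tweet pair as near-duplicate above an assumed similarity threshold."""
    return jaccard(a, b) >= threshold


original = "Julian Assange got arrested http://example.org/a"
retweet = "RT @bob: Julian Assange got arrested"
unrelated = "Francesca Schiavone wins the French Open"

print(is_near_duplicate(original, retweet))
print(is_near_duplicate(original, unrelated))
```

A search interface could use such a check to collapse retweets and copied content before diversifying the result list.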
129. 129
Take home from technology research
With semantics and humans, Social Web can help:
• Semantics beneficial for filtering & search and enrichment & linking.
• Semantic-enriched tweets beneficial for profiles and adaptation.
• Social Web & Linked Data beneficial for cross-system augmentation.
• Adaptive faceted search and duplicate detection beneficial for human-
enhanced processing.
For adaptive systems that rely on profiling,
Social Web is a fertile source for more knowledge.
ImREAL research & experiences elegantly show principles,
as well as the detailed work in domain & application:
• Social Web & LOD usage is context-specific.
• Big Data in need of Small Interpretations.
131. 131
Take home from technology research
The human intelligence is to be arranged differently:
• We have moved from a priori understanding the system, to on the fly
understanding the system.
• We have moved from careful manual analysis before, to machines doing the
analysis on the fly.
• The critical and context-specific approach to (small) data, about domain
and users, is a part of process and system we now need to (re-)include.
• This task of the designer has now shifted to a task for the human interpretation
inside the hybrid system: human monitoring inside.
135. 135
In reality, not one truth
In the beginning, social systems like Twitter were used
as ‘the’ semantic source of knowledge with an implicit
assumption that Twitter is one voice.
Over time, researchers have begun to investigate
how to identify and interpret different voices and
viewpoints in such a source.
Differences in viewpoints and opinions
are a subject of study, but until now leverage is limited.
136. 136
Diversity and beliefs
[Flock et al. 2011] study the different backgrounds,
mindsets and biases of Wikipedia contributors,
to understand the effects - positive and negative –
of this diversity on the quality of the Wikipedia content,
and on the sustainability of the overall project.
• Analysis and approach for diversity-minded content
management within Wikipedia.
[Bhattachanya et al. 2012] estimate beliefs from posts
made on social media, to monitor the level of belief,
disbelief and doubt related to specific propositions.
137. 137
Include the negative
Diversity of viewpoints and opinions also suggests
including negative links in the approach.
[Symeonidis et al. 2010] give an example of how to
include negative links into friend recommendation
approaches, but this goes much further.
The effect they observe on improving accuracy
can be held as a principle
where accuracy improvement can be gained
using information about positive and negative edges.
138. 138
ViewS
Modelling Viewpoints in User Generated Content
Text
processing
Viewpoint
extraction
(attention focus)
Ontology
(activity aspects
to analyse)
Semantic
enrichment
Viewpoint
exploration
139. 139
Viewpoints in YouTube
Example viewpoints in user comments on job interview videos
Comparing the viewpoints around ‘anger’ of young users (left)
and old users (right)
141. 141
Truth is not always truth
Just as this source of knowledge is not a single one,
it is also clear that it might not consist of
‘true’ knowledge alone.
142. 142
Malicious profiles
For example, profiles can be suspicious and made for the
wrong reasons.
In a context of online dating, [Pizzato et al. 2012] have
observed the need to gain understanding of the sensitivity
of recommender algorithms to scammers.
With people being the items to recommend,
fraudulent profiles can have a serious impact on
recommender algorithms.
Identifying and detecting fraudulent profiles is a new
challenge for us.
143. 143
Identity theft
Another aspect to ‘wrong profiles’ relates to
identity disambiguation and theft.
[Rowe et al. 2010] consider malevolent web practices such
as identity theft and lateral surveillance.
They study techniques for web users
to identify all web resources which cite them and
if necessary, remove the sensitive information.
144. 144
Credibility of social content
The credibility of messages in social networks is for
example studied in [Seth et al. 2010] on stories from Digg.
Their model is based on theories developed in sociology,
political science and information science.
[Cramer et al. 2008] have drawn attention to
trust.
The study of social content credibility and trust is
important, and asks for a cross-discipline effort.
145. 145
Privacy
A lot can be said about privacy in these networks, for
example Facebook.
[Bachrach et al. 2012] shows how users’ activity on
Facebook (related to privacy) relates to their personality,
as measured by the standard Five Factor Model.
Nice example of understanding how Facebook features
relate to interesting aspects of users and usage.
147. 147
Cultural diversity
Studying diversity is not just relevant for understanding how
Twitter content is to be interpreted.
It is also relevant for understanding how the Social Web
is used and can be used with a purpose.
Cultural diversity is here one of the most interesting aspects
and perhaps also one of the most challenging ones.
148. 148
Cultural diversity
A subject addressed in ImREAL.
Components are made available as services in ImREAL
for augmented user modeling,
e.g. for simulation designers.
150. 150
Hofstede’s cultural dimensions
Describes stereotypical cultural characteristics of
nationalities, with scores relative to other nationalities
Five core dimensions:
• Individualism versus Collectivism (IDV)
• Power Distance (PDI)
• Masculinity versus Femininity (MAS)
• Uncertainty Avoidance (UAI)
• Long-Term Orientation (LTO)
geert-hofstede.com
151. 151
Analysis
• Datasets
• Microblog data collected over a period of three months
• 22 million microposts from Sina Weibo and 24 million from Twitter
• a sample of 2616 Sina Weibo users and 1200 Twitter users
• Analyze and compare user behavior
• on two levels (i) the entire user population and (ii) individual users
• from different angles (i) syntactic, (ii) semantic, (iii) sentiment and
(iv) temporal analysis
152. 152
[Figure: average number of hashtags/URLs per post (log scale, 0.01 to 1) across users (0% to 100%), with curves Hashtag-Weibo, URL-Weibo, Hashtag-Twitter, and URL-Twitter.]
Hashtags and URLs are less
frequently applied on Sina
Weibo than on Twitter.
Users on Twitter are more triggered by
hashtags and URLs when propagating
information than on Sina Weibo.
Syntactic analysis
high collectivism in Weibo, a high individualism in Twitter
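The measure behind this syntactic analysis, the average number of hashtags and URLs per post, can be computed with a few lines. The regular expressions are simplifying assumptions (for instance, a URL fragment such as `#section` would also count as a hashtag), and the sample posts are illustrative.

```python
import re
from typing import Iterable, Tuple

HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")


def avg_hashtags_and_urls(posts: Iterable[str]) -> Tuple[float, float]:
    """Return (average hashtags per post, average URLs per post) for one user."""
    posts = list(posts)
    if not posts:
        return 0.0, 0.0
    hashtags = sum(len(HASHTAG.findall(post)) for post in posts)
    urls = sum(len(URL.findall(post)) for post in posts)
    return hashtags / len(posts), urls / len(posts)


sample_posts = [
    "Francesca Schiavone is sportsman of the year #sport #tennis",
    "nice! http://example.org/a",
    "no markup in this post",
    "another plain post",
]
print(avg_hashtags_and_urls(sample_posts))
```

Computing this pair per user and sorting users by the result gives the kind of distribution the slide compares between Sina Weibo and Twitter.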
153. 153
Semantic analysis
The topics that users discuss on Sina Weibo are to a large
extent related to locations and persons. In contrast to Sina
Weibo, users on Twitter are talking more about
organizations (such as companies, political parties).
[Figure: average number of entities per post (log scale, 0.001 to 10) across users (0% to 100%), for Weibo and Twitter.]
low employee commitment to an organization in China - high long term orientation.
154. 154
Sentiment analysis
Sina Weibo users have a stronger tendency to publish
positive messages than Twitter users.
[Figure: ratio of positive posts (0% to 100%) across users (0% to 100%), for Weibo and Twitter; regions are marked ‘more negative posts’ and ‘more positive posts’.]
high long term orientation.
155. 155
Combined semantic sentiment analysis
The difference is amplified when discussing ‘people’ or
‘location’, with Sina Weibo users even more positive and
Twitter users more negative.
more long-term orientation in Weibo, more short-term orientation in Twitter
156. 156
Temporal analysis
Twitter users repost messages faster than Sina Weibo users.
time distance = t_repost - t_original_post
[Figure: time distance in hours (log scale, 0.1 to 1000) across users (0% to 100%), for Weibo and Twitter.]
large degree of power distance in Weibo, small one in Twitter
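The time distance used in this analysis is the gap between a repost and the post it repeats. A minimal sketch of that computation, with made-up timestamps and a hypothetical per-user average, could look like this:

```python
from datetime import datetime
from typing import Iterable, Tuple


def time_distance_hours(t_original: datetime, t_repost: datetime) -> float:
    """Time distance = t_repost - t_original_post, expressed in hours."""
    return (t_repost - t_original).total_seconds() / 3600.0


def avg_time_distance(pairs: Iterable[Tuple[datetime, datetime]]) -> float:
    """Average time distance over a user's (original, repost) timestamp pairs."""
    distances = [time_distance_hours(original, repost) for original, repost in pairs]
    return sum(distances) / len(distances) if distances else 0.0


# Made-up timestamps for two reposts by one user.
reposts = [
    (datetime(2011, 1, 24, 10, 0), datetime(2011, 1, 24, 12, 30)),
    (datetime(2011, 1, 25, 9, 0), datetime(2011, 1, 25, 10, 30)),
]
print(avg_time_distance(reposts))  # 2.0
```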
157. 157
Cultural differences in tagging
Other work confirms the findings.
And the consistency with theories of cultural differences
between Asian and Western cultures.
[Dong et al. 2011] look at cultural differences in a
tagging system and find that American and Chinese
subjects differed in many ways:
• the number and types of tags they applied;
• the extent to which they applied suggested tags or
entered new tags of their own; and
• how often they applied tags that originated from a
different culture.
158. 158
Cultural variations for Social Q&A
Another example is given by [Yang et al. 2011] that looks
at cultural differences in people’s social question asking
behaviors across the United States, the United Kingdom,
China, and India.
They analyzed the questions people ask via social
networking tools, and their motivations for asking and
answering questions online.
Results reveal culture as a consistently significant factor
in predicting people’s social question and answer behavior.
160. 160
Understand the source
When using the knowledge from Twitter
as a semantic source,
especially if it is the only semantic source,
there are a few things one needs to consider
that relate to the real-time nature of social contributions.
The ‘knowledge’ is not unambiguous:
inconsistency, moods, etc.
Real-time knowledge spreads and evolves fast.
161. 161
Inconsistency & moods
Twitter is used as a semantic sensor, sometimes as the only
semantic sensor, but consistency in user contributions
like ratings is a concern.
[Said et al. 2012] show how users are inconsistent in
their ratings and tend to be more consistent for above
average ratings.
[De Choudhury et al. 2012] report on the relation between
moods and social activity, social relations and
participatory patterns like link sharing and conversational
engagement.
162. 162
Understanding over time
While Twitter and the like were used in the beginning
as ‘fixed’ sources of knowledge,
researchers have become interested in
the evolution over time.
The nature and speed of the flow of content over time
have become great objects of study.
Two domains that have received fair attention in this light
are diseases and (political) news.
163. 163
Flow in disease information
The domain of diseases and outbreaks is getting fair
attention.
Works by [Gomide et al. 2011] on Dengue and [Diaz-Aviles
et al. 2012] on EHEC show how people’s behavior on
Twitter can be used for surveillance and tasks such as
early warning and outbreak investigation.
164. 164
Flow of news
From [Naveed et al. 2011] we learn how retweets reflect
what the Twitter community considers interesting on a
global scale.
In [Backstrom et al. 2011] we see the differences between
communication and observation in Facebook:
communication involves a much higher focus of attention
than observation activities.
We see in [Lerman et al. 2010] how network structure
affects dynamics of how interest in news stories spreads
among social networks in Digg and Twitter.
165. 165
Flow in political news
Coming back to our observation of the multiple truths,
political news is a great domain to look at.
For the context of political speech, [Metaxas et al. 2010]
discuss how the real-time nature of Twitter provides
disproportionate exposure to personal opinions,
fabricated content, unverified events, lies and
misrepresentations, with viral spread as a consequence.
To act upon that, [Lumezanu et al. 2012] identify extreme
tweeting patterns that could characterize users who spread
propaganda (political propagandists), e.g. sending high
volumes of near-duplicate messages.
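One of the patterns named here, high volumes of near-duplicate messages, can be illustrated with a simple similarity check. The sketch below is an assumption for illustration only, not the method of [Lumezanu et al. 2012]: it uses Jaccard similarity over word trigrams, and the function names, the threshold of 0.5 and the sample messages are all hypothetical.

```python
def shingles(text: str, n: int = 3) -> set:
    """Split a message into a set of word n-grams (shingles)."""
    tokens = text.lower().split()
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_ratio(messages: list, threshold: float = 0.5) -> float:
    """Fraction of messages that closely resemble an earlier message."""
    seen = []
    duplicates = 0
    for message in messages:
        current = shingles(message)
        if any(jaccard(current, earlier) >= threshold for earlier in seen):
            duplicates += 1
        seen.append(current)
    return duplicates / len(messages) if messages else 0.0


# Hypothetical messages from a single account.
messages = [
    "vote for candidate x today please",
    "vote for candidate x today now",
    "nice weather in rome",
]
print(near_duplicate_ratio(messages))
```

A high ratio combined with a high posting volume would be one possible signal to inspect further; on its own it does not establish bad intentions.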
166. 166
Temporal effects
In our [WebSci2011] work, we have considered how
user interests are manifest over time.
Most users who are interested in a news topic
become interested within a few days.
Lifespan of users’ interest:
• Long-term adopters - continuously interested
• Short-term adopters - interested only for a short period in
time (and influenced by “global trends”)
High overlap between early adopters and long-term
adopters.
167. 167
Temporal effects
On Twitter the importance of entities for a topic varies
over time (long-term vs. short-term entities).
In terms of user interests over time, the majority of users
become interested in a topic quickly (within a few days).
When using Twitter-based profiles for personalization,
time-sensitive user modeling improves recommendation
quality.
Also, the selection of user modeling strategy should take
the type of user into account:
• Long-term adopters: hashtag-based
• Short-term adopters: entity-based
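Time-sensitive user modeling can be sketched as a profile in which recent mentions of a concept count more than old ones. The example below assumes an exponential decay with a configurable half-life; the decay function, the half-life of 7 days and the sample observations are illustrative assumptions, not the weighting used in the cited work. Concepts can be hashtags or entities, matching the two strategies listed above.

```python
from collections import defaultdict


def time_sensitive_profile(observations, now, half_life_days=7.0):
    """Build a normalized concept profile from (concept, day) observations.

    Each observation contributes 0.5 ** (age / half_life_days), so a mention
    that is one half-life old counts half as much as a mention made today.
    """
    weights = defaultdict(float)
    for concept, day in observations:
        age = now - day
        weights[concept] += 0.5 ** (age / half_life_days)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total else {}


# Hypothetical observations: (concept, day number on which it was mentioned).
observations = [("#umap", 0), ("#umap", 7), ("rome", 14)]
profile = time_sensitive_profile(observations, now=14)
print(profile)
```

In this example the single recent mention of "rome" outweighs two older mentions of "#umap", which is the intended effect of a time-sensitive scheme.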
168. 168
Twitter-based Trend and User Modeling Framework
[Diagram labels: Twitter posts; current tweets of the Twitter community; Profile; Semantic Enrichment; Profile Type; Aggregation; Weighting Scheme; trends; user’s interests; time; news recommender?]
169. 169
Temporal effects with trends
For the domain of personalized news recommendations,
we have combined trend and user modeling in our
framework.
• We have seen how user profiles change over time, under
the influence of trends.
• Appropriate concept weighting strategies allow for the
discovery of local trends.
• A time-sensitive weighting function is best for generating
trend profiles.
Aggregating trend and user profiles can improve the
performance of recommendations.
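One simple way to picture the aggregation of a trend profile and a user profile is a weighted linear combination of their concept weights. This is a sketch under that assumption; the mixing parameter alpha, the function name and the sample profiles are hypothetical and do not describe the framework's actual aggregation.

```python
def aggregate_profiles(user_profile, trend_profile, alpha=0.7):
    """Combine two concept-weight profiles.

    alpha is the share given to the user's own interests; (1 - alpha) goes
    to the trend profile. Concepts missing from a profile count as 0.
    """
    concepts = set(user_profile) | set(trend_profile)
    return {
        c: alpha * user_profile.get(c, 0.0) + (1 - alpha) * trend_profile.get(c, 0.0)
        for c in concepts
    }


# Hypothetical profiles with normalized weights.
user = {"football": 0.8, "politics": 0.2}
trend = {"politics": 0.5, "elections": 0.5}
combined = aggregate_profiles(user, trend, alpha=0.5)
print(sorted(combined.items(), key=lambda item: item[1], reverse=True))
```

The combined profile contains concepts the user never mentioned ("elections"), which is how trends can bring in news the personal profile alone would miss.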
171. 171
Check with the user
With all profiles based on augmentation,
it becomes (even more) vital to follow the lessons of
checking with the user.
By engaging with the user in a
common process of validating the profile
and the assumptions based on it.
172. 172
Perico
Dialogue for Modelling Cultural Exposure using Linked Data
[Diagram]
Initial User Model
• Visited Countries
• Estimated Cultural Exposure
Social Web Sensors
Perico Dialogue Agent
• Cultural Fact Extractor
• Quiz Generator
• User Profile Generator
• Dialogue Planner
Updated User Model
• Verified Visited Countries
• Enhanced Cultural Exposure Score
174. 174
Inspect and control
[Knijnenburg et al. 2012] consider how
users of social recommender systems may want to
inspect and control how their social relationships
influence the recommendations they receive:
friends are not always “nearest neighbors”.
The results show that high inspectability and control
indeed increase users’ perceived understanding of and
control over the system, their rating of the
recommendation quality, and their satisfaction with the
system, and thus an overall better user experience.
176. 176
Understanding communities
Attention is given to communities and their dynamics.
[Chan et al. 2010] propose a method for analysing user communication
roles in discussion forums.
[Schwagereit et al. 2011] study governance in web communities.
[Karnstedt et al. 2011] consider the relation between a user's value
within a community - constituted from various user features - and the
probability of a user churning.
[Yang et al. 2010] analyze users’ activity lifespan in online knowledge
sharing communities: acknowledgement of contributions leads to user
survival.
177. 177
Involvement in communities
In order to understand how people behave on
the Social Web and in communities,
it is relevant to understand their engagement and
involvement in more detail.
[Lehmann et al. 2012] study how users engage with online
services, and how to measure this engagement.
[Freyne et al. 2009] look at how social networking sites
rely on the contribution and participation of their
members: focus on early interventions for engagement.
178. 178
Communities and expertise
Understanding communities is also relevant
as these communities can act as an additional resource.
From finding evidence for profiles, we have seen recent
attention shift towards finding people and expertise,
for example to enable active engagement of people.
For using expertise in UMAP,
it is also important to be able to specify expertise,
to enable reasoning about the expertise’s quality and fit.
179. 179
Take home from challenges
The (Social) Web tells many stories:
• Acknowledge multiple truths, opposing truths, and bad intentions.
• Acknowledge multiple audiences and viewpoints.
• Acknowledge cultural variations.
The (Social) Web moves fast:
• Acknowledge the real-time nature of Web and applications.
• Analyze and understand the flow of information.
• Analyze and understand the nature of communities.
The (Social) Web includes people:
• Involve the users actively in validation.
• Involve (communities of) users in interpretation.
182. 182
Social & UMAP
Huge economic and societal potential for added value.
Social Web is a fertile source of knowledge for
augmentation.
• Semantics can be beneficial for social-based augmentation.
• Hybrid, human-enhanced approaches can be beneficial.
• Technological feasibility of augmentation.
Research from specific cases towards general theory.
Next on the agenda:
• Describe added value for stakeholders, describe goals.
• Share and compare research challenges and evaluations.
183. 183
Web & UMAP
UMAP systems are Web systems:
• The (Social) Web tells many stories.
• The (Social) Web moves fast.
• The (Social) Web includes people.
The Web is the real laboratory for UMAP systems.
Next on the agenda:
• Share and compare solutions, components, and systems.
• Support more uniformity in methods and practices.
184. 184
UMAP & Web
On the (Social) Web, systems are being made:
• Take positions or prepare to take positions about bad
intentions.
• Take responsibility and make recommendations about future
architectures.
On the (Social) Web, many systems are small:
• Do (also) consider the specific problems of small and
medium-sized stakeholders: bring UMAP into practice.
185. 185
UMAP & Social
In SWUMAP, human intelligence is arranged
differently:
• From careful manual analysis a priori, to machine
analysis on the fly.
• Critical and context-specific approach to data is part of
the ‘in vivo’ system.
• Human interpretation of data is inside the hybrid
system.
It makes for a new type of system, and one of
great value.
And plenty of fun and diverse challenges for
UMAP.
188. 188
Thanks
Slides made with input from many,
including Alessandro, Claudia, Fabian, Ilknur, Jan,
Jasper, Ke, Qi, and Richard from WIS in Delft,
and friends from ImREAL, Net2, SEALINCMedia, and
Twitcident.