Delft
University of
Technology
Link, Like, Follow, Friend:
The Social Element in User Modeling and
Adaptation
UMAP, Rome, ...
2
Social Web & UMAP
3
Social Web & UMAP
We observe, reflect, speculate, and raise discussion
about evolutions and opportunities
for UMAP to ma...
4
Social Web in UMAP:
a number of mentions of ‘social’, and
a small number of ‘social web’ in the papers.
New U (in UMAP),...
5
UMAP in the new Web world
What we learn at the Social Web
allows us to reconsider UMAP in the Web.
It brings new opportu...
6
SWUMAP: 1 + 1 = 3
Experience shows the value of combining:
Understanding & Creating
UM & AP
Machines & Humans
Arrive at a body of kn...
7
UMAP systems are Web systems
Lessons tell us to reconsider our system concept.
On the Web systems are ‘in vivo’: open an...
8
APPLICATION
HUMANS FOR AUGMENTATION
USERSDOMAIN
DOMAIN
Augmented with
Web Semantics
USERS
Augmented with
Web Semantics
R...
9
10
Inspiring domain
11
Domain: Incidents and emergencies
In the literature we see a fair amount of attention
for the domain of incidents and emergencies.
Ou...
12
Domain: Incidents and emergencies
In the literature, most attention is directed towards
understanding and detecting.
Someti...
13
2011 Tohoku earthquakes 1200 tweets/min
14
2011 Pukkelpop storm 570 tweets/min
15
Twitter
With Twitter, we have a whole new reflection of
what is happening in the world.
A whole new source of digital d...
16
CrowdSense BV
http://twitcident.org http://tno.nl/twitcident http://twitcident.com
Twitcident spin-off collaboration
Our...
17
400+ million tweets per day
•  Netherlands ranks #1 in Twitter penetration
Twitter users publish about “anything”
•  Wo...
18
Train accident
Train driver got a stroke
19
Train hits a block
First eyewitness
20
1 min. later
Eyewitness
21
15 min. later
Eyewitnesses
22
1:17 hour later
News media
23
30 min. later
Entertainment
wrong
photo
24
A new source of knowledge
An example of the speed
and the nature
of knowledge that Twitter provides
and what it does
to...
25
1.  Early warning
•  Twitter users publish early signals that might indicate an increased
risk or potential incident.
2...
26
• Emergency services
•  Law enforcement, fire fighters, governments
• Big event organizers
•  Festival security compani...
27
Pukkelpop 2011
Storm incident with casualties
28
80,000 tweets in 4 hours
29
570 tweets per min.
30
Could we see this impact coming?
Semantics 25 minutes before incident
1.  Weather: storm, cloud-burst, wind, ….
2.  Loc...
31
Damage reports from incident site
32
Real-time intelligence by photos
33
Example festival disaster
The research into this example
created a lot of knowledge
about what is possible
and what is ...
34
Dutch national rail infrastructure company
Example public infrastructure
35
Social Weather Map
36
Twitcident processes 100k tweets/day
The social weather map provides ProRail with a timely
and accurate overview of cit...
37
Big Events
New Year’s Eve
Serious Request
Elections
Lowlands
Summer Carnaval
Fantasy Island
Queen’s Day
38
Crowd Control Room
39
Social media monitoring was done with 1-3 security officers
Violence, riots, fires, fireworks, crowds, ..
40
Not only monitoring
The previous examples are not only about
monitoring Twitter
to know what is happening out there.
41
She was about to turn 16…
42
So she invited some friends …
43
Which invited their friends…
44
She pulled back her invitation, but…
45
46
200,000 invited, 40,000 “going”
47
Atmosphere turned hostile
48
Teenagers vs. police
49
Alcohol & violence
50
Massive damage at local stores
51
Officials cannot ignore social media
52
Mayor of Haren resigns after Haren-debacle
53
Recommendations by Cohen
•  Clear communication strategy
•  Planning & organizing in advance
•  Social media monitoring...
54
Why social media monitoring?
Content tells
what to expect
55
Finding the needle
56
Recommendation from experience
Let us go and find the needle
that tells us what appears to be happening out there
But l...
57
Meaningful and actionable
Twitcident has taught us
how information obtained from Twitter needs to be
meaningful and ac...
58
“Polling meaningful information”
“Sifting thousands of tweets during hurricane Irene”
“Getting situational awareness”
“...
59
Hybrid approach
Twitcident has also shown us how
these problems ask for a hybrid approach
with humans in the loop
that ...
60
Human interpretation inside
The nature of these problems means
that solutions are not fully automatic.
They involve use...
61
Take home from experience
Learn from concrete cases:
•  Case-based experimental approaches bring specific understanding...
62
APPLICATION
HUMANS FOR AUGMENTATION
USERSDOMAIN
DOMAIN
Augmented with
Web Semantics
USERS
Augmented with
Web Semantics
...
63
64
Technology for sense-making
65
Challenge: Making sense of Twitter
Inspired by different applications and domains,
researchers have given attention
to ...
66
Technology for making sense
The sense-making usually relies on
application and domain specific knowledge and
researcher...
67
Technology for making sense
68
Semantics for filtering and search
69
Semantics for filtering and search
In [HT2012] we considered
what is needed as first steps in processing tweets,
before...
70
1.  (Automatic) Filtering: Given an incident, how can one
automatically identify those tweets that are relevant to
the ...
71
Dataset
• Twitter corpus (TREC Microblog Track 2011)
• 16 million tweets (Jan. 24th – Feb. 8th, 2011 )
• 4,766,901 twee...
72
Filtering evaluation
[Bar chart: filtering performance per strategy; bar values and labels garbled in extraction] ...
73
Filtering evaluation
The semantic strategy is more robust and
achieves higher precision for complex topics.
74
Faceted search evaluation
[Bar chart: faceted search performance per strategy; bar values and labels garbled in extraction] ...
75
Faceted search evaluation
Strategies with semantic enrichment outperform
those without in predicting appropriate facet-...
76
Lessons
The context: a (Twitcident-inspired) framework for
filtering, searching, and analyzing information
about incide...
77
Semantics for enrichment and linking
78
Semantics for enrichment and linkage
In [ESWC2011] we focused more on
the semantics for enrichment and linkage
to conne...
79
SI Sportsman of the
year: Surprise French
Open champ
Francesca Schiavone
Thirty in women's tennis is primordially
old, ...
80
SI Sportsman of the
year: Surprise French
Open champ
Francesca Schiavone
Thirty in women's tennis is primordially
old, ...
81
Francesca Schiavone is
sportsman of the year
#sport #tennis
Content-based
SI Sportsman of the year:
Surprise French Ope...
82
nice! http://bit.ly/eiU33c URL-based
SI Sportsman of the year:
Surprise French Open
champ
Francesca Schiavone
Thirty in...
83
Evaluation on linkage discovery
[Bar chart: precision per linkage discovery strategy; labels garbled in extraction] ...
84
Analysis on linkage discovery and
semantic enrichment
•  URL-based strategies: more than 10 tweet-news relations for c....
85
Lessons
There is good background knowledge out there,
if we are able to understand how it connects
to the domain and co...
86
Social Web for profiles
87
Challenge: Social web for profiles
An ambition often seen in conferences like this one is
to exploit the semantic enric...
88
Components for profiling
For applications such as
personalized news recommendation,
like in our [UMAP2011] work,
compon...
89
Library
GeniUS [JIST2011] is a topic and user modeling software
library that
• produces semantically meaningful profile...
90
GeniUS: Generic Topic and User Modeling Library
for the Social Semantic Web
Item
Fetcher
Enrichment
Weighting
Function
...
91
(a) hashtag-based
(b) entity-based
(c) topic-based
2. Profile
Type
1. Temporal
Constraints
3. Semantic
Enrichment
4. We...
92
User modeling with rich semantics:
interested in:
people topics events …linkage
user profile construction
#sport
person...
93
RDF Gears UI
94
RDF Gears Plugin Architecture
95
[Figure: entities per user profile (log scale) plotted over user profiles (log scale); series: News-based, Tweet-based; a second panel over user profiles follows]
...
96
Lessons
For profiles, we observed:
• Semantic enrichment allows for richer user profiles.
• Profiles change over time (...
97
Social Web for augmentation
98
Augment with what is there
Systems can use technology to augment their knowledge
with data from the Social Web.
Lessons...
99
Cross-system augmentation
100
Cross-system profiles
An example to show the added value of
‘cross-system’ on the Social Web
is the work in [UMUAI 201...
101
User data on the Social Web
Cross-system user modeling on
the Social Web
102
Google Profile URI
http://google.com/profile/XY
4. enrich data with semantics
W...
103
1.  Characteristics of distributed tag-based profiles:
•  Overlap of tag-based profiles, which an individual user crea...
104
Location estimation
Another nice example
follows from our work in the ImREAL project
on augmentation (of adaptation) w...
105
Improved location estimation by
mixing Social Web streams
+ =
external data sources:
Enriching the image’s textual met...
106
Accuracy of social web metadata
This work has also raised attention
for the accuracy of Social Web metadata.
There are...
107
Linked Open Data for augmentation
108
LOD and cross-system
With these results in hand,
in our [ICWE2012] work,
we considered cross-system modeling
with Link...
109
Johannes Vermeer
dbpedia:Louvre
Looking forward to
visit Paris next week!
dbpedia:Paris
The lacemaker
The astronomer
Re...
110
c1 c4 c5 c6
weighting strategies
Application that demands user interest profi...
111
tags: girl with
pearl earring
geo: The Hague
dbpedia:Girl_with_pearl_earring
A
Artifact
B
The lacemaker
C
...
112
Lessons
With LOD-based user modeling on the Social Web,
different strategies for exploiting RDF-based
background knowl...
113
Interlinked online society
If you take a semantic technology perspective,
then strong interlinking could be the direct...
114
Origin of semantics
These social semantic spaces can trigger us in UMAP
to articulate where we see the role and origin...
115
Human-enhanced
116
Humans & adaptive faceted search
An important element in the process of sense-making
is its hybrid nature:
humans invo...
117
Adaptive faceted search framework
Adaptive Faceted Search
Twitter posts
Semantic Enrichment
User and Context Modeling
...
118
Facet extraction and semantic enrichment
@bob: Julian Assange got
arrested
Julian Assange
Julian Assange Tweet-based
e...
119
Impact of Link-based enrichment
Representation of
tweets:
significantly more
facets per tweet
with link-based
enrichme...
120
Faceted search strategies
Goal: most relevant facet-value pair should appear at the top
of the ranking
Faceted Search ...
121
Twitcident.com
Twitter-based crisis
management system
1.
2.
3. 4.
Semantic
enrichment
allows for:
1.  Grouping tweets
...
122
Lessons
Semantic enrichment allows for structured
representation of the content of tweets:
a good basis for faceted se...
123
Redundancy reduction
124
Duplicate detection
Important for reducing the volume of social data,
is to categorize the social chatter
and reduce r...
125
Twitter is more like a news media.
How do people search on Twitter?
[Teevan et al. 2011] has shown how this is charact...
126
Near-duplicates in Twitter search
Analysis of the Tweets2011 corpus (TREC microblog track) [WWW2013]
1.89%
9.51%
21....
127
Twinder Framework
Search infrastructure
Feature Extraction
Relevance Estimation
Social Web Streams
Feature...
128
Lessons
Analyzing duplicate content in Twitter, we inferred a model
for categorizing different levels of duplicity.
We...
129
Take home from technology research
With semantics and humans, Social Web can help:
•  Semantics beneficial for filteri...
130
APPLICATION
HUMANS FOR AUGMENTATION
USERSDOMAIN
DOMAIN
Augmented with
Web Semantics
USERS
Augmented with
Web Semantics...
131
Take home from technology research
The human intelligence is to be arranged differently:
•  We have moved from a prior...
132
133
Challenges with sense-making
134
Not one truth
135
In reality, not one truth
In the beginning, social systems like Twitter were used
as ‘the’ semantic source of knowledg...
136
Diversity and beliefs
[Flock et al. 2011] study the different backgrounds,
mindsets and biases of Wikipedia contributo...
137
Include the negative
Diversity of viewpoints and opinions also suggests to
include negative links in the approach.
[Sy...
138
ViewS
Modelling Viewpoints in User Generated Content
Text
processing
Viewpoint
extraction
(attention focus)
Ontology
(...
139
Viewpoints in YouTube
Examples viewpoints in user comments on job interview videos
Comparing the viewpoints around ‘an...
140
Not the truth
141
Truth is not always truth
Just like this source of knowledge is not a single one,
it is also clear that it might not b...
142
Malicious profiles
For example, profiles can be suspicious and made for the
wrong reasons.
In a context of online dati...
143
Identity theft
Another aspect to ‘wrong profiles’ relates to
identity disambiguation and theft.
[Rowe et al. 2010] con...
144
Credibility of social content
The credibility of messages in social networks is for
example studied in [Seth et al. 20...
145
Privacy
A lot can be said about privacy in these networks, for
example Facebook.
[Bachrach et al. 2012] shows how user...
146
Cultural variations
147
Cultural diversity
Studying diversity is not just relevant for understanding how
Twitter content is to be interpreted....
148
Cultural diversity
A subject addressed in ImREAL.
Components are made available as services in ImREAL
for augmented us...
149
150
Hofstede’s cultural dimensions
Describes stereotypical cultural characteristics of
nationalities, with scores relative...
151
Analysis
• Datasets
•  Microblog data collected over a period of three months
•  22 million microposts from Sina Weibo...
152
[Figure: avg. number of hashtags/URLs per post (log scale, 0.01 to 1) plotted over users (0% to 100%); series:]
Hashtag-Weibo
URL-Weibo
Hashtag-Twitter
U...
153
Semantic analysis
The topics that users discuss on Sina Weibo are to a large
extent related to locations and persons. ...
154
Sentiment analysis
Sina Weibo users have a stronger tendency to publish
positive messages than Twitter users.
0% 20% 4...
155
Combined semantic sentiment analysis
The difference is amplified when discussing ‘people’ or
‘location’, with Sina Wei...
156
Temporal analysis
Twitter users repost messages faster than Sina Weibo users.
time distance =
t_repost - t_original post...
157
Cultural differences in tagging
Other work confirms the findings.
And the consistency with theories of cultural differ...
158
Cultural variations for Social Q&A
Another example is given by [Yang et al. 2011] that looks
at cultural differences i...
159
Real-time variations
160
Understand the source
When using the knowledge from Twitter
as a semantic source,
especially if it is the only semantic...
161
Inconsistency & moods
Twitter is used as semantic sensor, sometimes as the only
semantic sensor, but consistency in us...
162
Understanding over time
While Twitter and the like were used in the beginning
as ‘fixed’ sources of knowledge,
researc...
163
Flow in disease information
Domain of diseases and outbreaks is getting fair
attention.
Works by [Gomide et al. 2011] ...
164
Flow of news
From [Naveed et al. 2011] we learn how retweets reflect
what the Twitter community considers interesting ...
165
Flow in political news
Coming back to our observation of the multiple truths,
political news is a great domain to look...
166
Temporal effects
In our [WebSci2011] work, we have considered how
user interests are manifest over time.
Most users, w...
167
Temporal effects
On Twitter the importance of entities for a topic varies
over time (long-term vs. short-term entities...
168
Twitter-based Trend and User Modeling
Framework
Twitter posts
current tweets
of Twitter
community
news
recommender?
Pr...
169
Temporal effects with trends
For the domain of personalized news recommendations,
we have combined trend and user mode...
170
Validation
171
Check with the user
With all profiles based on augmentation,
it becomes (even more) vital to follow the lessons of
che...
172
Perico
Dialogue for Modelling Cultural Exposure using Linked Data
Initial User
Model
•  Visited Countries
•  Estimated...
173
Perico
Dialogue for Modelling Cultural Exposure using Linked Data
Initial User
Model
•  Visited Countries
•  Estimated...
174
Inspect and control
[Knijnenburg et al. 2012] consider how
users of social recommender systems may want to
inspect and...
175
Communities
176
Understanding communities
Attention is given to communities and their dynamics.
[Chan et al. 2010] proposes a method f...
177
Involvement in communities
In order to understand how people behave in
Social Web and in communities,
it is relevant t...
178
Communities and expertise
Understanding communities is also relevant
as these communities can act as additional resour...
179
Take home from challenges
The (Social) Web tells many stories:
•  Acknowledge multiple truths, opposing truths, and ba...
180
181
Social, Web & UMAP
182
Social & UMAP
Huge economic and societal potential for added value.
Social Web is a fertile source of knowledge for
au...
183
Web & UMAP
UMAP systems are Web systems:
•  The (Social) Web tells many stories.
•  The (Social) Web moves fast.
•  Th...
184
UMAP & Web
On the (Social) Web, systems are being made:
•  Take positions or prepare to take positions about bad
inten...
185
UMAP & Social
In SWUMAP, human intelligence is arranged
differently:
•  From careful manual analysis a priori, to mach...
186
APPLICATION
HUMANS FOR AUGMENTATION
USERSDOMAIN
DOMAIN
Augmented with
Web Semantics
USERS
Augmented with
Web Semantics...
187
APPLICATION
HUMANS FOR AUGMENTATION
USERSDOMAIN
DOMAIN
Augmented with
Web Semantics
USERS
Augmented with
Web Semantics...
188
Thanks
Slides made with input from many,
including Alessandro, Claudia, Fabian, Ilknur, Jan,
Jasper, Ke, Qi, and Richa...
Link, Like, Follow, Friend: The Social Element in User Modeling and Adaptation
by Geert-Jan Houben
TU Delft - WIS
at UMAP 2013, Rome, Italy, June 2013

UMAP 2013 - Link, Like, Follow, Friend: The Social Element in User Modeling and Adaptation

  1. 1. Delft University of Technology Link, Like, Follow, Friend: The Social Element in User Modeling and Adaptation UMAP, Rome, June, 2013 Geert-Jan Houben Web Information Systems, TU Delft
  2. 2. 2 Social Web & UMAP
  3. 3. 3 Social Web & UMAP We observe, reflect, speculate, and raise discussion about evolutions and opportunities for UMAP to make a difference. Triggered by the social element in UMAP and other conferences & our own experience in the field.
  4. 4. 4 Social Web in UMAP: a number of mentions of ‘social’, and a small number of ‘social web’ in the papers. New U (in UMAP), new users. And we see more. We see how the Social Web mirrors people, mirrors users. What we learn from the Social Web, we learn (more) about users and for user modeling and adaptation.
  5. 5. 5 UMAP in the new Web world What we learn at the Social Web allows us to reconsider UMAP in the Web. It brings new opportunities for us as researchers. Perhaps it brings new needs. Surely, these are opportunities that we can position within our UMAP research agenda and UMAP application portfolio.
  6. 6. 6 SWUMAP: 1 + 1 = 3 Experience shows the value of combining: Understanding & Creating, UM & AP, Machines & Humans. Arrive at a body of knowledge for turning insights about users and usage into added value in society and economy.
  7. 7. 7 UMAP systems are Web systems Lessons tell us to reconsider our system concept. On the Web, systems are ‘in vivo’: open and dynamic. •  Users & data are no longer ‘inside the system’. •  Users & data change, move (more) quickly. This impacts understanding and creating of systems. This also impacts the systems’ architecture. With the (Social) Web as our laboratory, this also impacts our research discipline.
  8. 8. 8 APPLICATION HUMANS FOR AUGMENTATION USERSDOMAIN DOMAIN Augmented with Web Semantics USERS Augmented with Web Semantics REAL DOMAIN REAL USERS
  9. 9. 9
  10. 10. 10 Inspiring domain
  11. 11. 11 Domain: Incidents and emergencies In the literature we see a fair amount of attention for the domain of incidents and emergencies. Our own experience from several years is situated in that domain. It has given us a good feeling for what is needed and how UMAP research can be part of a bigger effort to solve real-world problems.
  12. 12. 12 Domain: Incidents and emergencies In the literature, most attention is directed towards understanding and detecting. Sometimes we see further objectives in responding, creating situational awareness (especially in massive incidents), and prevention. Twitter is the source most used in these studies.
  13. 13. 13 2011 Tohoku earthquakes 1200 tweets/min
  14. 14. 14 2011 Pukkelpop storm 570 tweets/min
  15. 15. 15 Twitter With Twitter, we have a whole new reflection of what is happening in the world. A whole new source of digital data that reflects the (real) world. We need to understand that reflection to understand the world and help the world. Two challenges: 1.  Understand the world, and 2.  Understand its reflection in the Social Web.
  16. 16. 16 CrowdSense BV http://twitcident.org http://tno.nl/twitcident http://twitcident.com Twitcident spin-off collaboration Our real-world lab
  17. 17. 17 400+ million tweets per day •  Netherlands ranks #1 in Twitter penetration Twitter users publish about “anything” •  Work/private life •  Interesting events •  Etc. Twitter tells us a lot about the world. And its users can be seen to act as social sensors and citizen journalists. Monitoring Twitter
  18. 18. 18 Train accident Train driver got a stroke
  19. 19. 19 Train hits a block First eyewitness
  20. 20. 20 1 min. later Eyewitness
  21. 21. 21 15 min. later Eyewitnesses
  22. 22. 22 1:17 hour later News media
  23. 23. 23 30 min. later Entertainment wrong photo
  24. 24. 24 A new source of knowledge An example of the speed and the nature of knowledge that Twitter provides and what it does to provide knowledge about what really happened. Also, it shows what we need to know and understand to use and interpret this effectively.
  25. 25. 25 1.  Early warning •  Twitter users publish early signals that might indicate an increased risk or potential incident. 2.  Crisis management •  (Eye-witness) Twitter users disseminate information about incidents which can support operational emergency services. 3.  Post evaluation •  Post analyzing incident data (in retrospect) to measure the effectiveness of emergency services. Twitcident goals
  26. 26. 26 • Emergency services •  Law enforcement, fire fighters, governments • Big event organizers •  Festival security companies • Utility organizations •  Public transport, energy supply, other vital infrastructures Stakeholders
  27. 27. 27 Pukkelpop 2011 Storm incident with casualties
  28. 28. 28 80,000 tweets in 4 hours
  29. 29. 29 570 tweets per min.
  30. 30. 30 Could we see this impact coming? Semantics 25 minutes before incident: 1.  Weather: storm, cloud-burst, wind, … 2.  Locations: Brussel, Gent, Hasselt, … 3.  Intensity: heavy, crazy, massive, … 4.  Impact: hail balls, falling trees, … Impact storm. Why is there a peak?
  31. 31. 31 Damage reports from incident site
  32. 32. 32 Real-time intelligence by photos
  33. 33. 33 Example festival disaster The research into this example created a lot of knowledge about what is possible and what is desired. It was also a good example to follow and approach new use cases to build more general understanding and theory.
  34. 34. 34 Dutch national rail infrastructure company Example public infrastructure
  35. 35. 35 Social Weather Map
  36. 36. 36 Twitcident processes 100k tweets/day The social weather map provides ProRail with a timely and accurate overview of citizen observations. In addition to other sources of knowledge. Value
  37. 37. 37 Big Events New Year’s Eve Serious Request Elections Lowlands Summer Carnaval Fantasy Island Queen’s Day
  38. 38. 38 Crowd Control Room
  39. 39. 39 Social media monitoring was done with 1-3 security officers Violence, riots, fires, fireworks, crowds, ..
  40. 40. 40 Not only monitoring The previous examples are not only about monitoring Twitter to know what is happening out there.
  41. 41. 41 She was about to turn 16…
  42. 42. 42 So she invited some friends …
  43. 43. 43 Which invited their friends…
  44. 44. 44 She pulled back her invitation, but…
  45. 45. 45
  46. 46. 46 200,000 invited, 40,000 “going”
  47. 47. 47 Atmosphere turned hostile
  48. 48. 48 Teenagers vs. police
  49. 49. 49 Alcohol & violence
  50. 50. 50 Massive damage at local stores
  51. 51. 51 Officials cannot ignore social media
  52. 52. 52 Mayor of Haren resigns after Haren-debacle
  53. 53. 53 Recommendations by Cohen •  Clear communication strategy •  Planning & organizing in advance •  Social media monitoring •  Clear intervention policy
  54. 54. 54 Why social media monitoring? Content tells what to expect
  55. 55. 55 Finding the needle
  56. 56. 56 Recommendation from experience Let us go and find the needle that tells us what appears to be happening out there But let us also think about how to support the action to make the world out there a better one.
  57. 57. 57 Meaningful and actionable Twitcident has taught us how information obtained from Twitter needs to be meaningful and actionable.
  58. 58. 58 “Polling meaningful information” “Sifting thousands of tweets during hurricane Irene” “Getting situational awareness” “Finding the eyes on the ground” “Finding actionable information” “Providing timely reaction” “Volunteers are great” “But we need hybrid approaches to monitor social media” Patrick Meier Today’s challenges
  59. 59. 59 Hybrid approach Twitcident has also shown us how these problems ask for a hybrid approach with humans in the loop that handle and interpret the knowledge derived from the Social Web. Big Data is available from the Social Web, but Small Interpretations are needed, to get it right!
  60. 60. 60 Human interpretation inside The nature of these problems means that solutions are not fully automatic. They involve users of systems that help with interpretation and decision making. This is a special kind of user that we (as UMAP) can consider, one that is fast growing and in urgent need of support.
  61. 61. 61 Take home from experience Learn from concrete cases: •  Case-based experimental approaches bring specific understanding and experience necessary for general understanding and theory. •  Cases can have great value for stakeholders. It is all about correct and actionable interpretation: •  Make information meaningful and actionable in the context. •  Employ hybrid, human-enhanced approaches for the context.
  62. 62. 62 APPLICATION HUMANS FOR AUGMENTATION USERSDOMAIN DOMAIN Augmented with Web Semantics USERS Augmented with Web Semantics REAL DOMAIN REAL USERS
  63. 63. 63
  64. 64. 64 Technology for sense-making
  65. 65. 65 Challenge: Making sense of Twitter Inspired by different applications and domains, researchers have given attention to underlying technology for making sense of Twitter. ‘Finding the needle’ as the research challenge.
  66. 66. 66 Technology for making sense The sense-making usually relies on application and domain specific knowledge and researchers investigate how to do it effectively. Semantics and interactivity prove to be important ingredients. In fact, it turns out that sense-making, i.e. finding the needle, is a combination of many things that need to be coming together.
  67. 67. 67 Technology for making sense
  68. 68. 68 Semantics for filtering and search
  69. 69. 69 Semantics for filtering and search In [HT2012] we considered what is needed as first steps in processing tweets, before we can ‘analyze’ them.
  70. 70. 70 1.  (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident? 2.  Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets? Twitter streams Challenges Filtering topic Search & Analytics information need
  71. 71. 71 Dataset • Twitter corpus (TREC Microblog Track 2011) • 16 million tweets (Jan. 24th – Feb. 8th, 2011 ) • 4,766,901 tweets classified as English • 6.2 million entity-extractions • News (Same time period) • 62 RSS News Feeds • 13,959 News Articles • 357,559 entity-extractions
  72. 72. 72 Filtering evaluation [Bar chart comparing Semantic Filtering, Semantic Filtering with News Contextualization, and (Baseline) Keyword Filtering on MAP, P@10, P@30, and Recall; bar values garbled in extraction] Semantic strategies outperform the keyword-based filtering regarding all metrics.
  73. 73. 73 Filtering evaluation The semantic strategy is more robust and achieves higher precision for complex topics. [Two charts: Precision@30 and Recall (0 to 1) plotted against (a) the number of entities extracted from the initial topic description (1 to 4) and (b) the number of words in the initial topic description (1 to 5)]
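The two measures named on this slide, Precision@30 and Recall, can be made concrete with a small sketch. The Python below is illustrative only: the function names and the toy tweet identifiers are assumptions and are not taken from the [HT2012] evaluation. It computes precision over the top-k tweets of a ranked filter output, and recall over everything the filter returned.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Share of the top-k returned tweets that are judged relevant.

    Divides by k even if fewer than k tweets were returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for tweet_id in ranked_ids[:k] if tweet_id in relevant_ids)
    return hits / k


def recall(returned_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Share of all relevant tweets that the filter returned."""
    if not relevant_ids:
        return 0.0
    return len(set(returned_ids) & relevant_ids) / len(relevant_ids)


# Hypothetical toy data: a ranked filter output and the relevance judgments.
ranked = ["t1", "t7", "t3", "t9"]
relevant = {"t1", "t3", "t5"}
p_at_2 = precision_at_k(ranked, relevant, k=2)
overall_recall = recall(ranked, relevant)
```

With k = 30 the first function corresponds to the Precision@30 measure on the slide, assuming the filter returns a ranked list.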
  74. 74. 74 Faceted search evaluation [Bar chart: Mean Reciprocal Rank (MRR) of target item for frequency-based faceted search, hashtag-based faceted search, and hashtag-based keyword search; bars are marked as with or without semantic enrichment; bar values garbled in extraction] The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.
  75. 75. 75 Faceted search evaluation Strategies with semantic enrichment outperform those without in predicting appropriate facet-values. Adaptive Faceted Search on Twitter [Bar chart comparing the personalized, time-sensitive, frequency, and hashtag-based strategies, with and without semantic enrichment, on three metrics including MRR; bar values garbled in extraction]
  76. 76. 76 Lessons The context: a (Twitcident-inspired) framework for filtering, searching, and analyzing information about incidents that people publish on Twitter. We have seen how to obtain • better filtering of Twitter messages for a given incident, • better search for relevant information about an incident within the filtered messages. For these first steps in processing Twitter messages, the semantic interpretation is the key element that we need to understand for the given context.
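The contrast between keyword-based and semantic filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the [HT2012] method: the `Tweet` class, the entity labels, and the sample tweets are hypothetical, and entity annotations are assumed to come from a separate extraction step.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Tweet:
    text: str
    entities: Set[str]  # assumed output of an external entity-extraction step


def keyword_filter(tweets: List[Tweet], keywords: Set[str]) -> List[Tweet]:
    """Keep tweets whose whitespace-separated tokens contain a topic keyword."""
    wanted = {k.lower() for k in keywords}
    return [t for t in tweets if wanted & set(t.text.lower().split())]


def semantic_filter(tweets: List[Tweet], topic_entities: Set[str]) -> List[Tweet]:
    """Keep tweets that share at least one entity with the topic description."""
    return [t for t in tweets if topic_entities & t.entities]


# Hypothetical sample stream around a festival storm.
stream = [
    Tweet("Heavy storm at Pukkelpop tents down", {"event:Pukkelpop", "weather:Storm"}),
    Tweet("Hail and wind hitting the festival in Hasselt", {"event:Pukkelpop", "place:Hasselt"}),
    Tweet("A storm of applause for the band", {"music:Band"}),
]

by_keyword = keyword_filter(stream, {"storm"})
by_entity = semantic_filter(stream, {"event:Pukkelpop"})
```

In this toy stream the keyword filter keeps the unrelated "storm of applause" tweet and misses the hail report, while the entity match keeps both incident tweets.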
  77. 77. 77 Semantics for enrichment and linking
  78. 78. 78 Semantics for enrichment and linkage In [ESWC2011] we focused more on the semantics for enrichment and linkage to connect the tweets to background knowledge and thus enhance what we can learn from them.
  79. 79. 79 SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old, … news article topic:Sports topic:Sports topic:Tennis person:Francesca_Schiavone oc:SportsGame event:FrenchOpen francesca is becoming #sport idol of the year! microblog post user enrichment enrichment user modeling linkage Profile Topics of interest: - topic:Tennis - topic:Sports People of interest: - person:Francesca_Schiavone Events of interest: - event:FrenchOpen Example: Semantic enrichment of Twitter posts
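The step from enriched posts to a profile of topics, people, and events of interest can be sketched as a simple frequency-based weighting. The function name and the normalization are assumptions made for illustration; the slide does not state which weighting function the authors used.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_profile(enriched_posts: Iterable[List[str]]) -> Dict[str, float]:
    """Weight each concept by its share of all concept mentions across posts."""
    counts = Counter(concept for post in enriched_posts for concept in post)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {concept: n / total for concept, n in counts.items()}


# Concepts per post; the second post is assumed to be enriched via a linked news article.
posts = [
    ["topic:Sports", "person:Francesca_Schiavone"],
    ["topic:Tennis", "topic:Sports", "person:Francesca_Schiavone", "event:FrenchOpen"],
]
profile = build_profile(posts)
```

Concepts that only arrive through linkage, such as `event:FrenchOpen` here, end up in the profile even though the user's own post never mentions them.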
  80. 80. 80 SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old, … news article francesca is becoming #sport idol of the year! microblog post user linkage How? Goal  in  this  linkage  discovery  is  to  iden3fy  news  resources   that  are  related  to  a  given  Twi8er  message:   1.  Web  resource  has  to  be  related  to  the  given  tweet   2.  Web  resource  has  to  be  related  to  news     Linkage discovery
  81. 81. 81 Francesca Schiavone is sportsman of the year #sport #tennis Content-based SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old… Francesca Schiavone is sportsman of the year #sport #tennis Hashtag-based Petkovic & Goerges leading German tennis revival there are signs that German tennis is… Linkage discovery strategies
  82. 82. 82 nice! http://bit.ly/eiU33c URL-based SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old… news article URL Entity-based Olympic champion and world number nine Elena Dementieva announced her retirement The 29-year-old Russian delivered the shock news after losing to Francesca Schiavone in the group stages of the season-ending tournamen … news article Entity-based Francesca Schiavone is sportsman of the year #sport #tennis temporal constraint Old news publish date publish date •  URL-based (Strict): only consider content of the Twitter message •  URL-based (Lenient): also consider reply or re-tweet messages Linkage discovery strategies
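The entity-based strategy with a temporal constraint can be pictured with a small sketch. It links a tweet to news articles that mention a shared entity and were published close to the tweet. The function name, the dictionary layout, the two-day window, the dates, and the identifier `person:Elena_Dementieva` are all assumptions made for illustration; the slides do not specify how the constraint is implemented.

```python
from datetime import datetime, timedelta

def link_tweet_to_news(tweet, articles, window=timedelta(days=2)):
    """Entity-based linkage: a shared entity plus a temporal constraint."""
    linked = []
    for article in articles:
        shares_entity = bool(tweet["entities"] & article["entities"])
        # Temporal constraint: skip articles published too far from the tweet.
        close_in_time = abs(article["published"] - tweet["published"]) <= window
        if shares_entity and close_in_time:
            linked.append(article["title"])
    return linked

# Hypothetical data; entity sets and dates are invented for the example.
tweet = {
    "entities": {"person:Francesca_Schiavone"},
    "published": datetime(2010, 12, 1),
}
articles = [
    {"title": "SI Sportsman of the year",
     "entities": {"person:Francesca_Schiavone", "event:FrenchOpen"},
     "published": datetime(2010, 11, 30)},
    {"title": "Dementieva announces retirement",
     "entities": {"person:Francesca_Schiavone", "person:Elena_Dementieva"},
     "published": datetime(2010, 10, 29)},
]
linked_titles = link_tweet_to_news(tweet, articles)
```

Without the window, the older article would also be linked, which is the "old news" case the temporal constraint is meant to exclude.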
  83. 83. 83 Evaluation on linkage discovery [Figure: bar chart of precision per linkage discovery strategy: Content-based (Bag-of-Words), Hashtag-based, Entity-based (without temporal constraints), Entity-based, URL-based (lenient), URL-based (strict); bar values are not reliably recoverable from the extraction.] URL-based strategies offer good linkage.
  84. 84. 84 Analysis on linkage discovery and semantic enrichment •  URL-based strategies: more than 10 tweet-news relations found for more than ca. 1000 users •  Entity-based strategy: found a far higher number of tweet-news relations •  Hashtag-based strategy failed for more than 79% of the users because of the limited usage of hashtags •  Combination of all strategies: more than 10 tweet-news relations found for more than 20% of the users [Figure legend: Entity-based, URL-based, Hashtag-based, Combination] Combined strategies perform better.
  85. 85. 85 Lessons There is good background knowledge out there, if we are able to understand how it connects to the domain and context we are considering. Many applications can share the same enrichment and linking, but not all. With common descriptions of the problem, we can share enrichment and linking (more) effectively.
  86. 86. 86 Social Web for profiles
  87. 87. 87 Challenge: Social web for profiles An ambition often seen in conferences like this one is to exploit semantically enriched Social Web knowledge for the purpose of creating or enhancing user profiles. These profiles can then be used for adaptation and personalization.
  88. 88. 88 Components for profiling For applications such as personalized news recommendation, like in our [UMAP2011] work, components for profiling can be carefully selected and assembled. It can also help the development of the deeper understanding and theory about how to link the data to background knowledge and thus make sense of the data.
  89. 89. 89 Library GeniUS [JIST2011] is a topic and user modeling software library that • produces semantically meaningful profiles, to enhance the interoperability of profiles between applications; • provides functionality for aggregating relevant information about a user from the Social Web; • generates domain-specific user profiles according to the information needs of different applications; • is flexible and extensible to serve different applications.
  90. 90. 90 GeniUS: Generic Topic and User Modeling Library for the Social Semantic Web Item Fetcher Enrichment Weighting Function RDF Repository Filter Modeling Configuration RDF Serialization Social Web Semantic Web user data items enriched items semantic data user profiles interested in: location, product
  91. 91. 91 User Modeling Building Blocks 1. Temporal Constraints: (a) time period, (b) temporal patterns 2. Profile Type: (a) hashtag-based, (b) entity-based, (c) topic-based 3. Semantic Enrichment: (a) tweet-based, (b) further enrichment 4. Weighting Scheme: (a) concept frequency
  92. 92. 92 User modeling with rich semantics: interested in: people topics events … linkage user profile construction #sport person:Francesca_Schiavone topic:Sports event:FrenchOpen topic:Tennis time weekday weekend Profile types • hashtag-based • topic-based • entity-based enrichment • tweet-only • exploitation of external news resources temporal patterns • specific time period • temporal pattern • no constraints User profile construction
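The building blocks (temporal constraint, profile type, weighting scheme) can be combined in a few lines. The following is a minimal sketch, assuming posts are already enriched and stored as dictionaries with one list of concepts per profile type; the field names and the integer `day` stamp are hypothetical, and concept frequency is the only weighting scheme shown.

```python
from collections import Counter

def build_profile(posts, profile_type, start=None, end=None):
    """Concept-frequency profile of one type, optionally limited to a time period."""
    counts = Counter()
    for post in posts:
        # Temporal constraint: keep only posts inside [start, end] if given.
        if start is not None and post["day"] < start:
            continue
        if end is not None and post["day"] > end:
            continue
        counts.update(post[profile_type])
    total = sum(counts.values())
    # Weighting scheme: normalized concept frequency.
    return {concept: n / total for concept, n in counts.items()} if total else {}

# Hypothetical enriched posts of a single user.
posts = [
    {"day": 1, "hashtag": ["#sport"],
     "entity": ["person:Francesca_Schiavone", "event:FrenchOpen"]},
    {"day": 2, "hashtag": ["#sport", "#tennis"],
     "entity": ["person:Francesca_Schiavone"]},
]
entity_profile = build_profile(posts, "entity")
recent_hashtags = build_profile(posts, "hashtag", start=2)
```

Swapping `profile_type` or the time bounds changes the resulting profile without touching the rest of the pipeline, which is the point of treating these as separate building blocks.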
  93. 93. 93 RDF Gears UI
  94. 94. 94 RDF Gears Plugin Architecture
  95. 95. 95 Analysis of profile characteristics [Figure: two log-scale plots over user profiles. Entity-based profiles: entities per user profile, News-based vs. Tweet-based. Topic-based profiles: distinct topics per user profile, News-based vs. Tweet-based. News-based marks profiles enriched with external news resources.] By exploiting the linkage between tweets and news articles, we get more distinct entities / topics (semantics)! Richer semantics through linking strategies.
  96. 96. 96 Lessons For profiles, we observed: • Semantic enrichment allows for richer user profiles. • Profiles change over time (hashtag-based more): fresh profiles seem to better reflect current user demands. • Temporal patterns: weekend profiles differ significantly from weekday profiles (more than day/night). For personalized news recommendation, we learned: • Best user modeling strategy: Entity-based > topic-based > hashtag-based. • Semantic enrichment improves recommendation quality. • Adapting to temporal context helps for topic-based strategy.
  97. 97. 97 Social Web for augmentation
  98. 98. 98 Augment with what is there Systems can use technology to augment their knowledge with data from the Social Web. Lessons learned show that for adaptive systems on the Social Web there is a lot of knowledge (easily) available, from other systems and other domains. Understanding how to leverage it, even to a basic level, can bring a lot.
  99. 99. 99 Cross-system augmentation
  100. 100. 100 Cross-system profiles An example to show the added value of ‘cross-system’ on the Social Web is the work in [UMUAI 2013] where interweaving of public profiles is studied.
  101. 101. 101 User data on the Social Web Cross-system user modeling on the Social Web
  102. 102. 102 Interweaving public user data with Mypes Input: Google Profile URI http://google.com/profile/XY 1. Account Mapping: get other accounts of user (SocialGraph API) 2. Social Web Aggregator: aggregate public profile data (social networking profiles, blog posts, bookmarks, other media) 3. Profile Alignment: map profiles to target user model (FOAF, vCard) 4. Semantic Enhancement: enrich data with semantics (WordNet®) 5. Analysis and user modeling: generate user profiles Output: aggregated, enriched profile (e.g., in RDF or vCard)
  103. 103. 103 1.  Characteristics of distributed tag-based profiles: •  Overlap of tag-based profiles, which an individual user creates at different services, is low •  Aggregated profiles reveal significantly more information (regarding entropy) than service-specific profiles 2.  Performance of cross-system user modeling for cold-start recommendations: •  Cross-system UM leads to tremendous (and significant) improvements of the tag and bookmark recommendation quality •  To optimize the performance one has to adapt the cross-system strategies to the concrete application setting http://persweb.org Lessons
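The two profile characteristics named here, low overlap and higher entropy after aggregation, can be computed as follows. This is a sketch under assumptions: profiles are plain tag-count dictionaries, overlap is measured as Jaccard similarity of the tag sets, and entropy is Shannon entropy in bits. The measures used in the cited work may be defined differently, and the tags are invented.

```python
import math
from collections import Counter

def entropy(tag_counts):
    """Shannon entropy (in bits) of a tag-based profile."""
    total = sum(tag_counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in tag_counts.values() if n)

def tag_overlap(profile_a, profile_b):
    """Jaccard overlap of the tag sets of two profiles."""
    a, b = set(profile_a), set(profile_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical profiles of one user at two services.
service_one = Counter({"semanticweb": 1, "rdf": 1})
service_two = Counter({"travel": 1, "rome": 1})
aggregated = service_one + service_two

overlap = tag_overlap(service_one, service_two)
single_entropy = entropy(service_one)
aggregated_entropy = entropy(aggregated)
```

In this toy case the two profiles share no tags, so the aggregated profile carries strictly more information than either service-specific one.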
  104. 104. 104 Location estimation Another nice example follows from our work in the ImREAL project on augmentation (of adaptation) with the Social Web.
  105. 105. 105 Improved location estimation by mixing Social Web streams [Figure: image metadata combined with external data sources.] Enriching the image’s textual meta-data with the user’s tweets improves the accuracy of the location estimation.
  106. 106. 106 Accuracy of social web metadata This work has also drawn attention to the accuracy of Social Web metadata. There are many reasons why this data cannot be taken as the universal truth. In application and domain specific contexts, we need to understand the accuracy of social metadata. Also, the work of [Rout et al. 2013] on location estimation based on social ties shows the feasibility as well as the context-dependency.
  107. 107. 107 Linked Open Data for augmentation
  108. 108. 108 LOD and cross-system With these results in hand, in our [ICWE2012] work, we considered cross-system modeling with Linked Open Data. With the aim to understand how Linked Open Data background knowledge can be leveraged for cross-system and cross-domain augmentation.
  109. 109. 109 Johannes Vermeer dbpedia:Louvre Looking forward to visit Paris next week! dbpedia:Paris The lacemaker The astronomer Recommending Points of Interest
  110. 110. 110 LOD-based User Modeling user data (Social Web): concepts that can be extracted from the user data background knowledge (Linked Data, graph structures): c1 c2 c3 c4 c5 c6 c9 cx cy weighting strategies User Profile (concept, weight): c1 0.4, c2 0.1, c3 0.2, … Application that demands user interest profile regarding [icon]-concepts
  111. 111. 111 tags: girl with pearl earring geo: The Hague dbpedia:Girl_with_pearl_earring A   Artifact B   The lacemaker C   The astronomer …   rdf:type Johannes Vermeer foaf:maker foaf:maker Strategies for exploiting the RDF-based background knowledge graph dbpedia:The_Hague dbpedia:Louvre dbpprop:location locatedIn
  112. 112. 112 Lessons With LOD-based user modeling on the Social Web, different strategies for exploiting RDF-based background knowledge are possible. Findings: • Combination of different user data sources (Flickr & Twitter) is beneficial for the user modeling performance. • User modeling quality increases the more background knowledge one considers. • Combination of strategies achieves the best performance. To investigate further: dependency of strategies on entities and relationships, and temporal effects (e.g. temporal relationships or upcoming trends).
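One simple way to exploit an RDF-based background graph is to give weight not only to concepts found directly in the user data but also to their neighbours in the graph. The sketch below assumes a one-hop expansion with a fixed damping factor over a list of triples. This is an illustrative strategy rather than the one evaluated in the cited work, and the node labels are simplified from the example slide.

```python
def lod_profile(mentioned, triples, damping=0.5):
    """Weight directly mentioned concepts 1.0 and one-hop graph neighbours by `damping`."""
    weights = {concept: 1.0 for concept in mentioned}
    for subject, _predicate, obj in triples:
        # Follow each edge in both directions from a mentioned concept.
        for seed, neighbour in ((subject, obj), (obj, subject)):
            if seed in mentioned and neighbour not in mentioned:
                weights[neighbour] = weights.get(neighbour, 0.0) + damping
    return weights

# Hypothetical background graph fragment.
triples = [
    ("dbpedia:Girl_with_pearl_earring", "foaf:maker", "Johannes Vermeer"),
    ("The lacemaker", "foaf:maker", "Johannes Vermeer"),
    ("The lacemaker", "dbpprop:location", "dbpedia:Louvre"),
]
profile = lod_profile({"dbpedia:Girl_with_pearl_earring"}, triples)
```

With one hop, only the painter enters the profile; reaching "The lacemaker" or the Louvre would need a second hop, which is one way to picture "considering more background knowledge".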
  113. 113. 113 Interlinked online society If you take a semantic technology perspective, then strong interlinking could be the direction to go. [Passant et al. 2009] studies applying semantic technologies to social media, creating a Web where data is socially created and maintained through end-user interactions, but is also machine-readable and therefore open towards sophisticated queries and large-scale information integration. “Social Semantic Information Spaces”, where any social data is a component in a worldwide collective intelligence ecosystem.
  114. 114. 114 Origin of semantics These social semantic spaces can trigger us in UMAP to articulate where we see the role and origin of semantics. Making all social data available ‘with semantics’ or observing that a lot of semantics is (only) effective in a specific domain or application? Experience showing the fine-grained nature of effects suggests the latter.
  115. 115. 115 Human-enhanced
  116. 116. 116 Humans & adaptive faceted search An important element in the process of sense-making is its hybrid nature: humans involved in the sense-making. The control rooms have shown us that the human aspect in search is crucial, for judgment and interpretation. In our [ISWC2011] work, we looked at adaptive faceted search.
  117. 117. 117 Adaptive faceted search framework Adaptive Faceted Search Twitter posts Semantic Enrichment User and Context Modeling user How to adapt the facet-value pair ranking to the current demands of the user? How to represent the content of a tweet?  facet extraction
  118. 118. 118 Facet extraction and semantic enrichment @bob: Julian Assange got arrested Tweet-based enrichment: Julian Assange Linked article: Julian Assange arrested Julian Assange, the founder of WikiLeaks, is under arrest in London… Link-based enrichment: Julian Assange, London, WikiLeaks
  119. 119. 119 Impact of Link-based enrichment Representation of tweets: significantly more facets per tweet with link-based enrichment
  120. 120. 120 Faceted search strategies Goal: most relevant facet-value pair should appear at the top of the ranking Faceted Search Strategies: 1.  Occurrence frequency: count occurrence frequencies of FVP 2.  Personalization: adapt ranking to user profile (e.g. user tweeting history) 3.  Diversification: increase variety among the top-ranked FVPs 4.  Time-sensitivity: adapt FVP ranking to temporal context Semantic enrichment: (i) tweet-based and (ii) link-based enrichment Locations 1.  Aachen 2.  Aalborg 3.  Aalesund 4.  Aarhus … 2145. Eindhoven Locations 1.  Eindhoven 2.  Delft 3.  Amsterdam 4.  Rotterdam 5.  London … Link-based enrichment and occurrence-based and personalized rankings have a large effect.
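The first two strategies, occurrence frequency and personalization, can be combined in a small ranking function. This is a sketch under assumptions: facet-value pairs (FVPs) are tuples, the user profile maps FVPs to weights in [0, 1], and the two scores are mixed linearly with a hypothetical parameter `alpha`. The slides do not state how the strategies are combined.

```python
from collections import Counter

def rank_fvps(tweets, user_profile=None, alpha=0.5):
    """Rank facet-value pairs by occurrence frequency, optionally personalized."""
    freq = Counter(fvp for tweet in tweets for fvp in tweet)
    if not freq:
        return []
    max_freq = max(freq.values())

    def score(fvp):
        occurrence = freq[fvp] / max_freq
        if user_profile is None:
            return occurrence
        # Personalization: mix in the user's interest in this pair.
        return (1 - alpha) * occurrence + alpha * user_profile.get(fvp, 0.0)

    # Ties are broken alphabetically to keep the ranking deterministic.
    return sorted(freq, key=lambda fvp: (-score(fvp), fvp))

# Hypothetical enriched tweets, each a list of facet-value pairs.
tweets = [
    [("location", "Aachen")],
    [("location", "Aachen")],
    [("location", "Eindhoven")],
]
by_frequency = rank_fvps(tweets)
personalized = rank_fvps(tweets, user_profile={("location", "Eindhoven"): 1.0})
```

The personalized call moves the user's own location to the top even though it occurs less often, mirroring the two example location lists on the slide.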
  121. 121. 121 Twitcident.com Twitter-based crisis management system 1. 2. 3. 4. Semantic enrichment allows for: 1.  Grouping tweets into incidents 2.  Faceted search 3.  Thematic Views 4.  Analysis
  122. 122. 122 Lessons Semantic enrichment allows for structured representation of the content of tweets: a good basis for faceted search. Faceted search performs significantly better than hashtag-based keyword search. Different building blocks for making faceted search on Twitter adaptive improve the search quality: •  Link-based enrichment: more discoverable tweets, better search performance. •  Personalization leads to significant improvements. •  Time-sensitivity improves performance as well.
  123. 123. 123 Redundancy reduction
  124. 124. 124 Duplicate detection To reduce the volume of social data, it is important to categorize the social chatter and reduce redundancy in the information. In our [WWW2013] work we have considered duplicate detection.
  125. 125. 125 Twitter is more like a news medium. How do people search on Twitter? [Teevan et al. 2011] has shown how this is characterized by repeated queries & monitoring for new content. Problems: •  Short tweets → lots of similar information. •  Few people produce content → many retweets, copied content. Search and retrieval on Twitter
  126. 126. 126 Near-duplicates in Twitter search Analysis of the Tweets2011 corpus (TREC microblog track) [WWW2013] [Figure: distribution of duplicate levels: Exact copy 1.89%, Nearly exact copy 9.51%, Strong near-duplicate 21.09%, Weak near-duplicate 48.71%, Low overlapping 18.80%.] •  For the 49 topics (queries), 2,825 topic-tweet pairs are relevant. •  We manually labeled 55,362 tweet pairs •  We found 2,745 pairs of duplicates in different levels.
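A basic syntactic signal for such duplicate levels is the word overlap between two tweets. The sketch below computes a Jaccard overlap on lower-cased tokens. It is only an illustration: the tokenizer is an assumption, no thresholds between levels are given because the slides do not state any, and the framework described in this section is said to use further features as well.

```python
import re

def tokens(text):
    """Lower-cased word, hashtag, and mention tokens of a tweet."""
    return set(re.findall(r"[a-z0-9#@']+", text.lower()))

def word_overlap(tweet_a, tweet_b):
    """Jaccard overlap of the token sets of two tweets, in [0, 1]."""
    a, b = tokens(tweet_a), tokens(tweet_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical tweet pair.
same = word_overlap("Train hits a block", "train hits a block!")
close = word_overlap("train hits a block", "train hits block")
```

An overlap of 1.0 after normalization corresponds to the copy end of the scale, while lower values would fall into the near-duplicate or low-overlap levels once thresholds are chosen.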
  127. 127. 127 Twinder Framework Search infrastructure [Architecture diagram; components:] Social Web Streams (messages), Twinder Search Engine, Feature Extraction (Syntactical Features, Semantic Features, Contextual Features, Further Enrichment), Feature Extraction Task Broker (feature extraction tasks), Cloud Computing Infrastructure, Index, Relevance Estimation (Keyword-based Relevance, Semantic-based Relevance), Duplicate Detection and Diversification, Search User Interface (users: query, results, feedback)
  128. 128. 128 Lessons Analyzing duplicate content in Twitter, we inferred a model for categorizing different levels of duplicity. We developed a near-duplicate detection framework for microposts and for categorizing duplicity of tweet pairs. Given the duplicate detection framework, we perform extensive evaluations and analyses of different duplicate detection strategies. Our approach enables search result diversification, also good to avoid ‘bubble effects’, and analyzes the impact of the diversification on the search quality. Follow Twinder progress: http://wis.ewi.tudelft.nl/twinder/
  129. 129. 129 Take home from technology research With semantics and humans, Social Web can help: •  Semantics beneficial for filtering & search and enrichment & linking. •  Semantic-enriched tweets beneficial for profiles and adaptation. •  Social Web & Linked Data beneficial for cross-system augmentation. •  Adaptive faceted search and duplicate detection beneficial for human-enhanced processing. For adaptive systems that rely on profiling, Social Web is a fertile source for more knowledge. ImREAL research & experiences elegantly show principles, as well as the detailed work in domain & application: •  Social Web & LOD usage is context-specific. •  Big Data in need of Small Interpretations.
  130. 130. 130 APPLICATION HUMANS FOR AUGMENTATION USERSDOMAIN DOMAIN Augmented with Web Semantics USERS Augmented with Web Semantics REAL DOMAIN REAL USERS
  131. 131. 131 Take home from technology research The human intelligence is to be arranged differently: •  We have moved from a priori understanding the system, to on the fly understanding the system. •  We have moved from careful manual analysis before, to machines doing the analysis on the fly. •  The critical and context-specific approach to (small) data, about domain and users, is a part of process and system we now need to (re-)include. •  This task of the designer has now shifted to a task for the human interpretation inside the hybrid system: human monitoring inside.
  133. 133. 133 Challenges with sense-making
  134. 134. 134 Not one truth
  135. 135. 135 In reality, not one truth In the beginning, social systems like Twitter were used as ‘the’ semantic source of knowledge with an implicit assumption that Twitter is one voice. Over time, researchers have begun to investigate how to identify and interpret different voices and viewpoints in such a source. Differences in viewpoints and opinions are a subject of study, but until now leverage is limited.
  136. 136. 136 Diversity and beliefs [Flock et al. 2011] study the different backgrounds, mindsets and biases of Wikipedia contributors, to understand the effects - positive and negative – of this diversity on the quality of the Wikipedia content, and on the sustainability of the overall project. • Analysis and approach for diversity-minded content management within Wikipedia. [Bhattachanya et al. 2012] estimate beliefs from posts made on social media, to monitor the level of belief, disbelief and doubt related to specific propositions.
  137. 137. 137 Include the negative Diversity of viewpoints and opinions also suggests including negative links in the approach. [Symeonidis et al. 2010] give an example of how to include negative links into friend recommendation approaches, but this goes much further. The accuracy improvement they observe can be held as a principle: accuracy can be gained by using information about both positive and negative edges.
  138. 138. 138 ViewS Modelling Viewpoints in User Generated Content Text processing Viewpoint extraction (attention focus) Ontology (activity aspects to analyse) Semantic enrichment Viewpoint exploration
  139. 139. 139 Viewpoints in YouTube Example viewpoints in user comments on job interview videos Comparing the viewpoints around ‘anger’ of young users (left) and old users (right)
  140. 140. 140 Not the truth
  141. 141. 141 Truth is not always truth Just as this source of knowledge is not a single voice, it is also clear that it may not consist of ‘true’ knowledge alone.
  142. 142. 142 Malicious profiles For example, profiles can be suspicious and made for the wrong reasons. In a context of online dating, [Pizzato et al. 2012] have observed the need to gain understanding of the sensitivity of recommender algorithms to scammers. With people being the items to recommend, fraudulent profiles can have a serious impact on recommender algorithms. Identifying and detecting fraudulent profiles is a new challenge for us.
  143. 143. 143 Identity theft Another aspect to ‘wrong profiles’ relates to identity disambiguation and theft. [Rowe et al. 2010] consider malevolent web practices such as identity theft and lateral surveillance. They study techniques for web users to identify all web resources which cite them and if necessary, remove the sensitive information.
  144. 144. 144 Credibility of social content The credibility of messages in social networks is for example studied in [Seth et al. 2010] on stories from Digg. Their model is based on theories developed in sociology, political science and information science. [Cramer et al. 2008] have nicely brought attention to trust. The study of social content credibility and trust is important, and calls for cross-discipline effort.
  145. 145. 145 Privacy A lot can be said about privacy in these networks, for example Facebook. [Bachrach et al. 2012] shows how users’ activity on Facebook (related to privacy) relates to their personality, as measured by the standard Five Factor Model. This is a nice example of understanding how Facebook features relate to interesting aspects of users and usage.
  146. 146. 146 Cultural variations
  147. 147. 147 Cultural diversity Studying diversity is not just relevant for understanding how Twitter content is to be interpreted. It is also relevant for understanding how the Social Web is used and can be used with a purpose. Cultural diversity is here one of the most interesting aspects and perhaps also one of the most challenging ones.
  148. 148. 148 Cultural diversity A subject addressed in ImREAL. Components are made available as services in ImREAL for augmented user modeling, e.g. for simulation designers.
  150. 150. 150 Hofstede’s cultural dimensions Describes stereotypical cultural characteristics of nationalities, with scores relative to other nationalities Five core dimensions: •  Individualism versus Collectivism (IDV) •  Power Distance (PDI) •  Masculinity versus Femininity (MAS) •  Uncertainty Avoidance (UAI) •  Long-Term Orientation (LTO) geert-hofstede.com
  151. 151. 151 Analysis • Datasets •  Microblog data collected over a period of three months •  22 million microposts from Sina Weibo and 24m from Twitter •  a sample of 2616 Sina Weibo users and 1200 Twitter users • Analyze and compare user behavior •  on two levels (i) the entire user population and (ii) individual users •  from different angles (i) syntactic, (ii) semantic, (iii) sentiment and (iv) temporal analysis
  152. 152. 152 Syntactic analysis Hashtags and URLs are less frequently applied on Sina Weibo than on Twitter. Users on Twitter are more triggered by hashtags and URLs when propagating information than on Sina Weibo. [Figure: avg. number of hashtags/URLs per post (log scale) across users (0% to 100%); series Hashtag-Weibo, URL-Weibo, Hashtag-Twitter, URL-Twitter.] high collectivism in Weibo, high individualism in Twitter
  153. 153. 153 Semantic analysis The topics that users discuss on Sina Weibo are to a large extent related to locations and persons. In contrast to Sina Weibo, users on Twitter are talking more about organizations (such as companies, political parties). [Figure: avg. number of entities per post (log scale) across users (0% to 100%); series Weibo, Twitter.] low employee commitment to an organization in China - high long-term orientation.
  154. 154. 154 Sentiment analysis Sina Weibo users have a stronger tendency to publish positive messages than Twitter users. [Figure: ratio of positive posts (0% to 100%) across users (0% to 100%); series Weibo, Twitter; annotations "more negative posts" and "more positive posts".] high long-term orientation.
  155. 155. 155 Combined semantic sentiment analysis The difference is amplified when discussing ‘people’ or ‘location’, with Sina Weibo users even more positive and Twitter users more negative. more long-term orientation in Weibo, more short-term orientation in Twitter
  156. 156. 156 Temporal analysis Twitter users repost messages faster than Sina Weibo users. time distance = t_repost - t_original [Figure: time distance in hours (log scale) across users (0% to 100%); series Weibo, Twitter.] large degree of power distance in Weibo, small one in Twitter
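The time distance on this slide is the repost timestamp minus the timestamp of the original post. A minimal sketch of that computation follows, with a per-user average added as an assumption about how the per-user values in the plot could be obtained; the timestamps are invented.

```python
from datetime import datetime
from statistics import mean

def time_distance_hours(t_original, t_repost):
    """Time distance between a repost and its original post, in hours."""
    return (t_repost - t_original).total_seconds() / 3600.0

def avg_time_distance(reposts):
    """Average time distance over (original, repost) timestamp pairs of one user."""
    return mean(time_distance_hours(orig, rep) for orig, rep in reposts)

# Hypothetical reposts of a single user.
reposts = [
    (datetime(2013, 6, 10, 12, 0), datetime(2013, 6, 10, 13, 30)),
    (datetime(2013, 6, 10, 12, 0), datetime(2013, 6, 10, 12, 30)),
]
user_avg = avg_time_distance(reposts)
```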
  157. 157. 157 Cultural differences in tagging Other work confirms these findings and their consistency with theories of cultural differences between Asian and Western cultures. [Dong et al. 2011] look at cultural differences in a tagging system and find that American and Chinese subjects differed in many ways: • the number and types of tags they applied; • the extent to which they applied suggested tags or entered new tags of their own; and • how often they applied tags that originated from a different culture.
  158. 158. 158 Cultural variations for Social Q&A Another example is given by [Yang et al. 2011] that looks at cultural differences in people’s social question asking behaviors across the United States, the United Kingdom, China, and India. They analyzed the questions people ask via social networking tools, and their motivations for asking and answering questions online. Results reveal culture as a consistently significant factor in predicting people’s social question and answer behavior.
  159. 159. 159 Real-time variations
  160. 160. 160 Understand the source When using the knowledge from Twitter as a semantic source, especially if it is the only semantic source, there are a few things one needs to consider that relate to the real-time nature of social contributions. The ‘knowledge’ is not unambiguous: inconsistency, moods, etc. Real-time knowledge spreads and evolves fast.
  161. 161. 161 Inconsistency & moods Twitter is used as semantic sensor, sometimes as the only semantic sensor, but consistency in user contributions like ratings is a concern. [Said et al. 2012] shows how users are inconsistent in their ratings and tend to be more consistent for above average ratings. [De Choudhury et al. 2012] report on the relation between moods and social activity, social relations and participatory patterns like link sharing and conversational engagement.
  162. 162. 162 Understanding over time While Twitter and the like were used in the beginning as ‘fixed’ sources of knowledge, researchers have become interested in the evolution over time. The nature and speed of the flow of content over time have become great objects of study. Two domains that in this light have received fair attention are those of diseases and (political) news.
  163. 163. 163 Flow in disease information Domain of diseases and outbreaks is getting fair attention. Works by [Gomide et al. 2011] on Dengue and [Diaz-Aviles et al. 2012] on EHEC, show how the people’s behavior on Twitter can be used for surveillance and tasks such as early warning and outbreak investigation.
  164. 164. 164 Flow of news From [Naveed et al. 2011] we learn how retweets reflect what the Twitter community considers interesting on a global scale. In [Backstrom et al. 2011] we see the differences between communication and observation in Facebook: communication involves a much higher focus of attention than observation activities. We see in [Lerman et al. 2010] how network structure affects dynamics of how interest in news stories spreads among social networks in Digg and Twitter.
  165. 165. 165 Flow in political news Coming back to our observation of the multiple truths, political news is a great domain to look at. For the context of political speech, [Metaxas et al. 2010] discuss how the real-time nature of Twitter provides disproportionate exposure to personal opinions, fabricated content, unverified events, lies and misrepresentations, with viral spread as a consequence. To act upon that, [Lumezanu et al. 2012] identify extreme tweeting patterns that could characterize users who spread propaganda (political propagandists), e.g. sending high volumes of near-duplicate messages.
  166. 166. 166 Temporal effects In our [WebSci2011] work, we have considered how user interests are manifest over time. Most users who are interested in a news topic become interested within a few days. Lifespan of users’ interest: • Long-term adopters - continuously interested • Short-term adopters - interested only for a short period in time (and influenced by “global trends”) High overlap between early adopters and long-term adopters.
  167. 167. 167 Temporal effects On Twitter the importance of entities for a topic varies over time (long-term vs. short-term entities). In terms of user interests over time, the majority of users become interested in a topic quickly (within a few days). When using Twitter-based profiles for personalization, time-sensitive user modeling improves recommendation quality. Also, the selection of user modeling strategy should take the type of user into account: • Long-term adopters: hashtag-based • Short-term adopters: entity-based
  168. 168. 168 Twitter-based Trend and User Modeling Framework Twitter posts current tweets of Twitter community news recommender? Profile Semantic Enrichment Profile Type Aggregation Weighting Scheme trends time user’s interests
  169. 169. 169 Temporal effects with trends For the domain of personalized news recommendations, we have combined trend and user modeling in our framework. • We have seen how user profiles change over time, under the influence of trends. • Appropriate concept weighting strategies allow for the discovery of local trends. • Time sensitive weighting function is best for generating trend profiles. Aggregation of trend and user profile can improve the performance of recommendations.
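A time-sensitive weighting function and the aggregation of a trend profile with a user profile can be sketched as follows. The exponential decay with a half-life, the linear mixing parameter `trend_share`, and all concept names are assumptions for illustration; the slides state only that time-sensitive weighting and aggregation help, not which functions were used.

```python
def time_sensitive_weights(occurrences, now, half_life_days=7.0):
    """Weight each (concept, day) occurrence by exponential decay of its age."""
    weights = {}
    for concept, day in occurrences:
        age = now - day
        weights[concept] = weights.get(concept, 0.0) + 0.5 ** (age / half_life_days)
    return weights

def normalize(profile):
    """Scale weights so they sum to 1 (empty profiles stay empty)."""
    total = sum(profile.values())
    return {c: w / total for c, w in profile.items()} if total else {}

def aggregate(user_profile, trend_profile, trend_share=0.3):
    """Linear mix of a normalized user profile and a normalized trend profile."""
    user, trend = normalize(user_profile), normalize(trend_profile)
    return {c: (1 - trend_share) * user.get(c, 0.0) + trend_share * trend.get(c, 0.0)
            for c in set(user) | set(trend)}

# Hypothetical data: concept mentions with integer day stamps.
trend_weights = time_sensitive_weights(
    [("topic:Election", 10), ("topic:Election", 3)], now=10)
combined = aggregate({"topic:Tennis": 2.0}, {"topic:Election": 4.0}, trend_share=0.25)
```

A mention from today counts fully and one from a week ago counts half, so recent trends dominate the trend profile before it is mixed into the user's own interests.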
  170. 170. 170 Validation
  171. 171. 171 Check with the user With all profiles based on augmentation, it becomes (even more) vital to follow the lessons of checking with the user, by engaging with the user in a common process of validating the profile and the assumptions based on it.
  172. 172. 172 Perico Dialogue for Modelling Cultural Exposure using Linked Data Initial User Model •  Visited Countries •  Estimated Cultural Exposure Social Web Sensors Perico Dialogue Agent Cultural Fact Extractor Quiz Generator User Profile Generator Dialogue Planner Updated User Model •  Verified Visited Countries •  Enhanced Cultural Exposure Score
  174. 174. 174 Inspect and control [Knijnenburg et al. 2012] consider how users of social recommender systems may want to inspect and control how their social relationships influence the recommendations they receive: friends are not always “nearest neighbors”. The results show that high inspectability and control indeed increase users’ perceived understanding of and control over the system, their rating of the recommendation quality, and their satisfaction with the system, and thus an overall better user experience.
  175. 175. 175 Communities
  176. 176. 176 Understanding communities Attention is given to communities and their dynamics. [Chan et al. 2010] proposes a method for analysing user communication roles in discussion forums. [Schwagereit et al. 2011] study governance in web communities. [Karnstedt et al. 2011] considers the relation between a user's value within a community - constituted from various user features - and the probability of a user churning. [Yang et al. 2010] analyze users’ activity lifespan in online knowledge sharing communities: acknowledgement of contributions leads to user survival.
  177. 177. 177 Involvement in communities In order to understand how people behave on the Social Web and in communities, it is relevant to understand their engagement and involvement in more detail. [Lehmann et al. 2012] study how users engage with online services, and how to measure this engagement. [Freyne et al. 2009] look at how social networking sites rely on the contribution and participation of their members: focus on early interventions for engagement.
  178. 178. 178 Communities and expertise Understanding communities is also relevant as these communities can act as additional resource. From finding evidence for profiles, we have seen recent attention shift towards finding people and expertise. For example, to enable active engagement of people. For using expertise in UMAP, it is also important to be able to specify expertise, to enable reasoning about the expertise’s quality and fit.
  179. Take home from challenges
       The (Social) Web tells many stories:
       •  Acknowledge multiple truths, opposing truths, and bad intentions.
       •  Acknowledge multiple audiences and viewpoints.
       •  Acknowledge cultural variations.
       The (Social) Web moves fast:
       •  Acknowledge the real-time nature of Web and applications.
       •  Analyze and understand the flow of information.
       •  Analyze and understand the nature of communities.
       The (Social) Web includes people:
       •  Involve the users actively in validation.
       •  Involve (communities of) users in interpretation.
  180.
  181. Social, Web & UMAP
  182. Social & UMAP
       Huge economic and societal potential for added value.
       Social Web is a fertile source of knowledge for augmentation.
       •  Semantics can be beneficial for social-based augmentation.
       •  Hybrid, human-enhanced approaches can be beneficial.
       •  Technological feasibility of augmentation.
       Research from specific cases towards general theory.
       Next on the agenda:
       •  Describe added value for stakeholders, describe goals.
       •  Share and compare research challenges and evaluations.
  183. Web & UMAP
       UMAP systems are Web systems:
       •  The (Social) Web tells many stories.
       •  The (Social) Web moves fast.
       •  The (Social) Web includes people.
       The Web is the real laboratory for UMAP systems.
       Next on the agenda:
       •  Share and compare solutions, components, and systems.
       •  Support more uniformity in methods and practices.
  184. UMAP & Web
       On the (Social) Web, systems are being made:
       •  Take positions or prepare to take positions about bad intentions.
       •  Take responsibility and make recommendations about future architectures.
       On the (Social) Web, many systems are small:
       •  Do (also) consider the specific problems of small and medium-sized stakeholders: bring UMAP into practice.
  185. UMAP & Social
       In SWUMAP, human intelligence is arranged differently:
       •  From careful manual analysis a priori, to machine analysis on the fly.
       •  Critical and context-specific approach to data is part of the ‘in vivo’ system.
       •  Human interpretation of data is inside the hybrid system.
       It makes for a new type of system, and one of great value.
       And plenty of fun and diverse challenges for UMAP.
  186. [Diagram labels: APPLICATION; HUMANS FOR AUGMENTATION; USERS and DOMAIN; DOMAIN Augmented with Web Semantics; USERS Augmented with Web Semantics; REAL DOMAIN; REAL USERS]
  187. [Diagram labels: APPLICATION; HUMANS FOR AUGMENTATION; USERS and DOMAIN; DOMAIN Augmented with Web Semantics; USERS Augmented with Web Semantics; SWUMAP]
  188. Thanks
       Slides made with input from many, including Alessandro, Claudia, Fabian, Ilknur, Jan, Jasper, Ke, Qi, and Richard from WIS in Delft, and friends from ImREAL, Net2, SEALINCMedia, and Twitcident.