ESSIR 2013 - IR and Social Media

9th European Summer School in Information Retrieval, September 4th, 2013
http://bit.ly/ESSIR13IRSocMedia
IR and Social Media
Arjen P. de Vries
arjen@acm.org
Centrum Wiskunde & Informatica
Delft University of Technology
Spinque B.V.

On slideshare,
IR = Investor Relations

Social Media
Noun
social media (plural only)
Interactive forms of media that allow users
to interact with and publish to each other,
generally by means of the Internet.
The early 21st century saw a huge increase in social
media thanks to the widespread availability of the
Internet.

http://www.webanalyticsworld.net/2010/11/history-of-social-media-infographic.html

Social Media
 “Social bookmarking” sites
 “User generated content”
 Images (Flickr) and videos (YouTube, Vimeo), but also
blogs
 Social network services
 Twitter, Facebook

“Rock group” in author’s metadata...
Organisation in groups may help disambiguate query!
More implicit metadata...

Information Science
“Search for the fundamental knowledge
which will allow us to postulate and utilize
the most efficient combination of [human
and machine] resources”
 M.E. Senko. Information systems: records, relations, sets, entities,
and things. Information systems, 1(1):3–13, 1975.

Core Questions
 How to represent information?
 The information need and search requests
 The objects to be shown in response to an
information request
 How to match information
representations?

IR and Social Media
 Richer information representations!

Richer representations
 User profiles
 User name, full name, description, image,
homepage url, etc.
 Connections between users
 Networks of friends, followers, etc
 Comments/reactions
 Endorsing and sharing

(C) 2008, The New York Times Company
Anchor text:
“continue reading”

Not a lot of info
to represent
the page…
A fan’s Hyves page:
Kyteman's HipHop Orchestra: www.kyteman.com
Ticket sales, Luxor Theater:
May 22 - Kyteman's Hiphop Orkest - www.kyteman.com
Kluun.nl:
Kyteman's site
Blog Rockin’ Beats:
The 21-year-old Kyteman
(trumpeter, composer and
producer Colin Benders)
worked for three years on
his debut:
the Hermit Sessions.
Jazzenzo:
...a performance by the popular
Kyteman’s Hiphop Orkest

‘Co-creation’
 Social Media:
 Consumer becomes a co-creator
 ‘Data consumption’ traces
 In essence: many new sources to play the
role of anchor text
 Tags and/or ratings
 Tweets
 Comments, reviews
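
A minimal sketch of the "anchor text" idea above: social annotations are simply added to the bag of words that represents a page. The function names and the sample texts are hypothetical, chosen only to echo the Kyteman example.

```python
from collections import Counter

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()

def build_representation(page_text, social_texts=()):
    """Bag of words for a page, optionally expanded with social annotations."""
    terms = Counter(tokenize(page_text))
    for text in social_texts:
        terms.update(tokenize(text))
    return terms

# Hypothetical data: the page itself says very little.
page_text = "Kyteman"
social_texts = [
    "trumpeter composer producer hiphop orchestra",  # e.g. a blog post
    "hiphop jazz live",                              # e.g. tags
]

plain = build_representation(page_text)
expanded = build_representation(page_text, social_texts)
```

A query for "hiphop" finds nothing in `plain` but does match `expanded`, which is the vocabulary-gap argument in miniature.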

Potential Benefits for IR
 Expand content representation
 Reduce the vocabulary gap(s) between
creators of content, indexers, and users
 More diverse views on the same content

 Relevance depends on user context
 User task
 User knowledge
 Social media provide an opportunity to
make much better assumptions about
user context
 A specific user’s context
 The variety of user contexts that may exist

Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders.
The task dependent effect of tags and ratings on social media access.
TOIS 28, 4, article 21 (November 2010), 42 pages.

LibraryThing
 Items
 People
 Tags
 Ratings
See also: http://www.macle.nl/tud/LT/

Examples
 Humour
 Classic

Search with Random Walk
 Present nodes according to estimated
probability that a random walk that starts
from (task dependent) starting nodes,
would end at this node
 E.g., tag suggestion starts in a tag node;
personalized search in tag and user nodes
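
The walk can be sketched as follows. This is a simplified illustration, not the exact model of the TOIS paper: it assumes an undirected, unweighted graph, a uniform start distribution, and a fixed number of steps. Node names and the tiny graph are made up.

```python
def random_walk(edges, start_nodes, steps):
    """Probability of being at each node after `steps` steps,
    starting uniformly from `start_nodes` on an undirected graph."""
    neighbours = {}
    for a, b in set(edges):
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    prob = {node: 0.0 for node in neighbours}
    for node in start_nodes:
        prob[node] += 1.0 / len(start_nodes)
    for _ in range(steps):
        nxt = {node: 0.0 for node in neighbours}
        for node, p in prob.items():
            share = p / len(neighbours[node])
            for nb in neighbours[node]:
                nxt[nb] += share
        prob = nxt
    return prob

# Hypothetical graph: users (u:), items (i:) and tags (t:).
edges = [
    ("u:alice", "i:java_programming"), ("i:java_programming", "t:java"),
    ("u:alice", "t:java"),
    ("u:bob", "i:java_travel"), ("i:java_travel", "t:java"),
    ("u:bob", "t:java"),
]

# Personalized search: start in the query tag and in the target user.
for_alice = random_walk(edges, ["t:java", "u:alice"], steps=3)
for_bob = random_walk(edges, ["t:java", "u:bob"], steps=3)
```

With the same query tag, the programming item scores higher for alice and the travel item scores higher for bob. Running the walk for many more steps would wash this difference out, since the distribution then no longer depends on the start nodes.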

Ratings
 Ratings may enhance the graph, or just
be used for evaluation

Personalized Search
 Assume a user who types a single tag as
query

 A soft clustering effect smoothly relates
similar concepts before converging to the
background probability

 Homographs like “Java” are
disambiguated because the walk starts in
both the query tag and the target user
 So, content that matches the user’s
preference is more likely to be found first

Analysis results
 Allowing all users to tag all available
content improves retrieval tasks
 Combining tags and ratings may improve
both search and recommendation tasks

Ternary relation lost!
 The UIT matrix represents a ternary
relation, that is lost when creating the
three UI, IT and UT matrices
 Potentially a problem if tags express opinion
about an item; e.g.,
 “poetry” can still describe the user, independent
of the item
 “awful” requires knowing which item the term
belongs to
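
A small illustration of the loss, using made-up users, items and tags: two different sets of (user, item, tag) triples that yield exactly the same UI, IT and UT projections.

```python
from collections import Counter

def project(triples):
    """Collapse (user, item, tag) triples into the three pairwise count tables."""
    ui = Counter((u, i) for u, i, t in triples)
    it = Counter((i, t) for u, i, t in triples)
    ut = Counter((u, t) for u, i, t in triples)
    return ui, it, ut

# Hypothetical data: who called which book "awful"?
world_a = {
    ("ann", "book_a", "poetry"), ("ann", "book_b", "awful"),
    ("bob", "book_a", "awful"), ("bob", "book_b", "poetry"),
}
world_b = {
    ("ann", "book_a", "awful"), ("ann", "book_b", "poetry"),
    ("bob", "book_a", "poetry"), ("bob", "book_b", "awful"),
}

same_projections = project(world_a) == project(world_b)
```

Here `same_projections` is true although the two worlds disagree on every single opinion, so the pairwise matrices cannot tell which item ann found awful.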

Tags vs. rating
 Most tags do not deviate far from the
mean rating
 Only a few tags are strongly correlated with
opinion
 Note: poetry higher quality than chicklit
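
One way to look at this, sketched on invented numbers: compare the mean rating of the items carrying a tag with the overall mean rating. The (tag, rating) pairs below are hypothetical and not taken from the LibraryThing data.

```python
from collections import defaultdict

def tag_deviations(tagged_ratings):
    """Mean rating per tag minus the overall mean rating."""
    overall = sum(r for _, r in tagged_ratings) / len(tagged_ratings)
    per_tag = defaultdict(list)
    for tag, rating in tagged_ratings:
        per_tag[tag].append(rating)
    return {tag: sum(rs) / len(rs) - overall for tag, rs in per_tag.items()}

# Hypothetical (tag, rating) pairs on a 1-5 scale.
tagged_ratings = [
    ("classic", 4), ("classic", 3),
    ("humour", 3), ("humour", 4),
    ("awful", 1), ("awful", 1),
    ("masterpiece", 5), ("masterpiece", 5),
]

deviations = tag_deviations(tagged_ratings)
```

Topical tags such as "classic" and "humour" stay close to the mean, while the opinion tags stand out in both directions.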

Metadata
 Scientific articles have many types of
metadata associated:
 Abstract
 Author
 Booktitle
 Description
 Journal
 Tags
 Are all these types of metadata useful for
item recommendation?

Metadata
 According to Toine Bogers’ PhD thesis:
 Concatenate all fields associated to a single
user’s profile’s items into one huge text field,
and use an off-the-shelf IR model to match
the profile against metadata of the items.
“Profile-centric Matching”
 Or, construct item profiles from meta-data of
all users for that item, and apply an item-
based collaborative filtering approach
“Item-based Hybrid Filtering”
 Author, description, tags, title, url, journal
and booktitle all contribute
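
A rough sketch of profile-centric matching, under stated assumptions: plain TF-IDF with cosine similarity stands in for the "off-the-shelf IR model" (the thesis may well use a different one), and the field names and sample items are hypothetical.

```python
import math
from collections import Counter

FIELDS = ("title", "author", "description", "tags", "journal", "booktitle")

def item_text(item):
    """Concatenate all metadata fields of one item into a single string."""
    return " ".join(item.get(field, "") for field in FIELDS)

def tfidf(texts):
    """Turn each text into a TF-IDF weighted term vector (a dict)."""
    bags = [Counter(text.lower().split()) for text in texts]
    df = Counter(term for bag in bags for term in bag)
    n = len(bags)
    return [{t: tf * math.log(1 + n / df[t]) for t, tf in bag.items()}
            for bag in bags]

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def rank_for_user(user_items, candidates):
    """Match one big profile text against the metadata of each candidate."""
    profile = " ".join(item_text(item) for item in user_items)
    names = list(candidates)
    vectors = tfidf([profile] + [item_text(candidates[n]) for n in names])
    scores = [(n, cosine(vectors[0], v)) for n, v in zip(names, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical user library and candidate articles.
user_items = [{"title": "language models for retrieval", "tags": "ir ranking"}]
candidates = {
    "c1": {"title": "probabilistic ranking in retrieval", "tags": "ir"},
    "c2": {"title": "protein folding simulation", "tags": "biology"},
}
ranking = rank_for_user(user_items, candidates)
```

The candidate that shares title and tag terms with the profile ends up first; the unrelated one gets a score of zero.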

Artist Popularity?
 Let’s ask widely used social media music
platforms!
 I.e., query their APIs

Artist Popularity (1-3)
 Top-5 popular artists in dataset
 Jan 21 – Mar 21
 3-hourly timestamped popularity indices

Artist Popularity (?!)
 Top-5 popular artists in dataset
 Jan 21 – Mar 21
 3-hourly timestamped popularity indices

The Black Keys
 Three grammy awards received!

The Black Keys
 Web responds, while service based
popularity index is static

Implications
 An “artist popularity” index depends on
the platform and its user population
 Web based popularity – estimated via URL
shortener’s API – “reacts” to real-world
events
 Suitable as an academics’ search log
replacement?
 Q: What is the most useful popularity –
one that changes dynamically or one that
lasts?

Tweets about blip.tv
 “Twanchor text”
 E.g.: http://blip.tv/file/2168377
 Amazing
 Watching “World’s most realistic 3D city
models?”
 Google Earth/Maps killer
 Ludvig Emgard shows how maps/satellite pics
on web is done (learn Google and MS!)
 and ~120 more Tweets

Wikipedia
 Wikipedia contains semantically very rich
annotations:
 Wikipedia Categories
 Wikipedia Lists
 Times (1930, 1931, 1932, etc. etc.)
 Names
 Disambiguation pages
Etc.
 Note: DBpedia is just Wikipedia :-)

Wikipedia
 People have used Wikipedia edit history to
look for events

Geotags / POIs
 Many social media items carry explicit geo
information
 Geotags are low-level “coordinates”
 POIs are high-level “point-of-interest” labels
 Applications
 Recommend geo-locations to people
 Predict POI tags from (tweet) text
 Predict where a user will go next

Map text to locations
 Build a language model from all tags
assigned to flickr images that belong to a
predefined grid cell
 Neighbouring cells used for smoothing
(like hierarchic language models used
previously for video / scene / shot)
 User frequency of a term in a location
(instead of term frequency)
Neil O’Hare and Vanessa Murdock
Modeling Locations with Social Media
Information Retrieval, February 2013, Volume 16, Issue 1, pp 30-62
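
A simplified sketch of these three ingredients (per-cell tag model, neighbour smoothing, user frequency). It is not the exact model of O'Hare and Murdock: the interpolation weights, the 8-neighbour grid and the sample photos are assumptions made for illustration.

```python
import math
from collections import defaultdict

def user_frequencies(photos):
    """photos: (user, cell, tags). Count distinct users per tag in each cell."""
    users = defaultdict(set)
    for user, cell, tags in photos:
        for tag in tags:
            users[(cell, tag)].add(user)
    counts = defaultdict(dict)
    for (cell, tag), who in users.items():
        counts[cell][tag] = len(who)
    return dict(counts)

def neighbours(cell):
    """The eight grid cells surrounding `cell`."""
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def ml(tables, tag):
    """Maximum-likelihood probability of `tag` over a list of count tables."""
    total = sum(sum(t.values()) for t in tables)
    return sum(t.get(tag, 0) for t in tables) / total if total else 0.0

def score(cell, tags, counts, w_cell=0.6, w_near=0.3, w_all=0.1):
    """Log-likelihood of the tags under the smoothed model of one cell."""
    near = [counts[n] for n in neighbours(cell) if n in counts]
    everything = list(counts.values())
    logp = 0.0
    for tag in tags:
        p = (w_cell * ml([counts[cell]], tag) + w_near * ml(near, tag)
             + w_all * ml(everything, tag))
        logp += math.log(p) if p > 0 else float("-inf")
    return logp

def place(tags, counts):
    """Return the grid cell whose model best explains the tags."""
    return max(counts, key=lambda cell: score(cell, tags, counts))

# Hypothetical photos in two grid cells.
photos = [
    ("u1", (0, 0), ["acropolis", "athens"]),
    ("u2", (0, 0), ["athens", "greece"]),
    ("u1", (0, 0), ["acropolis"]),  # same user again: counted once
    ("u3", (5, 5), ["athens", "ohio", "university"]),
    ("u4", (5, 5), ["ohio"]),
]
counts = user_frequencies(photos)
```

With these counts, `place(["athens", "ohio"], counts)` picks cell (5, 5) and `place(["athens", "acropolis"], counts)` picks cell (0, 0). Counting users instead of occurrences keeps one prolific uploader from dominating a cell.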

Placing Images: Easy
http://www.flickr.com/photos/63666148@N00/3615989115/
Athens, Ohio or Athens, Greece?

Placing Images: Hard
Ballooning company
in Ottawa

Searching the Social Graph
 Search entities, and the relationships
between them, in the (facebook) social
graph
 Clearly IR problems, but who has the data
to work with?
Michael Curtiss et al.
Unicorn: A System for Searching the Social Graph
PVLDB, Vol. 6, No. 11

Crawling
 How to get “the” data?
 Rate limited APIs
 ToS
HEADACHES!

Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley
Is the Sample Good Enough? Comparing Data from Twitter’s Streaming
API with Twitter’s Firehose
ICWSM 2013

Not IR yet, but…
Interesting stuff nevertheless!
de Volkskrant, March 13, 2013
Michal Kosinski, David Stillwell, and Thore Graepel
Private traits and attributes are predictable from digital records of
human behavior
PNAS 2013 ; published ahead of print March 11, 2013,
doi:10.1073/pnas.1218772110

Take home message(s)
 Social media give us IR researchers
access to a rich resource of context
 Including time & location!

 Gather the right data for your problem
domain, and it may be a good alternative
to the click data we all want
so badly
 Various recommendation and retrieval
tasks exist in social media – can one
theory address all of these?
