9th European Summer School in Information Retrieval September 4th, 2013
http://bit.ly/ESSIR13IRSocMedia
IR and Social Media
Arjen P. de Vries
arjen@acm.org
Centrum Wiskunde & Informatica
Delft University of Technology
Spinque B.V.
On slideshare,
IR = Investor Relations
Social Media
Noun
social media (plural only)
Interactive forms of media that allow users
to interact with and publish to each other,
generally by means of the Internet.
The early 21st century saw a huge increase in social
media thanks to the widespread availability of the
Internet.
http://www.webanalyticsworld.net/2010/11/history-of-social-media-infographic.html
Social Media
• “Social bookmarking” sites
• “User generated content”
  - Images (flickr) and videos (youtube, vimeo), but also blogs
• Social network services
  - Twitter, facebook
Not just one beast!
IR and Social Media?
Red Hot Chili Peppers
“Rock group” in author’s metadata...
Organisation in groups may help disambiguate query!
More implicit metadata...
Information Science
“Search for the fundamental knowledge which will allow us to postulate and utilize the most efficient combination of [human and machine] resources”
• M.E. Senko. Information systems: records, relations, sets, entities, and things. Information Systems, 1(1):3–13, 1975.
Core Questions
• How to represent information?
  - The information need and search requests
  - The objects to be shown in response to an information request
• How to match information representations?
IR and Social Media
• Richer information representations!
Richer representations
• User profiles
  - User name, full name, description, image, homepage url, etc.
• Connections between users
  - Networks of friends, followers, etc.
• Comments/reactions
• Endorsing and sharing
Q: Is the Web an ancient form of social media?
(C) 2008, The New York Times Company
Anchor text:
“continue reading”
Not a lot of info to represent the page…
A fan’s Hyves page:
Kyteman's HipHop Orchestra: www.kyteman.com
Ticket sales, Luxor Theater:
May 22 - Kyteman's hiphop Orkest - www.kyteman.com
Kluun.nl:
Kyteman's site
Blog Rockin’ Beats:
The 21-year-old Kyteman (trumpeter, composer and producer Colin Benders) worked for three years on his debut: the Hermit sessions.
Jazzenzo:
...a performance by the popular Kyteman’s Hiphop Orkest
‘Co-creation’
• Social Media:
  - Consumer becomes a co-creator
  - ‘Data consumption’ traces
• In essence: many new sources to play the role of anchor text
  - Tags and/or ratings
  - Tweets
  - Comments, reviews
Potential Benefits for IR
• Expand content representation
• Reduce the vocabulary gap(s) between creators of content, indexers, and users
• More diverse views on the same content
Potential Benefits for IR
• Relevance depends on user context
  - User task
  - User knowledge
• Social media provide an opportunity to make much better assumptions about user context
  - A specific user’s context
  - The variety of user contexts that may exist
Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders.
The task dependent effect of tags and ratings on social media access.
TOIS 28, 4, article 21 (November 2010), 42 pages.
LibraryThing
• Items
• People
• Tags
• Ratings
See also: http://www.macle.nl/tud/LT/
Synonyms
Examples
• Humour
• Classic
Search with Random Walk
• Present nodes according to the estimated probability that a random walk, starting from (task-dependent) starting nodes, would end at this node
• E.g., tag suggestion starts in a tag node; personalized search starts in tag and user nodes
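A minimal sketch of the idea above, assuming an undirected graph over user, item and tag nodes and a fixed number of steps. The toy graph, the node names and the `steps` parameter are illustrative assumptions, not the setup used in the cited TOIS paper.

```python
from collections import defaultdict

def random_walk(edges, start_nodes, steps=3):
    """Probability of being at each node after `steps` steps of a uniform
    random walk that starts (with equal probability) in `start_nodes`."""
    neighbours = defaultdict(list)
    for a, b in edges:                      # treat every edge as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)

    prob = {n: 1.0 / len(start_nodes) for n in start_nodes}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in prob.items():
            for nb in neighbours[node]:     # move to a uniformly chosen neighbour
                nxt[nb] += p / len(neighbours[node])
        prob = dict(nxt)
    return prob

# Hypothetical toy graph: "u:" users, "i:" items, "t:" tags.
EDGES = [
    ("u:alice", "i:java_programming_book"), ("u:alice", "t:programming"),
    ("u:alice", "t:java"),
    ("u:bob", "i:java_island_guide"), ("u:bob", "t:indonesia"),
    ("u:bob", "t:java"),
    ("i:java_programming_book", "t:java"),
    ("i:java_programming_book", "t:programming"),
    ("i:java_island_guide", "t:java"),
    ("i:java_island_guide", "t:indonesia"),
]

# Personalized search: start in both the query tag and the target user.
scores = random_walk(EDGES, ["t:java", "u:alice"], steps=3)
items = sorted((n for n in scores if n.startswith("i:")),
               key=scores.get, reverse=True)
print(items)
```

Starting only in `t:java` treats both items alike in this symmetric toy graph; adding `u:alice` as a second starting node shifts probability mass towards the item close to her.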
Tagging Relationships
An item recommendation walk
Ratings
• Ratings may enhance the graph, or just be used for evaluation
Personalized Search
• Assume a user who types a single tag as query
Personalized Search
• A soft clustering effect smoothly relates similar concepts before converging to the background probability
• Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user
• So, content that matches the user’s preference is more likely to be found first
Common System Designs
Analysis results
• Allowing all users to tag all available content improves retrieval tasks
• Combining tags and ratings may improve both search and recommendation tasks
Ternary relation lost!
• The UIT (user-item-tag) matrix represents a ternary relation, which is lost when creating the three UI, IT and UT matrices
  - Potentially a problem if tags express an opinion about an item; e.g.,
    · “poetry” can, independent of the item, still describe the user
    · “awful” requires knowing which item the term belongs to
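A small sketch of why the projection loses information, assuming binary relations stored as Python sets of tuples rather than matrices. The users, items and tags are made up for illustration.

```python
def project(uit):
    """Project a set of (user, item, tag) triples onto the three pairwise
    relations UI, IT and UT."""
    ui = {(u, i) for u, i, t in uit}
    it = {(i, t) for u, i, t in uit}
    ut = {(u, t) for u, i, t in uit}
    return ui, it, ut

# Two different (hypothetical) tagging histories: the opinions are swapped.
A = {("ann", "book1", "awful"), ("ann", "book2", "great"),
     ("bob", "book1", "great"), ("bob", "book2", "awful")}
B = {("ann", "book1", "great"), ("ann", "book2", "awful"),
     ("bob", "book1", "awful"), ("bob", "book2", "great")}

print(A == B)                    # the ternary relations differ
print(project(A) == project(B))  # yet their pairwise projections coincide
```

From the UI, IT and UT relations alone there is no way to tell which book Ann called “awful”.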
Tags vs. rating
• Most tags do not deviate far from the mean rating
• Only a few tags are strongly correlated with opinion
• Note: poetry is higher quality than chicklit
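One way to check such a claim on your own data is to compare the mean rating of the items each tag is attached to with the overall mean. The sketch below assumes a flat list of (tag, rating) pairs; the numbers are invented toy values, not results from the LibraryThing study.

```python
from collections import defaultdict
from statistics import mean

def tag_rating_deviation(posts):
    """For each tag, return (mean rating of its posts) - (overall mean rating).
    `posts` is a list of (tag, rating) pairs."""
    overall = mean(rating for _, rating in posts)
    by_tag = defaultdict(list)
    for tag, rating in posts:
        by_tag[tag].append(rating)
    return {tag: mean(ratings) - overall for tag, ratings in by_tag.items()}

# Invented toy data, for illustration only.
POSTS = [("poetry", 4.5), ("poetry", 4.0),
         ("chicklit", 3.0), ("chicklit", 3.5),
         ("fiction", 3.5), ("fiction", 4.0),
         ("awful", 1.0)]

for tag, dev in sorted(tag_rating_deviation(POSTS).items(), key=lambda kv: kv[1]):
    print(f"{tag:10s} {dev:+.2f}")
```

Tags with a deviation near zero carry little opinion; a tag like “awful” in this toy data stands out.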
Metadata
• Scientific articles have many types of metadata associated:
  - Abstract
  - Author
  - Booktitle
  - Description
  - Journal
  - Tags
• Are all these types of metadata useful for item recommendation?
Metadata
• According to Toine Bogers’ PhD thesis:
  - Concatenate all fields of the items in a single user’s profile into one huge text field, and use an off-the-shelf IR model to match the profile against the metadata of the items: “Profile-centric Matching”
  - Or, construct item profiles from the metadata of all users for that item, and apply an item-based collaborative filtering approach: “Item-based Hybrid Filtering”
  - Author, description, tags, title, url, journal and booktitle all contribute
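A rough sketch of profile-centric matching, assuming TF-IDF weighting with cosine similarity as a stand-in for the “off-the-shelf IR model” (the thesis may use a different model). Item ids and metadata strings are hypothetical.

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().split()

def idf_table(item_texts):
    """Smoothed inverse document frequency over the item collection."""
    n = len(item_texts)
    df = Counter(t for text in item_texts.values() for t in set(tokens(text)))
    return {t: math.log(1 + n / c) for t, c in df.items()}

def vector(text, idf):
    tf = Counter(tokens(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(profile_items, items):
    """Concatenate the metadata of the user's items into one profile text and
    rank all other items by similarity to that profile."""
    idf = idf_table(items)
    profile = vector(" ".join(items[i] for i in profile_items), idf)
    candidates = [i for i in items if i not in profile_items]
    return sorted(candidates,
                  key=lambda i: cosine(profile, vector(items[i], idf)),
                  reverse=True)

# Hypothetical items: each string is the concatenation of all metadata fields.
ITEMS = {
    "a1": "language models retrieval ranking tags ir search",
    "a2": "query expansion retrieval evaluation tags ir",
    "a3": "protein folding simulation tags biology",
    "a4": "gene expression clustering tags biology",
}

print(recommend(["a1"], ITEMS))
```

The item-based hybrid variant would instead build one text per item from the metadata all users attached to it, and compare items with each other.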
Finally: a recent case study
Artist Popularity?
• Let’s ask widely used social media music platforms!
  - I.e., query their APIs
Artist Popularity (1-3)
• Top-5 popular artists in dataset
• Jan 21 – Mar 21
• 3-hourly timestamped popularity indices
Artist Popularity
Artist Popularity (?!)
The Black Keys
The Black Keys
• Three Grammy awards received!
The Black Keys
• The Web responds, while the service-based popularity index is static
Implications
• An “artist popularity” index depends on the platform and its user population
• Web-based popularity – estimated via a URL shortener’s API – “reacts” to real-world events
  - Suitable as an academic’s replacement for search logs?
• Q: Which popularity is the most useful – one that changes dynamically or one that lasts?
Many topics I skipped…
Tweets about blip.tv
• “Twanchor text”
• E.g.: http://blip.tv/file/2168377
  - Amazing
  - Watching “World’s most realistic 3D city models?”
  - Google Earth/Maps killer
  - Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)
  - and ~120 more Tweets
Wikipedia
• Wikipedia contains semantically very rich annotations:
  - Wikipedia Categories
  - Wikipedia Lists
  - Times (1930, 1931, 1932, etc.)
  - Names
  - Disambiguation pages
  - Etc.
• Note: DBpedia is just Wikipedia
Wikipedia
• People have used Wikipedia edit history to look for events
Geotags / POIs
• Many social media items carry explicit geo information
  - Geotags are low-level “coordinates”
  - POIs are high-level “point-of-interest” labels
• Applications
  - Recommend geo-locations to people
  - Predict POI tags from (tweet) text
  - Predict where a user will go next
Map text to locations
• Build a language model from all tags assigned to flickr images that belong to a predefined grid cell
• Neighbouring cells used for smoothing (like hierarchic language models used previously for video / scene / shot)
• User frequency of a term in a location (instead of term frequency)
Neil O’Hare and Vanessa Murdock
Modeling Locations with Social Media
Information Retrieval, February 2013, Volume 16, Issue 1, pp 30-62
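A simplified sketch of the grid-cell language model, under several assumptions that are not taken from the paper: integer grid coordinates, add-one smoothing, and a fixed interpolation weight `lam` between a cell and its eight neighbours. The photo data is invented.

```python
import math
from collections import defaultdict

def build_cell_models(photos):
    """photos: (user, cell, tags) triples. For every cell, count the number of
    distinct users that used each tag (user frequency, not term frequency)."""
    users = defaultdict(set)
    for user, cell, tags in photos:
        for tag in tags:
            users[(cell, tag)].add(user)
    counts = defaultdict(dict)
    for (cell, tag), who in users.items():
        counts[cell][tag] = len(who)
    return dict(counts)

def neighbours(cell):
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def term_prob(counts, cell, term, vocab_size, lam=0.7):
    """Interpolate the cell's own estimate with that of its neighbours."""
    own = counts.get(cell, {})
    p_own = (own.get(term, 0) + 1) / (sum(own.values()) + vocab_size)
    nbs = [counts.get(nb, {}) for nb in neighbours(cell)]
    nb_count = sum(c.get(term, 0) for c in nbs)
    nb_total = sum(sum(c.values()) for c in nbs)
    p_nb = (nb_count + 1) / (nb_total + vocab_size)
    return lam * p_own + (1 - lam) * p_nb

def place(counts, tags, vocab_size):
    """Return the cell whose language model gives the tags the highest likelihood."""
    return max(counts, key=lambda cell: sum(
        math.log(term_prob(counts, cell, t, vocab_size)) for t in tags))

# Invented toy data: cells near (0, 0) stand for Greece, (5, 5) for Ohio.
PHOTOS = [
    ("u1", (0, 0), ["athens", "acropolis"]),
    ("u2", (0, 0), ["athens", "parthenon"]),
    ("u3", (0, 1), ["acropolis", "greece"]),
    ("u4", (5, 5), ["athens", "ohio"]),
    ("u4", (5, 5), ["athens", "ohio", "campus"]),
    ("u4", (5, 5), ["athens"]),
]
COUNTS = build_cell_models(PHOTOS)
VOCAB = len({t for _, _, tags in PHOTOS for t in tags})

print(place(COUNTS, ["athens", "acropolis"], VOCAB))
print(place(COUNTS, ["athens", "ohio"], VOCAB))
```

Counting users rather than occurrences means that one prolific uploader (here `u4`) cannot dominate a cell's model.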
Placing Images: Easy
http://www.flickr.com/photos/63666148@N00/3615989115/
Athens, Ohio or Athens, Greece?
Placing Images: Hard
Ballooning company in Ottawa
Searching the Social Graph
• Search entities, and the relationships between them, in the (facebook) social graph
• Clearly IR problems, but who has the data to work with?
Michael Curtiss et al.
Unicorn: A System for Searching the Social Graph
PVLDB, Vol. 6, No. 11
Crawling
• How to get “the” data?
  - Rate limited APIs
  - ToS
HEADACHES!
Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley
Is the Sample Good Enough? Comparing Data from Twitter’s Streaming
API with Twitter’s Firehose
ICWSM 2013
Not IR yet, but…
Interesting stuff nevertheless!
de Volkskrant, March 13, 2013
Michal Kosinski, David Stillwell, and Thore Graepel
Private traits and attributes are predictable from digital records of
human behavior
PNAS 2013; published ahead of print March 11, 2013,
doi:10.1073/pnas.1218772110
Take home message(s)
• Social media give us IR researchers access to a rich resource of context
  - Including time & location!
• Gather the right data for your problem domain, and it may be a good substitute for the click data we all want so badly but do not have
• Various recommendation and retrieval tasks exist in social media – can one theory address all of these?
C U @ #ECIR2014 ? !

YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 

ESSIR 2013 - IR and Social Media

  • 1. 9th European Summer School in Information Retrieval September 4th, 2013 http://bit.ly/ESSIR13IRSocMedia IR and Social Media Arjen P. de Vries arjen@acm.org Centrum Wiskunde & Informatica Delft University of Technology Spinque B.V.
  • 2. On slideshare, IR = Investor Relations
  • 3. Social Media
      Noun: social media (plural only)
      Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet.
      The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.
  • 5. Social Media
      - “Social bookmarking” sites
      - “User generated content”
          - Images (Flickr) and videos (YouTube, Vimeo), but also blogs
      - Social network services
          - Twitter, Facebook
  • 6. Not just one beast!
  • 7.
  • 8.
  • 9. IR and Social Media?
  • 10. Red Hot Chili Peppers
  • 11. “Rock group” in author’s metadata... Organisation in groups may help disambiguate query! More implicit metadata...
  • 12. Information Science
      “Search for the fundamental knowledge which will allow us to postulate and utilize the most efficient combination of [human and machine] resources”
      - M.E. Senko. Information systems: records, relations, sets, entities, and things. Information Systems, 1(1):3–13, 1975.
  • 13. Core Questions
      - How to represent information?
          - The information need and search requests
          - The objects to be shown in response to an information request
      - How to match information representations?
  • 14. IR and Social Media
      - Richer information representations!
  • 15. Richer representations
      - User profiles
          - User name, full name, description, image, homepage URL, etc.
      - Connections between users
          - Networks of friends, followers, etc.
      - Comments/reactions
      - Endorsing and sharing
  • 16. Q: Is the Web ancient social media?
  • 17. (C) 2008, The New York Times Company
      Anchor text: “continue reading”
  • 18. Not a lot of info to represent the page…
      A fan’s Hyves page: Kyteman's HipHop Orchestra: www.kyteman.com
      Ticket sales, Luxor Theater: May 22 - Kyteman's hiphop Orkest - www.kyteman.com
      Kluun.nl: The Kyteman site
      Blog Rockin’ Beats: The 21-year-old Kyteman (trumpeter, composer and producer Colin Benders) worked for three years on his debut: the Hermit sessions.
      Jazzenzo: ...a performance by the popular Kyteman’s Hiphop Orkest
  • 19.
  • 20. ‘Co-creation’
      - Social Media:
          - Consumer becomes a co-creator
          - ‘Data consumption’ traces
      - In essence: many new sources to play the role of anchor text
          - Tags and/or ratings
          - Tweets
          - Comments, reviews
  • 21. Potential Benefits for IR
      - Expand content representation
      - Reduce the vocabulary gap(s) between creators of content, indexers, and users
      - More diverse views on the same content
  • 23. Potential Benefits for IR
      - Relevance depends on user context
          - User task
          - User knowledge
      - Social media provide an opportunity to make much better assumptions about user context
          - A specific user’s context
          - The variety of user contexts that may exist
  • 24. Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders. The task dependent effect of tags and ratings on social media access. TOIS 28, 4, article 21 (November 2010), 42 pages.
  • 26. LibraryThing
      - Items
      - People
      - Tags
      - Ratings
      See also: http://www.macle.nl/tud/LT/
  • 29.
  • 32.
  • 33. Search with Random Walk
      - Present nodes according to the estimated probability that a random walk, starting from (task-dependent) starting nodes, would end at this node
      - E.g., tag suggestion starts in a tag node; personalized search starts in tag and user nodes
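
The ranking idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model from the TOIS paper: the toy graph, the node names, the uniform transition probabilities and the fixed walk length of three steps are all invented for the example.

```python
from collections import defaultdict


def random_walk_scores(edges, start_nodes, steps=3):
    """Probability of ending at each node after `steps` uniform random steps.

    The walk starts in one of `start_nodes`, chosen uniformly at random
    (an assumption made for this sketch).
    """
    neighbours = defaultdict(list)
    for a, b in edges:  # undirected graph
        neighbours[a].append(b)
        neighbours[b].append(a)

    dist = {node: 1.0 / len(start_nodes) for node in start_nodes}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            for nb in neighbours[node]:
                nxt[nb] += p / len(neighbours[node])
        dist = dict(nxt)
    return dist


# Hypothetical toy graph: users and tags link to items.
edges = [
    ("user:alice", "item:python_cookbook"),
    ("item:python_cookbook", "tag:programming"),
    ("item:java_in_a_nutshell", "tag:programming"),
    ("item:java_in_a_nutshell", "tag:java"),
    ("item:lonely_planet_java", "tag:java"),
    ("item:lonely_planet_java", "tag:travel"),
    ("user:bob", "item:lonely_planet_java"),
]

# Tag-only search starts in the query tag; personalized search also
# starts in the node of the user who issued the query.
tag_only = random_walk_scores(edges, ["tag:java"])
personalised = random_walk_scores(edges, ["tag:java", "user:alice"])

for name, scores in [("tag only", tag_only), ("personalised", personalised)]:
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(name, ranking)
```

In this toy graph the travel guide ranks above the programming book when the walk starts only in the tag "java", and the order flips once the walk also starts in alice's node, because her reading history pulls probability mass towards programming items.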
  • 35.
  • 37. Ratings
      - Ratings may enhance the graph, or just be used for evaluation
  • 38. Personalized Search
      - Assume a user who types a single tag as query
  • 40. A soft clustering effect smoothly relates similar concepts before converging to the background probability
  • 41. Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user
      - So, content that matches the user’s preference is more likely to be found first
  • 43. Analysis results
      - Allowing all users to tag all available content improves retrieval tasks
      - Combining tags and ratings may improve both search and recommendation tasks
  • 45. Ternary relation lost!
      - The UIT (user, item, tag) matrix represents a ternary relation that is lost when creating the three UI, IT and UT matrices
      - Potentially a problem if tags express an opinion about an item; e.g.,
          - “poetry” can still describe the user, independent of the item
          - “awful” requires knowing which item the term belongs to
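
A small Python sketch can make the information loss concrete. The users, items and tags below are hypothetical; the point is only that two different sets of (user, item, tag) assignments can yield exactly the same UI, IT and UT co-occurrence counts.

```python
from collections import Counter


def project(triples, keep):
    """Count co-occurrences over two of the positions 0=user, 1=item, 2=tag."""
    return Counter(tuple(t[k] for k in keep) for t in triples)


def pairwise_views(triples):
    """Build the three pairwise views of a set of (user, item, tag) triples."""
    return {
        "UI": project(triples, (0, 1)),
        "IT": project(triples, (1, 2)),
        "UT": project(triples, (0, 2)),
    }


# Two hypothetical tagging histories that disagree on who called what "awful".
history_a = [
    ("ann", "book1", "poetry"),
    ("ann", "book2", "awful"),
    ("bob", "book1", "awful"),
    ("bob", "book2", "poetry"),
]
history_b = [
    ("ann", "book1", "awful"),
    ("ann", "book2", "poetry"),
    ("bob", "book1", "poetry"),
    ("bob", "book2", "awful"),
]

same_views = pairwise_views(history_a) == pairwise_views(history_b)
same_triples = set(history_a) == set(history_b)
print("identical pairwise matrices:", same_views)
print("identical ternary relation:", same_triples)
```

Both histories produce the same three pairwise views, so any method that sees only UI, IT and UT cannot tell which book ann found "awful".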
  • 46.
  • 47. Tags vs. rating
      - Most tags do not deviate far from the mean rating
      - Only a few tags are strongly correlated with opinion
      - Note: poetry higher quality than chicklit
  • 48. Metadata
      - Scientific articles have many types of metadata associated:
          - Abstract
          - Author
          - Booktitle
          - Description
          - Journal
          - Tags
      - Are all these types of metadata useful for item recommendation?
  • 49. Metadata
      - According to Toine Bogers’ PhD thesis:
          - Concatenate all metadata fields of the items in a single user’s profile into one large text field, and use an off-the-shelf IR model to match the profile against the metadata of the items (“Profile-centric Matching”)
          - Or, construct item profiles from the metadata that all users supplied for that item, and apply an item-based collaborative filtering approach (“Item-based Hybrid Filtering”)
      - Author, description, tags, title, URL, journal and booktitle all contribute
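
The profile-centric idea can be sketched as follows. This is a minimal illustration, not the setup from the thesis: TF-IDF with cosine similarity stands in for the "off-the-shelf IR model", and the field names and example records are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def item_text(item):
    """Concatenate all metadata fields of one item into a single string."""
    return " ".join(str(value) for value in item.values())


def idf_table(token_lists):
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def weigh(tokens, idf):
    return {term: tf * idf.get(term, 0.0) for term, tf in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(profile_items, candidates):
    """Rank candidate items by similarity to the concatenated user profile."""
    profile_tokens = tokenize(" ".join(item_text(i) for i in profile_items))
    cand_tokens = {name: tokenize(item_text(i)) for name, i in candidates.items()}
    idf = idf_table(list(cand_tokens.values()) + [profile_tokens])
    profile_vec = weigh(profile_tokens, idf)
    scores = {name: cosine(profile_vec, weigh(tokens, idf))
              for name, tokens in cand_tokens.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical data: one item in the user's library, two candidate items.
profile = [{"title": "language models for retrieval", "tags": "ir ranking", "journal": "tois"}]
candidates = {
    "paper_a": {"title": "random walks for tag ranking", "tags": "ir tags", "journal": "tois"},
    "paper_b": {"title": "protein folding dynamics", "tags": "biology", "journal": "nature"},
}

ranking, scores = recommend(profile, candidates)
print(ranking)
```

Treating every field as plain text is what lets author, journal and tag fields contribute alongside titles without any field-specific modelling.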
  • 50. Finally: a recent case study
  • 51. Artist Popularity?
      - Let’s ask widely used social media music platforms!
      - I.e., query their APIs
  • 52.
  • 53. Artist Popularity (1-3)
      - Top-5 popular artists in dataset
      - Jan 21 – Mar 21
      - 3-hourly timestamped popularity indices
  • 56. Artist Popularity (?!)
      - Top-5 popular artists in dataset
      - Jan 21 – Mar 21
      - 3-hourly timestamped popularity indices
  • 58. The Black Keys
      - Three Grammy Awards received!
  • 59. The Black Keys
      - The Web responds, while the service-based popularity index is static
  • 61. Implications
      - An “artist popularity” index depends on the platform and its user population
      - Web-based popularity – estimated via a URL shortener’s API – “reacts” to real-world events
          - Suitable as an academics’ search log replacement?
      - Q: What is the most useful popularity – one that changes dynamically or one that lasts?
  • 62.
  • 63. Many topics I skipped…
  • 64.
  • 65. Tweets about blip.tv
      - “Twanchor text”
      - E.g.: http://blip.tv/file/2168377
          - Amazing
          - Watching “World’s most realistic 3D city models?”
          - Google Earth/Maps killer
          - Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)
          - and ~120 more Tweets
  • 66. Wikipedia
      - Wikipedia contains semantically very rich annotations:
          - Wikipedia Categories
          - Wikipedia Lists
          - Times (1930, 1931, 1932, etc.)
          - Names
          - Disambiguation pages
          - Etc.
      - Note: DBpedia is just Wikipedia :-)
  • 67. Wikipedia
      - People have used Wikipedia edit history to look for events
  • 68. Geotags / POIs
      - Many social media items carry explicit geo information
          - Geotags are low-level “coordinates”
          - POIs are high-level “point-of-interest” labels
      - Applications
          - Recommend geo-locations to people
          - Predict POI tags from (tweet) text
          - Predict where a user will go next
  • 69. Map text to locations
      - Build a language model from all tags assigned to Flickr images that belong to a predefined grid cell
      - Neighbouring cells are used for smoothing (like the hierarchical language models used previously for video / scene / shot)
      - User frequency of a term in a location (instead of term frequency)
      Neil O’Hare and Vanessa Murdock. Modeling Locations with Social Media. Information Retrieval, February 2013, Volume 16, Issue 1, pp 30-62.
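
The three ingredients on this slide (a tag language model per grid cell, smoothing with neighbouring cells, and user frequency instead of term frequency) can be combined in a rough Python sketch. The interpolation weight `lam`, the additive smoothing constant `mu`, the 8-cell neighbourhood and the example photos are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import defaultdict


def build_cell_models(photos):
    """photos: iterable of (user, cell, tags). Returns cell -> {tag: user frequency}.

    User frequency counts distinct users, so one prolific user cannot
    dominate a cell by repeating the same tag.
    """
    users = defaultdict(lambda: defaultdict(set))
    for user, cell, tags in photos:
        for tag in tags:
            users[cell][tag].add(user)
    return {cell: {tag: len(who) for tag, who in tags.items()}
            for cell, tags in users.items()}


def neighbours(cell):
    row, col = cell
    return [(row + dr, col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def tag_prob(models, cell, tag, vocab_size, lam=0.7, mu=1.0):
    """P(tag | cell), interpolated with the average over neighbouring cells."""
    def smoothed(c):
        counts = models.get(c, {})
        return (counts.get(tag, 0) + mu) / (sum(counts.values()) + mu * vocab_size)

    around = neighbours(cell)
    neighbourhood = sum(smoothed(c) for c in around) / len(around)
    return lam * smoothed(cell) + (1 - lam) * neighbourhood


def best_cell(models, tags, vocab_size):
    """Pick the cell whose smoothed language model best explains the tags."""
    return max(models, key=lambda cell: sum(
        math.log(tag_prob(models, cell, tag, vocab_size)) for tag in tags))


# Hypothetical photos; user u4 repeats the same tags ten times in cell (5, 5).
photos = [
    ("u1", (0, 0), ["paris", "eiffel"]),
    ("u2", (0, 0), ["paris", "louvre"]),
    ("u3", (5, 5), ["amsterdam", "canal"]),
] + [("u4", (5, 5), ["amsterdam", "paris"])] * 10

models = build_cell_models(photos)
vocab = {tag for _, _, tags in photos for tag in tags}
predicted = best_cell(models, ["paris", "eiffel"], len(vocab))
print(predicted)
```

Counting users rather than occurrences means the ten repeated uploads by u4 add only one count for "paris" in cell (5, 5), so they do not pull the prediction away from cell (0, 0).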
  • 71. Placing Images: Hard
      Ballooning company in Ottawa
  • 72. Searching the Social Graph
      - Search entities, and the relationships between them, in the (Facebook) social graph
      - Clearly IR problems, but who has the data to work with?
      Michael Curtiss et al. Unicorn: A System for Searching the Social Graph. PVLDB, Vol. 6, No. 11.
  • 73. Crawling
      - How to get “the” data?
      - Rate-limited APIs
      - ToS HEADACHES!
  • 74. Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. ICWSM 2013.
  • 75. Not IR yet, but… Interesting stuff nevertheless!
      de Volkskrant, March 13, 2013
      Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. PNAS 2013; published ahead of print March 11, 2013, doi:10.1073/pnas.1218772110
  • 79. Take home message(s)
      - Social media give us IR researchers access to a rich resource of context
          - Including time & location!
      - Gather the right data for your problem domain, and it may be a good alternative for not having the click data we all want so badly
      - Various recommendation and retrieval tasks exist in social media – can one theory address all of these?
  • 80. C U @ #ECIR2014 ? !