How search logs can help
improve future searches
Arjen P. de Vries
arjen@acm.org
[Figure: User, Content and Metadata, connected by content indexing, interaction with content, and interactions among users]
[Figure: User, Content and Metadata, connected by content indexing and interaction with content]
(C) 2008, The New York Times Company
Anchor text:
“continue reading”
Not much text to
get you here...
A fan’s Hyves page:
Kyteman's HipHop Orchestra:
www.kyteman.com
Ticket Sales Luxor theatre:
May 22nd - Kyteman's hiphop Orchestra - www.kyteman.com
Kluun.nl:
De site van Kyteman ("Kyteman's site")
Blog Rockin’ Beats:
The 21-year-old Kyteman
(trumpet player, composer and
producer Colin Benders)
has worked for 3 years on
his debut:
The Hermit Sessions.
Jazzenzo:
... a performance by the popular
Kyteman’s Hiphop Orkest
‘Co-creation’
• Social Media:
  – Consumer becomes a co-creator
  – ‘Data consumption’ traces
• In essence: many new sources to play the role of anchor text!
Tweets about blip.tv
• E.g.: http://blip.tv/file/2168377
  – Amazing
  – Watching “World’s most realistic 3D city models?”
  – Google Earth/Maps killer
  – Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)
  – and ~120 more tweets
[Figure: Information Need -> result Representation -> Click, drawn alongside Anchor text -> Weblink; labels: Anchor text, Relevance feedback]
Every search request is metadata!
This metadata is useful in several ways: as an expanded content representation that captures more diverse views on the same content and reduces the vocabulary gap between content creators, indexers, and users; as a means to adapt retrieval systems to the user context; and even as training data for machine learning of multimedia ‘detectors’!
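A minimal Python sketch of the first use, treating search requests as extra, anchor-text-like content: each item's caption terms are merged with the terms of the queries that led users to click on it. The `(query, item_id)` click-pair log format, the function name `expand_representations`, and the lowercase whitespace tokenisation are illustrative assumptions, not the method used in the work described here.

```python
from collections import Counter, defaultdict

def expand_representations(captions, clicks):
    """Merge each item's caption terms with the terms of the queries for
    which it was clicked (treated like anchor text).

    captions: dict item_id -> caption string
    clicks:   iterable of (query, item_id) pairs (hypothetical log format)
    Returns:  dict item_id -> Counter of terms
    """
    reps = defaultdict(Counter)
    for item_id, caption in captions.items():
        reps[item_id].update(caption.lower().split())
    for query, item_id in clicks:
        # Every click adds the query's terms to the clicked item's representation.
        reps[item_id].update(query.lower().split())
    return dict(reps)

captions = {"img1": "Kyteman performs at Luxor theatre"}
clicks = [("hiphop orchestra", "img1"), ("kyteman trumpet", "img1")]
reps = expand_representations(captions, clicks)
print(reps["img1"]["kyteman"], reps["img1"]["orchestra"])  # 2 1
```

Terms such as "orchestra" and "trumpet" never appear in the caption, so the expanded representation can match queries that the original metadata would miss.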
Types of feedback
• Explicit user feedback
  – Images/videos marked as relevant/non-relevant
  – Selected keywords that are added to the query
  – Selected concepts that are added to the query
• Implicit user feedback
  – Clicking on retrieved images/videos (click-through data)
  – Bookmarking or sharing an image/video
  – Downloading/buying an image/video
Who interacts with the data?
• Interactive relevance feedback
  – Current user in current search
• Personalisation
  – Current user in logged past searches
• Context adaptation
  – Users similar to the current user in logged past searches
• Collective knowledge
  – All users in logged past searches
Applications exploiting feedback
• Given a query, rank all images/videos based on past users' feedback
• Given an image/video, rank all images/videos based on past users' feedback
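Both applications can be approximated from a click log alone. The sketch below assumes a hypothetical log of `(query, image_id)` click pairs. Ranking by raw click counts for an exact query, and by co-clicks for an image, are simple illustrative choices, not the ranking functions used in Vitalas.

```python
from collections import Counter, defaultdict

def build_click_index(clicks):
    """Index (query, image_id) click pairs in both directions."""
    by_query = defaultdict(Counter)  # query -> Counter(image_id -> number of clicks)
    by_image = defaultdict(set)      # image_id -> queries it was clicked for
    for query, image_id in clicks:
        by_query[query][image_id] += 1
        by_image[image_id].add(query)
    return by_query, by_image

def rank_for_query(query, by_query):
    """Rank images by how often past users clicked them for this exact query."""
    return [img for img, _ in by_query.get(query, Counter()).most_common()]

def rank_for_image(image_id, by_query, by_image):
    """Rank other images by co-clicks: clicks they received for the same queries."""
    scores = Counter()
    for query in by_image.get(image_id, ()):
        for other, n in by_query[query].items():
            if other != image_id:
                scores[other] += n
    return [img for img, _ in scores.most_common()]

# Toy log (invented for illustration).
clicks = [("sting", "a"), ("sting", "a"), ("sting", "b"),
          ("police", "b"), ("police", "c"), ("police", "c"), ("police", "c")]
by_query, by_image = build_click_index(clicks)
print(rank_for_query("sting", by_query))        # ['a', 'b']
print(rank_for_image("b", by_query, by_image))  # ['c', 'a']
```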
Applications exploiting feedback
• Interactive relevance feedback
  – Modify the query and re-rank, based on the current user's explicit feedback (and the current ranking)
• Blind relevance feedback
  – Modify the query and re-rank, based on feedback by past users and the current ranking
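The slides do not say how the query is modified. For illustration, this sketch uses Rocchio's classic formula: the query vector moves towards the items judged relevant and away from the non-relevant ones, and the collection is then re-ranked. The sparse term-weight dicts, the `alpha`/`beta`/`gamma` values and the inner-product scoring are all assumptions. For the blind variant, the same function could take the items past users clicked for the query in place of the current user's judgements.

```python
from collections import Counter

def rocchio(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query modification on sparse term-weight vectors (dicts):
    q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant).
    Terms whose weight ends up <= 0 are dropped."""
    new_q = Counter({t: alpha * w for t, w in query_vec.items()})
    for docs, coef in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                # Summing coef * w / len(docs) over docs adds coef * centroid.
                new_q[t] += coef * w / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}

def rerank(query_vec, docs):
    """Order document ids (docs: id -> term-weight dict) by inner product with the query."""
    def score(doc_id):
        return sum(query_vec.get(t, 0.0) * w for t, w in docs[doc_id].items())
    return sorted(docs, key=score, reverse=True)

q = rocchio({"sting": 1.0},
            relevant=[{"sting": 1.0, "police": 1.0}],
            non_relevant=[{"versace": 1.0}])
print(q)  # {'sting': 1.75, 'police': 0.75}
docs = {"d1": {"versace": 1.0}, "d2": {"police": 1.0}, "d3": {"sting": 1.0}}
print(rerank(q, docs))  # ['d3', 'd2', 'd1']
```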
Applications exploiting feedback
• Query suggestion
  – Recommend keywords/concepts to support users in interactive query modification (refinement or expansion)
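One simple way to recommend refinements from logs is to count which queries users issued directly after a given query in the same session. This is offered as an assumption, not as the approach behind these slides. The session format (lists of queries in the order they were issued), the function name `build_suggester`, and the `min_count` cut-off are hypothetical.

```python
from collections import Counter, defaultdict

def build_suggester(sessions, min_count=1):
    """Count how often query b directly follows query a within a session and
    return a function that suggests the most frequent follow-ups of a query."""
    follows = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:  # ignore plain repetitions of the same query
                follows[a][b] += 1

    def suggest(query, k=5):
        ranked = follows.get(query, Counter()).most_common()
        return [q for q, n in ranked if n >= min_count][:k]

    return suggest

# Toy sessions (invented for illustration).
sessions = [["sting", "police sting"],
            ["sting", "police sting", "led zeppelin"],
            ["sting", "elton diana"]]
suggest = build_suggester(sessions)
print(suggest("sting"))  # ['police sting', 'elton diana']
```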
‘Police Sting’
Sting performs with The Police
‘Elton Diana’
Sting attends Versace memorial service
‘Led Zeppelin’
Sting performs at Led Zeppelin concert
Exploiting User Logs
(FP6 Vitalas T4.2)
• Aim
  – Understand the information-searching process of professional users of a picture portal
• Method
  – Building, in collaboration with Belga, an increasingly large dataset that contains the logs of Belga users' search interactions
  – Processing, analysing, and investigating the use of this collective knowledge stored in search logs in a variety of tasks
Search logs
• Search logs in Vitalas
  – Searches performed by users through Belga's web interface from 22/06/2007 to 12/10/2007 (101 days)
  – 402,388 tuples <date,time,userid,action>
  – "SEARCH_PICTURES" (138,275) | "SHOW_PHOTO" (192,168) | "DOWNLOAD_PICTURE" (38,070) | "BROWSE_GALLERIES" (8,878) | "SHOW_GALLERY" (24,352) | "CONNECT_IMAGE_FORUM" (645)
  – 17,861 unique (‘lightly normalised’) queries
  – 96,420 clicked images
• Web image search (Craswell and Szummer, 2007):
  – Pruned graph has 1.1 million edges, 505,000 URLs and 202,000 queries
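Here is a sketch of how per-action counts and a unique-query count like those above might be computed. It makes two assumptions. First, rows are comma-separated `date,time,userid,action`, with a hypothetical fifth column holding the query text of `SEARCH_PICTURES` rows (the original tuples list only four fields). Second, 'light' normalisation means lowercasing and collapsing whitespace. The log lines in the example are invented.

```python
import csv
import io
import re
from collections import Counter

def normalise(query):
    """'Light' normalisation (assumed): lowercase and collapse whitespace."""
    return re.sub(r"\s+", " ", query.strip().lower())

def summarise_log(text):
    """Count actions and unique normalised queries in a CSV log whose rows are
    date,time,userid,action[,query]; the query column is hypothetical."""
    actions = Counter()
    queries = set()
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        action = row[3]
        actions[action] += 1
        if action == "SEARCH_PICTURES" and len(row) > 4:
            queries.add(normalise(row[4]))
    return actions, queries

log = """22/06/2007,09:15:02,u1,SEARCH_PICTURES,Kim Clijsters
22/06/2007,09:15:40,u1,SHOW_PHOTO
22/06/2007,09:16:05,u1,DOWNLOAD_PICTURE
22/06/2007,10:02:11,u2,SEARCH_PICTURES,kim  clijsters
"""
actions, queries = summarise_log(log)
print(actions.most_common(), sorted(queries))
```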
Search Logs Analysis
Clijsters
Henin
What could we learn?
• Goals
  – What do users search for?
  – User context: how do professionals search image archives, compared to the average user?
  – Query modifications: how do users reformulate their queries within a search session?
Professionals search longer
Semantic analysis
• Most studies investigate search logs at the syntactic (term-based) level
• Our idea: map the term occurrences into linked open data (LOD)
Semantic Log Analysis
• Method:
  – Map queries into the linked data cloud, find 'abstract' patterns, and re-use those for query suggestion, e.g.:
    – A and B play-soccer-in-team X
    – A is-spouse-of B
• Advantages:
  – Reduces the sparseness of the raw search log data
  – Provides higher-level insights into the data
  – Right mix of statistics and semantics?
  – Overcomes the query drift problem of thesaurus-based query expansion
Assign Query Types
Detect High-level Relations…
… transformed into modification
patterns
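A toy sketch of the steps just listed. Queries are assigned entities and types, the relations linking consecutive queries are detected, and those relations are turned into abstract modification patterns. A direct link gives a pattern like 'A is-spouse-of B'. A shared object gives a pattern like 'A and B play-soccer-in-team X'. Several parts are assumptions standing in for the linked open data: entity linking is taken as already done, `TRIPLES` is a tiny hand-written subset, and all function names are hypothetical.

```python
from collections import Counter

# Toy stand-in for linked open data (hypothetical subset, for illustration only).
TRIPLES = {
    ("Kim Clijsters", "type", "TennisPlayer"),
    ("Justine Henin", "type", "TennisPlayer"),
    ("Kim Clijsters", "nationality", "Belgium"),
    ("Justine Henin", "nationality", "Belgium"),
    ("Sting", "type", "Musician"),
    ("Trudie Styler", "type", "Actor"),
    ("Sting", "spouse", "Trudie Styler"),
}

def types_of(entity):
    """Query types: the entity's 'type' objects, or 'Thing' if none are known."""
    return sorted(o for s, p, o in TRIPLES if s == entity and p == "type") or ["Thing"]

def facts_of(entity):
    return {(p, o) for s, p, o in TRIPLES if s == entity and p != "type"}

def relation_patterns(a, b):
    """Abstract patterns linking entity a (earlier query) to entity b (next query)."""
    patterns = [f"A {p} B" for p, o in sorted(facts_of(a)) if o == b]
    patterns += [f"A and B share {p}" for p, _ in sorted(facts_of(a) & facts_of(b))]
    return patterns

def mine_patterns(query_pairs, entity_of):
    """Count (type of A, pattern, type of B) over consecutive query pairs.
    entity_of maps a query to its linked entity (entity linking is assumed)."""
    counts = Counter()
    for q1, q2 in query_pairs:
        a, b = entity_of.get(q1), entity_of.get(q2)
        if a is None or b is None:
            continue
        for pattern in relation_patterns(a, b):
            for ta in types_of(a):
                for tb in types_of(b):
                    counts[(ta, pattern, tb)] += 1
    return counts

entity_of = {"clijsters": "Kim Clijsters", "henin": "Justine Henin",
             "sting": "Sting", "trudie styler": "Trudie Styler"}
pairs = [("clijsters", "henin"), ("sting", "trudie styler")]
for pattern, n in mine_patterns(pairs, entity_of).items():
    print(pattern, n)
```

Because the counted patterns are expressed over types rather than over the raw query strings, a pattern observed for one pair of players can inform suggestions for other players, which is the sparseness reduction claimed above.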
Implications
• Guide the selection of the ontologies/lexicons/etc. best suited to your user population
• Distinguish between successful and unsuccessful queries when making search suggestions
• Improve session boundary detection
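This sketch shows a common baseline for session boundary detection, assumed here rather than taken from the slides. It splits a user's activity wherever the pause between consecutive events exceeds a fixed timeout. The 30-minute value is a frequently used but arbitrary choice. The semantic patterns above could then refine such boundaries, for instance by not splitting between two queries that are linked by a known relation.

```python
from datetime import datetime, timedelta

def split_sessions(events, timeout=timedelta(minutes=30)):
    """Split one user's (timestamp, query) events into sessions, starting a
    new session whenever the pause since the previous event exceeds `timeout`."""
    sessions = []
    previous = None
    for ts, query in sorted(events):
        if previous is None or ts - previous > timeout:
            sessions.append([])
        sessions[-1].append(query)
        previous = ts
    return sessions

# Toy events (invented for illustration).
events = [
    (datetime(2007, 6, 22, 9, 0), "clijsters"),
    (datetime(2007, 6, 22, 9, 5), "henin"),
    (datetime(2007, 6, 22, 11, 0), "traffic"),
]
print(split_sessions(events))  # [['clijsters', 'henin'], ['traffic']]
```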
Finally… a ‘wild idea’
• Image data is seldom annotated adequately
  – i.e., adequately enough to support search
• Automatic image annotation, or ‘concept detection’
  – Supervised machine learning
  – Requires labelled samples as training data: a laborious and expensive task
FP6 Vitalas IP
• Phase 1 – collect training data
  – Select ~500 concepts with the collection owner
  – Manually select ~1000 positive and negative examples for each concept
How to obtain training data?
• Can we use click-through data instead of manually labelled samples?
• Advantages:
  – Large quantities, no user intervention, collective assessments
• Disadvantages:
  – Noisy & sparse
  – Queries not based on strict visual criteria
Automatic Image Annotation
• Research questions:
  – How to annotate images with concepts using click-through data?
  – How reliable are annotations based on click-through data?
  – How effective are these annotations as training samples for concept classifiers?
Manual annotations
                 Annotations per concept   Positive samples   Negative samples
Mean             1020.02                   89.44              930.57
Median           998                       30                 970
Std. deviation   164.64                    132.84             186.21
Manual vs. search-log-based annotations
1. How to annotate? (1/4)
• Use the queries for which images were clicked
• Challenges:
  – Inherent noise: a gap between queries/captions and concepts
    – queries describe the content+context of the images to be retrieved
    – clicked images were retrieved using their captions: content+context
    – concept-based annotations are based on visual-content-only criteria
  – Sparsity: the logs only cover the part of the collection that was previously accessed
  – Mismatch between the terms in concept descriptions and in queries
How to annotate (2/4)
• Basic ‘global’ method:
  – Given the keywords of a query Q
  – Find the query Q' in the search logs that is most textually similar to Q
  – Find the images I clicked for Q'
  – Find the queries Q'' for which these images have been clicked
  – Rank the queries Q'' based on the number of images clicked for them
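The global method's steps translate almost directly into code. This is a minimal sketch in which 'textually similar' is taken to mean Jaccard overlap of lowercase terms. That choice, the dict-based log structures and the name `global_method` are assumptions. The toy log reuses query strings from the 'traffic' example but is otherwise invented.

```python
from collections import Counter

def terms(text):
    return set(text.lower().split())

def jaccard(a, b):
    """Term-overlap similarity; an assumed stand-in for 'textually similar'."""
    return len(a & b) / len(a | b) if a | b else 0.0

def global_method(q, clicks_by_query, queries_by_image):
    """Rank candidate annotation queries Q'' for query Q.

    clicks_by_query:  dict logged query -> set of clicked image ids
    queries_by_image: dict image id -> set of queries it was clicked for
    """
    if not clicks_by_query:
        return []
    # Find the logged query Q' most textually similar to Q.
    q_prime = max(clicks_by_query, key=lambda lq: jaccard(terms(q), terms(lq)))
    # Find the images I clicked for Q'.
    images = clicks_by_query[q_prime]
    # Find the queries Q'' for which these images were clicked, and rank
    # them by the number of images in I clicked for each.
    counts = Counter()
    for img in images:
        counts.update(queries_by_image.get(img, ()))
    return counts.most_common()

clicks_by_query = {"traffic jam": {"i1", "i2"}, "e40": {"i1"},
                   "transport": {"i2"}, "hurricane dean": {"i3"}}
queries_by_image = {"i1": {"traffic jam", "e40"},
                    "i2": {"traffic jam", "transport"},
                    "i3": {"hurricane dean"}}
print(global_method("traffic", clicks_by_query, queries_by_image))
```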
How to annotate (3/4)
• Exact: images clicked for queries exactly matching the concept name
  – Example: 'traffic' -> 'traffic jam', 'E40', 'vacances', 'transport'
• Search-log-based image representations:
  – Images represented by all queries for which they have been clicked
  – Retrieval based on language models (smoothing, stemming)
  – Example: 'traffic' -> 'infrabel', 'deutsche bahn', 'traffic lights'
• Random walks over the click graph
  – Example: 'hurricane' -> 'dean', 'mexico', 'dean haiti', 'dean mexico'
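The random-walk option builds on the click-graph idea of Craswell and Szummer (2007), cited earlier. The sketch below is a simplified walk with a self-transition probability, written as an assumption. The published work makes its own choices of walk direction, self-transition probability and number of steps, and those are not reproduced here. The toy click counts are invented.

```python
from collections import defaultdict

def random_walk(clicks, start, steps=4, self_loop=0.1):
    """Probability of ending on each query node after `steps` steps of a walk
    that starts at query `start` on the bipartite query-image click graph.

    clicks: dict (query, image_id) -> click count (edge weight).
    At each step the walker stays put with probability `self_loop`; otherwise
    it moves to a neighbour with probability proportional to the click count.
    """
    nbrs = defaultdict(dict)
    for (query, image), n in clicks.items():
        nbrs[("q", query)][("i", image)] = n
        nbrs[("i", image)][("q", query)] = n
    prob = {("q", start): 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in prob.items():
            total = sum(nbrs[node].values())
            if total == 0:  # isolated node: the walker cannot move
                nxt[node] += p
                continue
            nxt[node] += self_loop * p
            for other, w in nbrs[node].items():
                nxt[other] += (1 - self_loop) * p * w / total
        prob = nxt
    return {name: p for (kind, name), p in prob.items() if kind == "q"}

clicks = {("hurricane", "i1"): 1, ("dean", "i1"): 1,
          ("dean", "i2"): 1, ("mexico", "i2"): 1}
scores = random_walk(clicks, "hurricane", steps=4, self_loop=0.0)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# [('dean', 0.5), ('hurricane', 0.375), ('mexico', 0.125)]
```

After a few steps, 'mexico' gets probability mass even though it shares no clicked image with 'hurricane'. This is how the walk reaches related queries that a direct co-click count would miss.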
How to annotate (4/4)
• Local method:
  – Given the keywords of a query Q and its top-ranked images
  – Find the queries Q'' for which these images have been clicked
  – Rank the queries Q'' based on the number of images clicked for them
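The global method starts from the most similar logged query. The local method instead starts from the top-ranked images that some retrieval system returns for Q. One plausible advantage is that it still produces candidates when no similar query was ever logged. A minimal sketch, assuming a dict from image id to the set of queries it was clicked for; the function name and toy data are hypothetical.

```python
from collections import Counter

def local_method(top_images, queries_by_image, k=None):
    """Rank queries Q'' by how many of Q's top-ranked images were clicked for them.

    top_images:       image ids ranked highest for Q by any retrieval system
    queries_by_image: dict image id -> set of queries it was clicked for
    """
    counts = Counter()
    for img in top_images:
        counts.update(set(queries_by_image.get(img, ())))
    return counts.most_common(k)

queries_by_image = {"i1": {"traffic jam", "e40"}, "i2": {"traffic jam"}}
print(local_method(["i1", "i2", "i9"], queries_by_image))
# [('traffic jam', 2), ('e40', 1)]
```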
• Compare the agreement of click-through-based annotations with manual ones, examining the 111 VITALAS concepts that have at least 10 images (for at least one of the methods) in the overlap of clicked and manually annotated images
• Levels of agreement vary greatly across concepts
• For each method, 20% of the concepts reach an agreement of at least 0.8
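The slides do not define the agreement measure. In this sketch it is assumed to be the fraction of images in the overlap on which the click-through-based and manual labels coincide. Concepts with fewer than 10 overlapping images are skipped, and the last step reports the share of concepts at or above 0.8. The label dicts and function names are hypothetical, and the toy labels are invented.

```python
def agreement(click_labels, manual_labels, min_images=10):
    """Per-concept agreement between click-through-based and manual labels.

    Both arguments: dict concept -> dict image_id -> bool (True = positive).
    Agreement (an assumed definition) is the fraction of images in the overlap
    of both annotation sets on which the two labels coincide.
    """
    scores = {}
    for concept, clicked in click_labels.items():
        manual = manual_labels.get(concept, {})
        overlap = clicked.keys() & manual.keys()
        if len(overlap) < min_images:
            continue  # too few overlapping images to judge this concept
        same = sum(clicked[img] == manual[img] for img in overlap)
        scores[concept] = same / len(overlap)
    return scores

def share_at_least(scores, threshold=0.8):
    """Fraction of concepts whose agreement is at least `threshold`."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores.values()) / len(scores)

imgs = [f"i{n}" for n in range(10)]
click_labels = {"traffic": {i: True for i in imgs},
                "soccer": {i: True for i in imgs},
                "tiny": {"i1": True}}
manual_labels = {"traffic": {i: i != "i0" for i in imgs},
                 "soccer": {i: n % 2 == 0 for n, i in enumerate(imgs)},
                 "tiny": {"i1": True}}
scores = agreement(click_labels, manual_labels)
print(scores, share_at_least(scores))
```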
What types of concepts can be reliably annotated using click-through data?
• Defined categories? Not informative:
  – activities, animals, events, graphics, people, image_theme, objects, setting/scene/site
Possible future research on types of concepts
• Named entities?
• Specific vs. broad?
2. Reliability
Train the classifiers for each of 25 concepts
• Positive samples: images selected by each method
• Negative samples: selected by randomly sampling the 100k set, excluding those already selected as positive samples
• Low-level visual features F_W: texture description, an integrated Weibull distribution extracted from overlapping image regions
• Low-level textual features F_T: a vocabulary of the most frequent terms in captions is built for each concept; each image caption is compared against each concept vocabulary, and a frequency histogram is built for each concept
• SVM classifiers with an RBF kernel (and cross-validation)
3. Effectiveness (1/3)
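A sketch of the training-data side of this set-up. Negatives are sampled at random from the collection, excluding positives. A textual feature in the spirit of F_T is then built: one histogram bin per concept, counting how many caption terms fall in that concept's vocabulary. How the vocabulary is built (top-`size` terms in the captions of a concept's positive samples), the function names and the toy captions are all assumptions. The SVM training step is left out; scikit-learn's `sklearn.svm.SVC(kernel="rbf")` would be one option.

```python
import random
from collections import Counter

def sample_negatives(collection, positives, n, seed=0):
    """Randomly sample up to n negative image ids from the collection,
    excluding images already selected as positive samples."""
    pool = sorted(set(collection) - set(positives))
    return random.Random(seed).sample(pool, min(n, len(pool)))

def concept_vocabularies(captions, positives_by_concept, size=50):
    """Per concept: the `size` most frequent terms in its positive samples' captions
    (an assumed way of building the vocabulary)."""
    vocabs = {}
    for concept, images in positives_by_concept.items():
        counts = Counter(t for img in images for t in captions[img].lower().split())
        vocabs[concept] = {t for t, _ in counts.most_common(size)}
    return vocabs

def textual_features(caption, vocabs):
    """One histogram bin per concept (sorted by name): the number of caption
    terms that occur in that concept's vocabulary."""
    caption_terms = caption.lower().split()
    return [sum(t in vocabs[c] for t in caption_terms) for c in sorted(vocabs)]

captions = {"a": "traffic jam on the E40", "b": "traffic lights in Brussels",
            "c": "soccer match Anderlecht", "d": "soccer fans"}
vocabs = concept_vocabularies(captions, {"traffic": ["a", "b"], "soccer": ["c", "d"]})
print(textual_features("Traffic jam near soccer stadium", vocabs))  # [1, 2]
negatives = sample_negatives([f"img{k}" for k in range(100)], ["img1", "img2"], n=10)
```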
3. Effectiveness study (2/3)
• Experiment 1 (visual features):
  – Training: search-log-based annotations
  – Test set for each concept: manual annotations (~1000 images)
  – Feasibility study: in most cases, average precision (AP) is considerably higher than the prior
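To make the AP-versus-prior comparison concrete, here is a minimal sketch. 'The prior' is read here as the fraction of positive images in the test set, which is roughly the AP a random ranking would obtain. That reading, the function names and the toy ranking are assumptions.

```python
def average_precision(ranked, relevant):
    """Non-interpolated AP: mean of precision@k over the ranks k of relevant items."""
    hits, total = 0, 0.0
    for k, img in enumerate(ranked, start=1):
        if img in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def prior(test_set, relevant):
    """Fraction of positives in the test set (about the AP of a random ranking)."""
    return len(set(test_set) & set(relevant)) / len(test_set)

ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(round(average_precision(ranked, relevant), 3), prior(ranked, relevant))  # 0.833 0.5
```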
3. Effectiveness (2/3)
• Experiments 2, 3, 4 (visual or textual features):
  – Experiment 2 training: search-log-based annotations
  – Experiment 3 training: manual + search-log-based annotations
  – Experiment 4 training: manual annotations
  – Common test set: 56,605 images (a subset of the 100,000-image collection)
  – The contribution of search-log-based annotations to training is positive, particularly in combination with manual annotations
3. Effectiveness (3/3)
[Figure: test set results for manually annotated vs. search-log-based annotated positive samples]
View results at:
http://olympus.ee.auth.gr/~diou/searchlogs/
Example: Soccer
Paris
or... Paris
Diversity from User Logs
• Present the clicked images of different query variants in a clustered view
• Merge the clicked images of different query variants into one list in a round-robin fashion (CLEF)
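The round-robin merge can be sketched as follows. The first clicked image of each query variant is taken, then the second of each, and so on. Skipping images that appear under more than one variant is an assumption, because the slides do not say how duplicates were handled. The variant lists below are invented.

```python
def round_robin(ranked_lists):
    """Interleave the clicked-image lists of several query variants: the first
    image of each list, then the second of each, and so on, skipping duplicates."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in ranked_lists), default=0)
    for pos in range(longest):
        for lst in ranked_lists:
            if pos < len(lst) and lst[pos] not in seen:
                seen.add(lst[pos])
                merged.append(lst[pos])
    return merged

# Hypothetical clicked images for the variants 'olympic torch', 'olympic rings', 'olympic games'.
variants = [["t1", "t2", "t3"], ["r1", "t1"], ["g1"]]
print(round_robin(variants))  # ['t1', 'r1', 'g1', 't2', 't3']
```

Interleaving ensures that every variant contributes an image near the top of the list, which is how the merge trades some precision for diversity.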
ImageCLEF 'Olympics'
[Figure: clicked images grouped by query variant: Olympic games, Olympic torch, Olympic village, Olympic rings, Olympic flag, Olympic Belgium, Olympic stadium, Other]
ImageCLEF 'Tennis'
ImageCLEF Findings
• Many queries (>20%) without clicked images
  – Corpus and available logs originated from different time frames
• Best results combine text search in metadata with image click data for the topic title and each of the cluster titles
• Using query variants derived from the logs increases recall by 50-100%
  – However, they also cause topic drift, reducing early precision
Twente ir-course, 20-10-2010
Masterclass Big Data (leerlingen) Masterclass Big Data (leerlingen)
Masterclass Big Data (leerlingen)
 
Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6) Beverwedstrijd Big Data (klas 3/4/5/6)
Beverwedstrijd Big Data (klas 3/4/5/6)
 
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
Beverwedstrijd Big Data (groep 5/6 en klas 1/2)
 
Web Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineWeb Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search Engine
 
Information Retrieval and Social Media
Information Retrieval and Social MediaInformation Retrieval and Social Media
Information Retrieval and Social Media
 
Information Retrieval intro TMM
Information Retrieval intro TMMInformation Retrieval intro TMM
Information Retrieval intro TMM
 
ACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC ChairsACM SIGIR 2017 - Opening - PC Chairs
ACM SIGIR 2017 - Opening - PC Chairs
 
Data Science Master Specialisation
Data Science Master SpecialisationData Science Master Specialisation
Data Science Master Specialisation
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big Data
 
Bigdata processing with Spark - part II
Bigdata processing with Spark - part IIBigdata processing with Spark - part II
Bigdata processing with Spark - part II
 
Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with Spark
 
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
 
ESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social MediaESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social Media
 
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?
 
Searching Political Data by Strategy
Searching Political Data by StrategySearching Political Data by Strategy
Searching Political Data by Strategy
 
Context Adaptation in Image Search
Context Adaptation in Image SearchContext Adaptation in Image Search
Context Adaptation in Image Search
 
20090914 Petamedia Irp5
20090914 Petamedia Irp520090914 Petamedia Irp5
20090914 Petamedia Irp5
 
Diversity (in Media)
Diversity (in Media)Diversity (in Media)
Diversity (in Media)
 

Recently uploaded

Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
EduSkills OECD
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 

Recently uploaded (20)

Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 

Twente ir-course 20-10-2010

  • 1. How search logs can help improve future searches. Arjen P. de Vries, arjen@acm.org
  • 2. User – Content (interaction with content); User – User (interactions among users); Content – Metadata (content indexing)
  • 3. User – Content (interaction with content); Content – Metadata (content indexing)
  • 4. (C) 2008, The New York Times Company. Anchor text: “continue reading”
  • 5. Not much text to get you here...
      - A fan’s Hyves page: Kyteman's HipHop Orchestra: www.kyteman.com
      - Ticket sales, Luxor theatre: May 22nd - Kyteman's hiphop Orchestra - www.kyteman.com
      - Kluun.nl: De site van Kyteman (“Kyteman's site”)
      - Blog Rockin’ Beats: The 21-year-old Kyteman (trumpet player, composer and producer Colin Benders) has worked for 3 years on his debut: the Hermit sessions.
      - Jazzenzo: ... a performance by the popular Kyteman’s Hiphop Orkest
  • 6. ‘Co-creation’
      - Social media:
          - Consumer becomes a co-creator
          - ‘Data consumption’ traces
      - In essence: many new sources to play the role of anchor text!
  • 7. Tweets about blip.tv
      - E.g.: http://blip.tv/file/2168377
          - Amazing
          - Watching “World’s most realistic 3D city models?”
          - Google Earth/Maps killer
          - Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)
          - and ~120 more Tweets
  • 9. [Figure: information need and its representation; result representations with clicks; anchor text on weblinks; labels “Anchor text” and “Relevance feedback”]
      Every search request is metadata! That metadata is useful as an expanded content representation, capturing more diverse views on the same content and reducing the vocabulary difference between content creators, indexers, and users; as a means to adapt retrieval systems to the user context; and even as training data for machine learning of multimedia ‘detectors’!
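To make ‘every search request is metadata’ concrete, here is a minimal sketch that represents each image by its caption plus the queries it was clicked for, and ranks images with a query-likelihood language model using Dirichlet smoothing (slide 48 mentions language-model retrieval over query-based image representations). The toy captions and clicks, the choice of Dirichlet smoothing and the value of `mu` are assumptions, not details from the study.

```python
import math
from collections import Counter

# Hypothetical toy data: image id -> caption, plus a click log of (query, image id) pairs.
captions = {
    "img1": "sting performs on stage",
    "img2": "traffic jam on the e40",
    "img3": "hurricane dean hits mexico",
}
click_log = [
    ("police sting", "img1"),
    ("the police concert", "img1"),
    ("traffic", "img2"),
    ("dean mexico", "img3"),
]


def expanded_representations(captions, click_log):
    """Caption terms plus the terms of every query the image was clicked for."""
    reps = {img: Counter(text.split()) for img, text in captions.items()}
    for query, img in click_log:
        reps.setdefault(img, Counter()).update(query.split())
    return reps


def rank_query_likelihood(query, reps, mu=10.0):
    """Rank images by Dirichlet-smoothed query likelihood log P(query | image)."""
    collection = Counter()
    for rep in reps.values():
        collection.update(rep)
    coll_len = sum(collection.values())
    scores = {}
    for img, rep in reps.items():
        doc_len = sum(rep.values())
        score = 0.0
        for term in query.split():
            p_coll = collection[term] / coll_len
            if p_coll == 0:
                continue  # term unseen in every representation: contributes nothing
            score += math.log((rep[term] + mu * p_coll) / (doc_len + mu))
        scores[img] = score
    return sorted(scores, key=scores.get, reverse=True)


reps = expanded_representations(captions, click_log)
print(rank_query_likelihood("police", reps))  # img1 first: only its clicks mention "police"
```

Without the click-derived terms, no caption contains "police"; the expansion is what lets img1 be found.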
  • 10. Types of feedback
      - Explicit user feedback
          - Images/videos marked as relevant/non-relevant
          - Selected keywords that are added to the query
          - Selected concepts that are added to the query
      - Implicit user feedback
          - Clicking on retrieved images/videos (click-through data)
          - Bookmarking or sharing an image/video
          - Downloading/buying an image/video
  • 11. Who interacts with the data?
      - Interactive relevance feedback
          - Current user in current search
      - Personalisation
          - Current user in logged past searches
      - Context adaptation
          - Users similar to current user in logged past searches
      - Collective knowledge
          - All users in logged past searches
  • 12. Applications exploiting feedback
      - Given a query, rank all images/videos based on past users' feedback
      - Given an image/video, rank all images/videos based on past users' feedback
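The two applications on slide 12 can be sketched directly on a click log. This sketch assumes a hypothetical log of (query, clicked image) pairs, ranks images for a query by past click counts, and ranks images for a given image by how many distinct queries they share with it (co-clicks). These scoring choices are illustrative, not the method used in the study.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (query, clicked image id) pairs from past users.
click_log = [
    ("sting", "img1"), ("sting", "img1"), ("sting", "img2"),
    ("police concert", "img1"), ("police concert", "img3"),
    ("traffic", "img4"),
]


def rank_for_query(query, click_log):
    """Rank images by how often past users clicked them for exactly this query."""
    counts = Counter(img for q, img in click_log if q == query)
    return [img for img, _ in counts.most_common()]


def rank_for_image(image, click_log):
    """Rank other images by the number of distinct queries they share with `image`."""
    queries_of = defaultdict(set)
    for q, img in click_log:
        queries_of[img].add(q)
    target = queries_of.get(image, set())
    shared = [(len(qs & target), img) for img, qs in queries_of.items() if img != image]
    # Keep only co-clicked images; most shared queries first, ties alphabetically.
    ranked = sorted((t for t in shared if t[0] > 0), key=lambda t: (-t[0], t[1]))
    return [img for _, img in ranked]


print(rank_for_query("sting", click_log))  # ['img1', 'img2']
print(rank_for_image("img1", click_log))   # ['img2', 'img3']
```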
  • 13. Applications exploiting feedback
      - Interactive relevance feedback
          - Modify query and re-rank, based on current user's explicit feedback (and current ranking)
      - Blind relevance feedback
          - Modify query and re-rank, based on feedback by past users and current ranking
  • 14. Applications exploiting feedback
      - Query suggestion
          - Recommend keywords/concepts to support users in interactive query modification (refinement or expansion)
  • 16. ‘Police Sting’: Sting performs with The Police; ‘Elton Diana’: Sting attends Versace memorial service; ‘Led Zeppelin’: Sting performs at Led Zeppelin concert
  • 17. Exploiting User Logs (FP6 Vitalas T4.2)
      - Aim
          - Understand the information-searching process of professional users of a picture portal
      - Method
          - Building, in collaboration with Belga, an increasingly large dataset that contains the log of Belga's users' search interactions
          - Processing, analysing, and investigating the use of this collective knowledge stored in search logs in a variety of tasks
  • 18. Search logs
      - Search logs in Vitalas
          - Searches performed by users through Belga's web interface from 22/06/2007 to 12/10/2007 (101 days)
          - 402,388 tuples <date,time,userid,action>
              - "SEARCH_PICTURES" (138,275) | "SHOW_PHOTO" (192,168) | "DOWNLOAD_PICTURE" (38,070) | "BROWSE_GALLERIES" (8,878) | "SHOW_GALLERY" (24,352) | "CONNECT_IMAGE_FORUM" (645)
          - 17,861 unique (‘lightly normalised’) queries
          - 96,420 clicked images
      - Web image search (Craswell and Szummer, 2007):
          - Pruned graph has 1.1 million edges, 505,000 URLs and 202,000 queries
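Slide 18 describes the log as <date,time,userid,action> tuples. Below is a minimal sketch of how the per-action counts and the number of unique ‘lightly normalised’ queries could be computed. The slides do not give the file format, so the tab-separated layout, the extra query field on SEARCH_PICTURES lines and the normalisation (lowercasing plus whitespace collapsing) are all assumptions.

```python
from collections import Counter

# Hypothetical tab-separated log lines: date, time, user id, action, and (assumed)
# a fifth field holding the query text for SEARCH_PICTURES actions.
raw_log = """\
22/06/2007\t09:12:01\tu17\tSEARCH_PICTURES\tElton  John
22/06/2007\t09:12:30\tu17\tSHOW_PHOTO
22/06/2007\t09:13:02\tu17\tDOWNLOAD_PICTURE
22/06/2007\t10:01:44\tu42\tSEARCH_PICTURES\telton john
22/06/2007\t10:02:10\tu42\tSEARCH_PICTURES\tPolice Sting
"""


def normalise(query):
    """'Light' normalisation (an assumption): lowercase and collapse whitespace."""
    return " ".join(query.lower().split())


def summarise(lines):
    """Count actions per type and collect the distinct normalised queries."""
    actions = Counter()
    queries = set()
    for line in lines.splitlines():
        fields = line.split("\t")
        action = fields[3]  # fields 0-2 are date, time and user id
        actions[action] += 1
        if action == "SEARCH_PICTURES" and len(fields) > 4:
            queries.add(normalise(fields[4]))
    return actions, queries


actions, queries = summarise(raw_log)
print(actions["SEARCH_PICTURES"], len(queries))  # 3 2
```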
  • 21. What could we learn?
      - Goals
          - What do users search for?
      - User context
          - How do professionals search image archives, when compared to the average user?
      - Query modifications
          - How do users reformulate their queries within a search session?
  • 28. Semantic analysis
      - Most studies investigate search logs at the syntactic (term-based) level
      - Our idea: map the term occurrences into linked open data (LOD)
  • 29. Semantic Log Analysis
      - Method:
          - Map queries into the linked data cloud, find 'abstract' patterns, and re-use those for query suggestion, e.g.:
              - A and B play-soccer-in-team X
              - A is-spouse-of B
      - Advantages:
          - Reduces sparseness of the raw search log data
          - Provides higher-level insights into the data
          - Right mix of statistics and semantics?
          - Overcomes the query drift problem of thesaurus-based query expansion
  • 36. … transformed into modification patterns
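The slides do not spell out how query pairs become modification patterns. One simple scheme, assumed here purely for illustration, compares the term sets of consecutive queries in a session; the pattern names are hypothetical and need not match the categories used in the study. The example session is loosely based on the queries on slide 16.

```python
from collections import Counter


def modification_pattern(prev, curr):
    """Classify a reformulation by comparing term sets (a hypothetical scheme)."""
    a, b = set(prev.lower().split()), set(curr.lower().split())
    if a == b:
        return "repeat"
    if a < b:
        return "specialisation"  # terms were only added
    if b < a:
        return "generalisation"  # terms were only removed
    if a & b:
        return "reformulation"   # some terms kept, others replaced
    return "new topic"           # no terms in common


def session_patterns(session):
    """Count modification patterns over consecutive query pairs in one session."""
    return Counter(modification_pattern(p, c) for p, c in zip(session, session[1:]))


session = ["sting", "sting police", "sting led zeppelin", "elton diana"]
print(session_patterns(session))
```

Aggregating such counts over all sessions gives a first, purely term-based view of how users reformulate; the semantic analysis on slides 28 and 29 would map the terms to linked data before comparing them.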
  • 39. Implications
      - Guide the selection of ontologies/lexicons/etc. most suited for your user population
      - Distinguish between successful and unsuccessful queries when making search suggestions
      - Improve session boundary detection
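Slide 39 lists improved session boundary detection as one implication. For comparison, a common baseline splits a user's activity wherever the gap between consecutive events exceeds a fixed inactivity timeout. The 30-minute default below is a widely used heuristic, not a value from the Belga study, and the events are made up.

```python
from datetime import datetime, timedelta


def split_sessions(events, timeout=timedelta(minutes=30)):
    """Split one user's (timestamp, query) events into sessions.

    A new session starts whenever the gap to the previous event exceeds `timeout`.
    """
    sessions = []
    last_time = None
    for ts, query in sorted(events):
        if last_time is None or ts - last_time > timeout:
            sessions.append([])
        sessions[-1].append(query)
        last_time = ts
    return sessions


t0 = datetime(2007, 6, 22, 9, 0)
events = [
    (t0, "sting"),
    (t0 + timedelta(minutes=2), "sting police"),
    (t0 + timedelta(minutes=50), "traffic e40"),
]
print(split_sessions(events))  # [['sting', 'sting police'], ['traffic e40']]
```

A pattern-aware detector could, for instance, refuse to cut between two queries that share terms even after a long gap; that refinement is left out here.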
  • 40. Finally… a ‘wild idea’
      - Image data is seldom annotated adequately
          - i.e., adequately to support search
      - Automatic image annotation or ‘concept detection’
          - Supervised machine learning
          - Requires labelled samples as training data; a laborious and expensive task
  • 41. FP6 Vitalas IP
      - Phase 1 – collect training data
          - Select ~500 concepts with the collection owner
          - Manually select ~1000 positive and negative examples for each concept
  • 42. How to obtain training data?
      - Can we use click-through data instead of manually labelled samples?
      - Advantages:
          - Large quantities, no user intervention, collective assessments
      - Disadvantages:
          - Noisy & sparse
          - Queries not based on strict visual criteria
  • 43. Automatic Image Annotation
      - Research questions:
          - How to annotate images with concepts using click-through data?
          - How reliable are annotations based on click-through data?
          - What is the effectiveness of these annotations as training samples for concept classifiers?
  • 44. Manual annotations
      Statistic   Annotations per concept   Positive samples   Negative samples
      MEAN        1020.02                   89.44              930.57
      MEDIAN      998                       30                 970
      STDEV       164.64                    132.84             186.21
  • 45. Manual vs. search logs based
  • 46. 1. How to annotate? (1/4)
      - Use the queries for which images were clicked
      - Challenges:
          - Inherent noise: gap between queries/captions and concepts
              - queries describe the content+context of images to be retrieved
              - clicked images were retrieved using their captions: content+context
              - concept-based annotations are based on visual-content-only criteria
          - Sparsity: logs only cover the part of the collection previously accessed
          - Mismatch between terms in concept descriptions and queries
  • 47. How to annotate (2/4)
      - Basic ‘global’ method:
          - Given the keywords of a query Q
          - Find the query Q' in the search logs that is most textually similar to Q
          - Find the images I clicked for Q'
          - Find the queries Q'' for which these images have been clicked
          - Rank the queries Q'' based on the number of images clicked for them
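The four steps of the global method on slide 47 translate into a short function. Jaccard term overlap as the measure of ‘textual similarity’, the alphabetical tie-breaking and the toy log are assumptions. The local method on slide 49 differs only in its starting point: it begins from the top-ranked images for Q rather than from the clicks of the closest logged query.

```python
from collections import Counter

# Hypothetical click log: (query, clicked image id) pairs.
click_log = [
    ("traffic jam", "img1"), ("traffic jam", "img2"),
    ("e40", "img1"), ("e40", "img2"), ("vacances", "img2"),
    ("soccer", "img9"),
]


def jaccard(a, b):
    """Term-set overlap between two queries (an assumed notion of 'textually similar')."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def global_method(concept, click_log):
    """Rank candidate queries Q'' for a concept, following the steps on slide 47."""
    # Q': the logged query most similar to the concept keywords Q
    # (max() keeps the first maximum, so sorting breaks ties alphabetically).
    best = max(sorted({q for q, _ in click_log}), key=lambda q: jaccard(concept, q))
    # I: the images clicked for Q'.
    images = {img for q, img in click_log if q == best}
    # Q'': queries for which those images were clicked, counting distinct images per query.
    counts = Counter(q for q, img in set(click_log) if img in images)
    # Rank Q'' by number of clicked images (ties broken alphabetically).
    return sorted(counts, key=lambda q: (-counts[q], q))


print(global_method("traffic", click_log))  # ['e40', 'traffic jam', 'vacances']
```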
  • 48. How to annotate (3/4)
      - Exact: images clicked for queries exactly matching the concept name
          - Example: 'traffic' -> 'traffic jam', 'E40', 'vacances', 'transport'
      - Search-log-based image representations:
          - Images represented by all queries for which they have been clicked
          - Retrieval based on language models (smoothing, stemming)
          - Example: 'traffic' -> 'infrabel', 'deutsche bahn', 'traffic lights'
      - Random walks over the click graph
          - Example: 'hurricane' -> 'dean', 'mexico', 'dean haiti', 'dean mexico'
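The random-walk option on slide 48 can be illustrated on a bipartite click graph of queries and images. Starting at a query node, probability mass flows through clicked images to other queries, so queries that share clicks with the start query (directly or over a few hops) rank highest. The slides cite Craswell and Szummer (2007) for click graphs; the forward walk direction, the self-transition probability `self_prob`, the step count and the toy log here are assumptions, so treat this as an illustration of the idea rather than their exact model.

```python
from collections import defaultdict


def click_graph(click_log):
    """Bipartite click graph: ('q', query) <-> ('i', image), edge weight = click count."""
    graph = defaultdict(lambda: defaultdict(float))
    for query, image in click_log:
        q, i = ("q", query), ("i", image)
        graph[q][i] += 1.0
        graph[i][q] += 1.0
    return graph


def random_walk(graph, start, steps=4, self_prob=0.2):
    """Probability of being at each node after `steps` steps, with self-transitions."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbours = graph.get(node, {})
            total = sum(neighbours.values())
            if total == 0:
                nxt[node] += p  # isolated node: all mass stays put
                continue
            nxt[node] += self_prob * p  # stay at the current node
            for nb, w in neighbours.items():
                nxt[nb] += (1 - self_prob) * p * w / total  # follow a click edge
        dist = nxt
    return dist


def related_queries(click_log, query, **walk_args):
    """Other queries reached from `query`, ranked by walk probability."""
    dist = random_walk(click_graph(click_log), ("q", query), **walk_args)
    scores = {name: p for (kind, name), p in dist.items()
              if kind == "q" and name != query and p > 0}
    return sorted(scores, key=lambda q: (-scores[q], q))


log = [("hurricane", "img1"), ("dean", "img1"), ("dean", "img2"),
       ("dean mexico", "img2"), ("soccer", "img9")]
print(related_queries(log, "hurricane"))  # ['dean', 'dean mexico']
```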
  • 49. How to annotate (4/4)
      - Local method:
          - Given the keywords of a query Q and its top-ranked images
          - Find the queries Q'' for which these images have been clicked
          - Rank the queries Q'' based on the number of images clicked for them
  • 50. 2. Reliability
      - Compare the agreement of click-through-based annotations with manual ones, examining the 111 VITALAS concepts with at least 10 images (for at least one of the methods) in the overlap of clicked and manually annotated images
      - Levels of agreement vary greatly across concepts
      - 20% of concepts per method reach an agreement of at least 0.8
      - What type of concepts can be reliably annotated using click-through data?
          - Defined categories? Not informative (activities, animals, events, graphics, people, image_theme, objects, setting/scene/site)
      - Possible future research on types of concepts:
          - Named entities?
          - Specific vs. broad?
  • 51. 3. Effectiveness (1/3)
      - Train the classifiers for each of 25 concepts
          - positive samples: images selected by each method
          - negative samples: selected by randomly sampling the 100k set, excluding those already selected as positive samples
      - Low-level visual features F_W: texture description (integrated Weibull distribution) extracted from overlapping image regions
      - Low-level textual features F_T:
          - a vocabulary of the most frequent terms in captions is built for each concept
          - each image caption is compared against each concept vocabulary
          - a frequency histogram is built for each concept
      - SVM classifiers with RBF kernel (and cross
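The textual features F_T on slide 51 and the random negative sampling can be sketched as follows. The vocabulary size `k`, the plain whitespace tokenisation (no stop-word removal or stemming), the seed and the example captions are assumptions. The resulting histograms would then feed one RBF-kernel SVM per concept, which is not shown here.

```python
import random
from collections import Counter


def concept_vocabulary(positive_captions, k=5):
    """The k most frequent caption terms among a concept's positive samples."""
    counts = Counter(t for cap in positive_captions for t in cap.lower().split())
    return [term for term, _ in counts.most_common(k)]


def textual_feature(caption, vocabulary):
    """Frequency histogram: how often each concept-vocabulary term occurs in the caption."""
    counts = Counter(caption.lower().split())
    return [counts[term] for term in vocabulary]


def sample_negatives(collection_ids, positive_ids, n, seed=0):
    """Draw up to n random negatives from the collection, excluding the positives."""
    pool = sorted(set(collection_ids) - set(positive_ids))
    return random.Random(seed).sample(pool, min(n, len(pool)))


# Hypothetical positives for a 'soccer' concept.
positives = ["soccer match anderlecht", "soccer goal", "anderlecht soccer fans"]
vocab = concept_vocabulary(positives, k=2)
print(vocab)                                                     # ['soccer', 'anderlecht']
print(textual_feature("Anderlecht soccer match soccer", vocab))  # [2, 1]
print(len(sample_negatives(range(100_000), range(10), 5)))       # 5
```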
  • 52. 3. Effectiveness (2/3)
      - Experiment 1 (visual features):
          - training: search-log-based annotations
          - test set for each concept: manual annotations (~1000 images)
          - feasibility study: in most cases, AP considerably higher than the prior
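Slide 52 compares average precision (AP) against ‘the prior’. Assuming the usual meaning, the prior is the fraction of positives in the test set, which is roughly the AP a random ranking would obtain, so it serves as a chance baseline. A minimal sketch of both quantities; the ranking is made up.

```python
def average_precision(ranked_labels):
    """Non-interpolated AP of a fully ranked test set (1 = positive, 0 = negative)."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of each positive
    return precision_sum / hits if hits else 0.0


def prior(labels):
    """Fraction of positives in the test set."""
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical classifier ranking of ten test images for one concept.
ranking = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(average_precision(ranking), 3), prior(ranking))  # 0.656 0.3
```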
  • 53. 3. Effectiveness (3/3)
      - Experiments 2, 3, 4 (visual or textual features):
          - Experiment 2 training: search-log-based annotations
          - Experiment 3 training: manual + search-log-based annotations
          - Experiment 4 training: manual annotations
          - common test set: 56,605 images (subset of the 100,000 collection)
      - The contribution of search-log-based annotations to training is positive
          - particularly in combination with manual annotations
  • 54. Example: Soccer. [Figure panels: manually annotated positive samples; search-log-based annotated positive samples; test set results] View results at: http://olympus.ee.auth.gr/~diou/searchlogs/
  • 55. Paris
  • 57. Diversity from User Logs
      - Present different query variants' clicked images in a clustered view
      - Merge different query variants' clicked images in a round-robin fashion into one list (CLEF)
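The round-robin merge on slide 57 can be sketched as follows: take the first clicked image of each query variant, then the second of each, and so on. How duplicates were handled in the CLEF runs is not stated; skipping images that are already in the merged list is an assumption, as are the query variants and image ids.

```python
def round_robin_merge(ranked_lists):
    """Interleave ranked lists, one item from each in turn, skipping duplicates."""
    merged, seen = [], set()
    longest = max((len(lst) for lst in ranked_lists), default=0)
    for position in range(longest):
        for lst in ranked_lists:
            if position < len(lst) and lst[position] not in seen:
                seen.add(lst[position])
                merged.append(lst[position])
    return merged


# Hypothetical clicked images for three logged variants of the query "paris".
variants = [
    ["img1", "img2", "img3"],  # e.g. "paris"
    ["img4", "img1"],          # e.g. "paris hilton"
    ["img5", "img6", "img7"],  # e.g. "paris eiffel tower"
]
print(round_robin_merge(variants))
# ['img1', 'img4', 'img5', 'img2', 'img6', 'img3', 'img7']
```

Interleaving gives every variant an image near the top of the list, which is what produces diversity; the trade-off, as the ImageCLEF findings note, is possible topic drift.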
  • 66. ImageCLEF Findings
      - Many queries (>20%) without clicked images
      - Corpus and available logs originated from different time frames
  • 67. ImageCLEF Findings
      - Best results combine text search in metadata with image click data for the topic title and each of the cluster titles
      - Using query variants derived from the logs increases recall by 50 to 100%
      - However, also topic drift: reduced early precision

Editor's Notes

  7. Explicit/implicit refers to whether a user's action translates into explicit or implicit evidence of relevance. Explicit evidence of relevance is when the user says a document is relevant. Implicit evidence of relevance is when the user may not say that something is relevant, but his actions/behaviour (e.g., clicking, looking at, etc.) indicate that to some degree the user finds it relevant. When a Belga user buys an image, he may not have said explicitly that it is relevant, but this action is very close to that. Belga's logs currently include only implicit evidence, whereas VITALAS logs will include both implicit and explicit evidence. Maybe “personalisation” should also become context adaptation, so as to be consistent with IP3 and not get confused with the personalisation in WP5.