Supporting Exploration and Serendipity
in Information Retrieval
Nattiya Kanhabua
Department of Computer and Information Science
Norwegian University of Science and Technology
24 February 2012
Nattiya Kanhabua – Trial lecture
• Typical search engines
– Lookup-based paradigm
– Known-item search
Motivation
[Figure: the lookup paradigm – a search engine indexes the World Wide Web; a query against the document index returns results]
Does this paradigm satisfy all types of information needs?
Two tasks when searching for the unknown:
1. Exploratory Search
– Users perform information seeking
• E.g., collection browsing or visualization
– Human-computer interaction
2. Serendipitous IR
– Systems predict/suggest interesting information
• E.g., recommender systems
– Asynchronous manner
Beyond the lookup-based paradigm
The next generation of search
The movie Minority Report (2002).
PART I – EXPLORATORY SEARCH
• Information-seeking task [Marchionini 2006, White 2006a]
– Seeking the unknown, or an open-ended problem
– Complex information needs
– No knowledge about the contents
Exploratory search
[Figure: exploratory search – an uncertain query ("?") against the document index returns uncertain results ("?")]
Exploratory search activities
G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.
Features of exploratory search
– Query (re)formulation in real time
– Exploiting search context
– Facet-based and metadata result filtering
– Learning and understanding support
– Result visualization
• Help users formulate their information needs at an early
stage [Manning 2008]
• Query suggestion
– Supported by major search engines
– Based on query-log analysis
• Query-by-example
– Search using examples of documents
Query (re)formulation
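As a toy illustration of log-based suggestion (the query log and the frequency ranking below are simplified assumptions; production systems mine far richer signals at web scale):

```python
from collections import Counter

# Hypothetical query log, purely for illustration
query_log = [
    "exploratory search", "exploratory search systems",
    "exploratory search", "faceted search", "serendipity",
]

def suggest(prefix, log, k=3):
    # Rank logged queries that extend the prefix by their frequency
    freq = Counter(q for q in log if q.startswith(prefix))
    return [q for q, _ in freq.most_common(k)]

suggestions = suggest("explor", query_log)
# ["exploratory search", "exploratory search systems"]
```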
• Effective systems must adapt to contextual constraints
[Ingwersen 2005]
– Time, place, history of interaction, task in hand, etc.
• Types of context
1. Explicitly provided feedback
• E.g., select relevant documents
2. Implicitly obtained user information
• E.g., mine users’ interaction behaviors [Dumais 2004, Kelly 2004]
Leveraging search context
Facet-based result filtering
• Facets are properties of a document [Tunkelang 2009]
– Usually obtained from metadata
• Faceted search provides the ability to:
– Explore results via properties
– Expand or refine the search
• No metadata?
– Categorization
– Clustering
• Provide overviews of the collection and search results
– To support understanding and analysis
• Applications
– manyEyes [Viégas 2007]
– Stuff I’ve seen [Dumais 2003]
– TimeExplorer [Matthews 2010]
Result visualization
• Provide facilities for deriving meaning from search results
• Examples
– Wikify!: linking documents to encyclopedic knowledge
[Mihalcea 2007]
– Learning to link with Wikipedia [Milne 2008]
– Generating links to background knowledge [He 2011]
Support learning and understanding
• Evaluation metrics for exploratory search [White 2006b]
1. Engagement and enjoyment
• The degree to which users are engaged and enjoying the search
2. Information novelty
• The amount of new information encountered
3. Task success
4. Task time
• Time spent to reach a state of task completeness
5. Learning and cognition
• The number of topics covered and the number of insights
users acquire
Evaluation of exploratory search
• Collaborative and social search
– Support task division and knowledge sharing
– Allow the team to move rapidly toward task completion
– Surface information already encountered by team members
Future direction
PART II – SERENDIPITOUS IR
• Serendipity [Andel 1994]
– The act of encountering relevant information unexpectedly
• Task: Predict and suggest relevant information
– E.g., recommender systems
Serendipitous IR
• Motivation [Adomavicius 2005, Jannach 2010]
– Ease information overload
– Business intelligence
• Increase the number of products sold
• Sell products from the long tail
• Improve users’ experience
• Real-world applications
– Book: Amazon.com
– Movie: Netflix, IMDb
– News: Yahoo, New York Times
– Video & music: YouTube, Last.fm
Recommender systems
• Given:
– Set of items (e.g., products, movies, or news)
– User information (e.g., ratings or user preferences)
• Goal:
– Predict the relevance score of items
– Recommend k items based on the scores
Problem statements
[Figure: non-personalized recommendation – an item collection feeds the recommender system, which outputs scored items (e.g., I1: 0.8, I2: 0.6, I3: 0.5)]
Personalized recommendation
– Same problem, but the recommender system additionally
takes user information (e.g., ratings or preferences) as input
Personalized recommendation
• Two main approaches
– Content-based recommendation: scores items using product
features (e.g., title, genre, actors)
– Collaborative filtering recommendation: scores items using
community data (other users' ratings)
• Basic idea
– Give me “more like this”
– Exploit item descriptions (contents) and user preferences
• No rating data is needed
• Approach
1. Represent user preferences and items as bag-of-words vectors
2. Compute the similarity between the preferences and an unseen item,
e.g., using the Dice coefficient or cosine similarity [Manning 2008]
Content-based recommendation
User profiles:
Title | Genre | Director | Writer | Stars
The Twilight Saga: Eclipse | Adventure, Drama, Fantasy | David Slade | Melissa Rosenberg, Stephenie Meyer | Kristen Stewart, Robert Pattinson
Harry Potter and the Deathly Hallows: Part 1 | Adventure, Drama, Fantasy | David Yates | Steve Kloves, J.K. Rowling | Daniel Radcliffe, Emma Watson

Contents (unseen item):
Title | Genre | Director | Writer | Stars
The Lord of the Rings: The Return of the King | Action, Adventure, Drama | Peter Jackson | J.R.R. Tolkien, Fran Walsh | Elijah Wood, Viggo Mortensen
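As a minimal sketch of the approach above (the feature strings come from the tables; the whitespace tokenization and unweighted term counts are simplifying assumptions), cosine similarity between a bag-of-words user profile and an unseen item can be computed as:

```python
import math
from collections import Counter

def bag_of_words(text):
    # Simple tokenizer: lowercase, split on commas and whitespace
    return Counter(text.lower().replace(",", " ").split())

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Profile aggregated from the user's liked items (genre/director fields)
profile = bag_of_words("Adventure, Drama, Fantasy, David Slade, "
                       "Adventure, Drama, Fantasy, David Yates")
unseen = bag_of_words("Action, Adventure, Drama, Peter Jackson")

score = cosine(profile, unseen)  # shared terms: "adventure", "drama"
```

The item shares two terms with the profile, so it gets a moderate positive score; an item with no overlapping features would score 0.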
• Basic idea [Balabanovic 1997]
– Give me “popular items among my friends”
– Users who agreed in their past ratings tend to agree again in the future
• Basic approach
– Use a matrix of user-item ratings, either explicit (e.g., star ratings)
or implicit (e.g., clicks, page views, time spent on a page)
– Predict a rating for each unseen item
Collaborative filtering (CF)
• Given the active user and a matrix of user-item ratings
• Goal: predict a rating for an unseen item by
1. Find a set of users (neighbors) with similar ratings
2. Estimate John’s rating of Item5 from neighbors’ ratings
3. Repeat for all unseen items and recommend top-N items
User-based nearest-neighbor CF
Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 1 5 5 2 1
• Measure user similarity, e.g., Pearson correlation:
  sim(a,b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / ( √(Σ_{p∈P} (r_{a,p} − r̄_a)²) · √(Σ_{p∈P} (r_{b,p} − r̄_b)²) )
– a, b : users
– r_{a,p} : rating of user a for item p; r̄_a, r̄_b : the users’ average ratings
– P : set of items rated by both a and b
Find neighbors
Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 1 5 5 2 1
sim = 0.85
sim = 0.70
sim = -0.79
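The similarities shown can be reproduced directly from the rating matrix (a minimal sketch of the Pearson computation over co-rated items):

```python
import math

# User-item rating matrix from the example; John has not rated Item5
ratings = {
    "John":  {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson(a, b):
    # P: set of items rated by both users
    shared = ratings[a].keys() & ratings[b].keys()
    mean_a = sum(ratings[a][p] for p in shared) / len(shared)
    mean_b = sum(ratings[b][p] for p in shared) / len(shared)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b)
              for p in shared)
    den_a = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in shared))
    den_b = math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in shared))
    return num / (den_a * den_b)

sims = {u: pearson("John", u) for u in ("User1", "User2", "User3")}
# ≈ {User1: 0.85, User2: 0.70, User3: -0.79}, matching the figure
```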
• Prediction function
– Combine the neighbors’ deviations from their average ratings,
weighted by user similarity:
  pred(a,p) = r̄_a + Σ_{b∈N} sim(a,b)·(r_{b,p} − r̄_b) / Σ_{b∈N} |sim(a,b)|
Estimate a rating
Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 4.87
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 1 5 5 2 1
sim = 0.85
sim = 0.70
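The 4.87 above follows from this weighted-deviation prediction function when each user's mean is taken over all of that user's ratings and only the two positively correlated neighbors are used (a sketch under those assumptions):

```python
# Ratings of the active user and the two positive-similarity neighbors
ratings = {
    "John":  {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
}
sims = {"User1": 0.85, "User2": 0.70}  # similarities computed earlier

def mean(user):
    r = ratings[user]
    return sum(r.values()) / len(r)

def predict(active, item):
    # pred(a,p) = mean(a) + sum_b sim(a,b)*(r_bp - mean(b)) / sum_b |sim(a,b)|
    num = sum(s * (ratings[b][item] - mean(b)) for b, s in sims.items())
    den = sum(abs(s) for s in sims.values())
    return mean(active) + num / den

prediction = predict("John", "Item5")  # ≈ 4.87
```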
• Basic idea
– Use the similarity between items (instead of users)
– Item-item similarity can be computed offline
• Example
– Look for items that are similar to Item5, or neighbors
– Predict the rating of Item5 using John's ratings of neighbors
Item-based nearest-neighbor CF
Item1 Item2 Item3 Item4 Item5
John 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 1 5 5 2 1
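A sketch of the item-based variant. The slides do not fix the item-similarity measure, so plain cosine over co-rating users is used here as one common choice (an assumption; adjusted cosine is also popular):

```python
import math

ratings = {
    "John":  {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def item_cosine(i, j):
    # Cosine similarity over the users who rated both items
    users = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    ni = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    nj = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (ni * nj)

def predict(user, item, k=2):
    # Weighted average of the user's own ratings for the k items
    # most similar to the target item (its "neighbors")
    neighbors = sorted(ratings[user],
                       key=lambda i: item_cosine(item, i), reverse=True)[:k]
    num = sum(item_cosine(item, i) * ratings[user][i] for i in neighbors)
    den = sum(item_cosine(item, i) for i in neighbors)
    return num / den

prediction = predict("John", "Item5")
```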
• Sparse data
– Users do not rate many items
• Cold start
– No rating for new users or new items
• Scaling problem
– Millions of users and thousands of items
– m = #users and n = #items
– User-based CF
• Space complexity O(m2) when pre-computed
• Time complexity for computing Pearson O(m2n)
– Item-based CF
• Space complexity is reduced to O(n2)
Problems of CF
• How to solve the sparse data problem?
– Ask users to rate a set of items
– Use other methods in the beginning
• E.g., content-based, or non-personalized
• How to solve the scaling problem?
– Apply dimensionality reduction
• E.g. matrix factorization
Possible solutions
• Basic idea [Koren 2008]
– Determine latent factors from ratings
• E.g., types of movies (drama or action)
– Recommend items from the determined types
• Approach
– Apply dimensionality reduction
• E.g., Singular value decomposition (SVD) [Deerwester 1990]
Matrix factorization
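A minimal sketch of the idea, using plain gradient-descent matrix factorization on the earlier rating matrix (a simplification of SVD-style models such as [Koren 2008]; the learning rate, regularization weight, and factor count are illustrative assumptions):

```python
import random

random.seed(0)

# Observed user-item ratings; John's rating of Item5 is unknown
R = {
    ("John", "Item1"): 5, ("John", "Item2"): 3,
    ("John", "Item3"): 4, ("John", "Item4"): 4,
    ("User1", "Item1"): 3, ("User1", "Item2"): 1, ("User1", "Item3"): 2,
    ("User1", "Item4"): 3, ("User1", "Item5"): 3,
    ("User2", "Item1"): 4, ("User2", "Item2"): 3, ("User2", "Item3"): 4,
    ("User2", "Item4"): 3, ("User2", "Item5"): 5,
    ("User3", "Item1"): 1, ("User3", "Item2"): 5, ("User3", "Item3"): 5,
    ("User3", "Item4"): 2, ("User3", "Item5"): 1,
}
K = 2  # number of latent factors (e.g., "drama-ness", "action-ness")

users = {u for u, _ in R}
items = {i for _, i in R}
P = {u: [random.random() for _ in range(K)] for u in users}  # user factors
Q = {i: [random.random() for _ in range(K)] for i in items}  # item factors

def score(u, i):
    # Predicted rating = inner product of user and item factor vectors
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Stochastic gradient descent on squared error with L2 regularization
lr, reg = 0.01, 0.02
for _ in range(3000):
    for (u, i), r in R.items():
        e = r - score(u, i)
        for f in range(K):
            pf, qf = P[u][f], Q[i][f]
            P[u][f] += lr * (e * qf - reg * pf)
            Q[i][f] += lr * (e * pf - reg * qf)

prediction = score("John", "Item5")  # filled in from the latent factors
```

The learned factors reconstruct the observed ratings closely, and the missing cell is estimated from the same low-dimensional structure.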
• Basic idea
– Different approaches have their shortcomings
– Hybrid: combine different approaches
• Approach
1. Pipelined hybridization
• Use content-based to fill up entries, then use CF [Melville 2002]
2. Parallel hybridization
• Feature combination: ratings, user preferences and constraints
Hybrid recommendation
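A sketch of the pipelined variant (content-boosted CF in the spirit of [Melville 2002]). The content-based predictor is stubbed here with a hypothetical item-mean fallback purely for illustration; a real system would score the missing cells from item features:

```python
ratings = {
    "John":  {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}
items = ["Item1", "Item2", "Item3", "Item4", "Item5"]

def content_score(user, item):
    # Stand-in for a real content-based predictor (here: item mean)
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def densify():
    # Stage 1 of the pipeline: fill every missing cell with a
    # content-based estimate, so that stage 2 (standard CF) can
    # run on a dense matrix
    return {u: {i: ratings[u].get(i, content_score(u, i)) for i in items}
            for u in ratings}

dense = densify()  # dense["John"]["Item5"] == 3.0 (mean of 3, 5, 1)
```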
• Temporal dynamics of recommender systems
– Items have short lifetimes, i.e., a dynamic set of items
– User behaviors depend on moods or time periods
– Attention to breaking news stories decays over time
– Challenge: how to capture/model temporal dynamics?
• TimeSVD++ [Koren 2009]
• Tensor factorization [Xiong 2010]
• Temporal diversity [Lathia 2010]
Future directions
• Group recommendations [McCarthy 2006]
– Recommendations for a group of users or friends
– Challenge: how to model group preference?
• Context-aware recommendations [Adomavicius 2011]
– Context, e.g., demographics, interests, time and place,
moods, weather, and so on
– Challenge: how to combine different kinds of context?
Future directions (cont.)
1. Exploratory Search
– Users perform information seeking
• E.g., collection browsing or visualization
– Human-computer interaction
2. Serendipitous IR
– Systems predict/suggest interesting information
• E.g., recommender systems
– Asynchronous manner
Conclusions
• [Dumais 2003] S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I’ve seen: A system for
personal information retrieval and re-use. In Proceedings of SIGIR, pp. 72-79, 2003.
• [Dumais 2004] S. T. Dumais, E. Cutrell, R. Sarin and E. Horvitz. Implicit queries (IQ) for contextualized search. In
Proceedings of SIGIR, p. 594, 2004.
• [Ingwersen 2005] P. Ingwersen and K. Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. The
Information Retrieval Series, Springer-Verlag, New York, 2005.
• [He 2011] J. He, M. de Rijke, M. Sevenster, R. C. van Ommering and Y. Qian. Generating links to background knowledge: a
case study using narrative radiology reports. In Proceedings of CIKM, pp. 1867-1876, 2011.
• [Kelly 2004] D. Kelly, and N. J. Belkin. Display time as implicit feedback: understanding task effects. In Proceedings of SIGIR,
pp. 377-384, 2004.
• [Manning 2008] C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University
Press, New York, NY, USA, 2008.
• [Matthews 2010] M. Matthews, P. Tolchinsky, P. Mika, R. Blanco and H. Zaragoza. Searching through time in the New York
Times. In HCIR Workshop, 2010.
• [Marchionini 2006] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM,
49(4), pp. 41-46, 2006.
• [Mihalcea 2007] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of
CIKM, pp. 233-242, 2007.
• [Milne 2008] D. Milne and I. H. Witten. Learning to link with Wikipedia. In Proceedings of CIKM, pp. 509-518, 2008.
• [Tunkelang 2009] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.
• [Viégas 2007] F. B. Viégas, M. Wattenberg, F. van Ham, J. Kriss and M. M. McKeon. Many eyes: A site for visualization at
internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1121-1128, 2007.
• [White 2006a] R. W. White, B. Kules, S. M. Drucker and m. c. schraefel. Supporting exploratory search: Introduction to
special section. Communications of the ACM, 49(4), pp. 36-39, 2006.
• [White 2006b] R. W. White, G. Muresan, and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating
exploratory search systems. SIGIR Forum, 40(2), pp. 52-60, 2006.
• [White 2009] R. W. White and R. A. Roth. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool
Publishers, 2009.
References
• [Agarwal 2010] D. Agarwal and B. C. Chen. Recommender Systems Tutorial. In ACM SIGKDD, 2010.
• [Adomavicius 2005] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of
the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng. 17(6), pp. 734-749, 2005.
• [Adomavicius 2011] G. Adomavicius and A. Tuzhilin. Context-Aware Recommender Systems. In Recommender Systems
Handbook, pp. 217-253, 2011.
• [Andel 1994] P. V. Andel. Anatomy of the Unsought Finding. Serendipity: Origin, history, domains, traditions, appearances,
patterns and programmability. The British Journal for the Philosophy of Science, 45(2), pp. 631-648, 1994.
• [Balabanovic 1997] M. Balabanovic and Y. Shoham. Content-based, collaborative recommendation. Communication of
ACM 40(3), pp. 66-72, 1997.
• [Deerwester 1990] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman. Indexing by Latent
Semantic Analysis. In JASIS 41(6), pp. 391-407, 1990.
• [Jannach 2010] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich. Recommender Systems: An Introduction. Cambridge
University Press, 2010.
• [Koren 2008] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering
model. In Proceedings of KDD, pp. 426-434, 2008.
• [Koren 2009] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of KDD, pp. 447-456, 2009.
• [Lathia 2010] N. Lathia, S. Hailes, L. Capra and X. Amatriain. Temporal Diversity in Recommender Systems. In Proceedings
of SIGIR, pp. 210-217, 2010.
• [McCarthy 2006] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth and P. Nixon. Group recommender systems: a
critiquing based approach. In Proceedings of IUI, pp. 267-269, 2006.
• [Melville 2002] P. Melville, R. J. Mooney and R. Nagarajan. Content-Boosted Collaborative Filtering for Improved
Recommendations. In Proceedings of AAAI, pp. 187-192, 2002.
• [Xiong 2010] L. Xiong, X. Chen, T. K. Huang, J. G. Schneider and J. G. Carbonell. Temporal Collaborative Filtering with
Bayesian Probabilistic Tensor Factorization. In Proceedings of SDM, pp. 211-222, 2010.
References (cont.)

More Related Content

Similar to Supporting Exploration and Serendipity in Information Retrieval

Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introductionLiang Xiang
 
Improving Library Resource Discovery
Improving Library Resource DiscoveryImproving Library Resource Discovery
Improving Library Resource DiscoveryDanya Leebaw
 
Data collection and analysis
Data collection and analysisData collection and analysis
Data collection and analysisAndres Baravalle
 
Ronny lempelyahooindiabigthinkerapril2013
Ronny lempelyahooindiabigthinkerapril2013Ronny lempelyahooindiabigthinkerapril2013
Ronny lempelyahooindiabigthinkerapril2013Muthusamy Chelliah
 
Research on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and ListsResearch on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and ListsDenis Parra Santander
 
BUSINESS RESEARCH METHODS.pptx
BUSINESS RESEARCH METHODS.pptxBUSINESS RESEARCH METHODS.pptx
BUSINESS RESEARCH METHODS.pptxRahulNishad49
 
Open Innovation and Semantic Web
Open Innovation and Semantic WebOpen Innovation and Semantic Web
Open Innovation and Semantic WebMilan Stankovic
 
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextRESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextElizabeth Murnane
 
Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?Toine Bogers
 
How to Conduct Usability Studies: A Librarian Primer
How to Conduct Usability Studies: A Librarian PrimerHow to Conduct Usability Studies: A Librarian Primer
How to Conduct Usability Studies: A Librarian PrimerTao Zhang
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender SystemsDavid Zibriczky
 
Qualiative Methods: Nuts and Bolts
Qualiative Methods: Nuts and BoltsQualiative Methods: Nuts and Bolts
Qualiative Methods: Nuts and BoltsRamNath63
 

Similar to Supporting Exploration and Serendipity in Information Retrieval (20)

Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Improving Library Resource Discovery
Improving Library Resource DiscoveryImproving Library Resource Discovery
Improving Library Resource Discovery
 
Data collection and analysis
Data collection and analysisData collection and analysis
Data collection and analysis
 
How to write effective case study
How to write effective case studyHow to write effective case study
How to write effective case study
 
Recommender lecture
Recommender lectureRecommender lecture
Recommender lecture
 
Ronny lempelyahooindiabigthinkerapril2013
Ronny lempelyahooindiabigthinkerapril2013Ronny lempelyahooindiabigthinkerapril2013
Ronny lempelyahooindiabigthinkerapril2013
 
Research on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and ListsResearch on Recommender Systems: Beyond Ratings and Lists
Research on Recommender Systems: Beyond Ratings and Lists
 
BUSINESS RESEARCH METHODS.pptx
BUSINESS RESEARCH METHODS.pptxBUSINESS RESEARCH METHODS.pptx
BUSINESS RESEARCH METHODS.pptx
 
Info521 week2
Info521 week2Info521 week2
Info521 week2
 
5.chapter 3
5.chapter 35.chapter 3
5.chapter 3
 
Research methods
Research methodsResearch methods
Research methods
 
data analysis.ppt
data analysis.pptdata analysis.ppt
data analysis.ppt
 
data analysis.pptx
data analysis.pptxdata analysis.pptx
data analysis.pptx
 
Open Innovation and Semantic Web
Open Innovation and Semantic WebOpen Innovation and Semantic Web
Open Innovation and Semantic Web
 
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextRESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
 
Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?Search & Recommendation: Birds of a Feather?
Search & Recommendation: Birds of a Feather?
 
How to Conduct Usability Studies: A Librarian Primer
How to Conduct Usability Studies: A Librarian PrimerHow to Conduct Usability Studies: A Librarian Primer
How to Conduct Usability Studies: A Librarian Primer
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Qualiative Methods: Nuts and Bolts
Qualiative Methods: Nuts and BoltsQualiative Methods: Nuts and Bolts
Qualiative Methods: Nuts and Bolts
 

More from Nattiya Kanhabua

Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...Nattiya Kanhabua
 
Understanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of OutbreaksUnderstanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of OutbreaksNattiya Kanhabua
 
Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?Nattiya Kanhabua
 
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result DiversificationLeveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result DiversificationNattiya Kanhabua
 
On the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaOn the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaNattiya Kanhabua
 
Ranking Related News Predictions
Ranking Related News PredictionsRanking Related News Predictions
Ranking Related News PredictionsNattiya Kanhabua
 
Temporal summarization of event related updates
Temporal summarization of event related updatesTemporal summarization of event related updates
Temporal summarization of event related updatesNattiya Kanhabua
 
Temporal Web Dynamics: Implications from Search Perspective
Temporal Web Dynamics: Implications from Search PerspectiveTemporal Web Dynamics: Implications from Search Perspective
Temporal Web Dynamics: Implications from Search PerspectiveNattiya Kanhabua
 
Temporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information RetrievalTemporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information RetrievalNattiya Kanhabua
 
Preservation and Forgetting: Friends or Foes?
Preservation and Forgetting: Friends or Foes?Preservation and Forgetting: Friends or Foes?
Preservation and Forgetting: Friends or Foes?Nattiya Kanhabua
 
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...Nattiya Kanhabua
 
Can Twitter & Co. Save Lives?
Can Twitter & Co. Save Lives?Can Twitter & Co. Save Lives?
Can Twitter & Co. Save Lives?Nattiya Kanhabua
 
Searching the Temporal Web: Challenges and Current Approaches
Searching the Temporal Web: Challenges and Current ApproachesSearching the Temporal Web: Challenges and Current Approaches
Searching the Temporal Web: Challenges and Current ApproachesNattiya Kanhabua
 
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...Nattiya Kanhabua
 
Exploiting temporal information in retrieval of archived documents (doctoral ...
Exploiting temporal information in retrieval of archived documents (doctoral ...Exploiting temporal information in retrieval of archived documents (doctoral ...
Exploiting temporal information in retrieval of archived documents (doctoral ...Nattiya Kanhabua
 
Determining Time of Queries for Re-ranking Search Results
Determining Time of Queries for Re-ranking Search ResultsDetermining Time of Queries for Re-ranking Search Results
Determining Time of Queries for Re-ranking Search ResultsNattiya Kanhabua
 
Time-aware Approaches to Information Retrieval
Time-aware Approaches to Information RetrievalTime-aware Approaches to Information Retrieval
Time-aware Approaches to Information RetrievalNattiya Kanhabua
 
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)Nattiya Kanhabua
 
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)Nattiya Kanhabua
 
Exploiting Time-based Synonyms in Searching Document Archives
Exploiting Time-based Synonyms in Searching Document ArchivesExploiting Time-based Synonyms in Searching Document Archives
Exploiting Time-based Synonyms in Searching Document ArchivesNattiya Kanhabua
 

More from Nattiya Kanhabua (20)

Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
Towards Concise Preservation by Managed Forgetting: Research Issues and Case ...
 
Understanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of OutbreaksUnderstanding the Diversity of Tweets in the Time of Outbreaks
Understanding the Diversity of Tweets in the Time of Outbreaks
 
Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?Why Is It Difficult to Detect Outbreaks in Twitter?
Why Is It Difficult to Detect Outbreaks in Twitter?
 
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result DiversificationLeveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification
 
On the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaOn the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in Wikipedia
 
Ranking Related News Predictions
Ranking Related News PredictionsRanking Related News Predictions
Ranking Related News Predictions
 
Temporal summarization of event related updates
Temporal summarization of event related updatesTemporal summarization of event related updates
Temporal summarization of event related updates
 
Temporal Web Dynamics: Implications from Search Perspective
Temporal Web Dynamics: Implications from Search PerspectiveTemporal Web Dynamics: Implications from Search Perspective
Temporal Web Dynamics: Implications from Search Perspective
 
Temporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information RetrievalTemporal Web Dynamics and Implications for Information Retrieval
Temporal Web Dynamics and Implications for Information Retrieval
Preservation and Forgetting: Friends or Foes?
Concise Preservation by Combining Managed Forgetting and Contextualized Remem...
Can Twitter & Co. Save Lives?
Searching the Temporal Web: Challenges and Current Approaches
Improving Temporal Language Models For Determining Time of Non-Timestamped Do...
Exploiting temporal information in retrieval of archived documents (doctoral ...
Determining Time of Queries for Re-ranking Search Results
Time-aware Approaches to Information Retrieval
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation)
Estimating Query Difficulty for News Prediction Retrieval (poster presentation)
Exploiting Time-based Synonyms in Searching Document Archives

Supporting Exploration and Serendipity in Information Retrieval

  • 1. Supporting Exploration and Serendipity in Information Retrieval
    Nattiya Kanhabua
    Department of Computer and Information Science
    Norwegian University of Science and Technology
    24 February 2012
  • 2. Motivation (Trial lecture, Nattiya Kanhabua)
    • Typical search engines
      – Lookup-based paradigm
      – Known-item search
    (Diagram: a query against a document index built from the World Wide Web, returning results)
    Does this paradigm satisfy all types of information needs?
  • 3. Beyond the lookup-based paradigm
    Two tasks when searching for the unknown:
    1. Exploratory search
       – Users perform information seeking, e.g., collection browsing or visualization
       – Human-computer interaction
    2. Serendipitous IR
       – Systems predict/suggest interesting information, e.g., recommender systems
       – Asynchronous manner
  • 4. The next generation of search
    (Image from the movie Minority Report, 2002)
  • 5. PART I – EXPLORATORY SEARCH
  • 6. Exploratory search
    • Information-seeking task [Marchionini 2006, White 2006a]
      – Seeking the unknown, or an open-ended problem
      – Complex information needs
      – No knowledge about the contents
  • 7. Exploratory search activities
    G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.
  • 8. Features of exploratory search
    – Query (re)formulation in real-time
    – Exploiting search context
    – Facet-based and metadata result filtering
    – Learning and understanding support
    – Result visualization
  • 9. Query (re)formulation
    • Helps users formulate information needs at an early stage [Manning 2008]
    • Query suggestion
      – Supported by major search engines
      – Based on query-log analysis
    • Query-by-example
      – Search using example documents
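Log-based query suggestion of this kind can be sketched in a few lines; the query log below is hypothetical, and real systems also exploit session and click signals:

```python
from collections import Counter

def suggest(prefix, query_log, k=3):
    """Rank logged queries extending the prefix by how often they were issued."""
    freq = Counter(query_log)
    matches = [q for q in freq if q.startswith(prefix) and q != prefix]
    return sorted(matches, key=lambda q: -freq[q])[:k]

# Hypothetical query log for illustration
log = ["norway weather", "norway weather oslo", "norway fjords",
       "norway fjords", "norway fjords tour", "norway weather"]
suggestions = suggest("norway f", log)  # most frequent completions first
```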
  • 10. Leveraging search context
    • Effective systems must adapt to contextual constraints [Ingwersen 2005]
      – Time, place, history of interaction, task at hand, etc.
    • Types of context
      1. Explicitly provided feedback, e.g., selecting relevant documents
      2. Implicitly obtained user information, e.g., mining users' interaction behavior [Dumais 2004, Kelly 2004]
  • 11. Facet-based result filtering
    • Facets are properties of a document [Tunkelang 2009]
      – Usually obtained from metadata
    • Facet search provides the ability to:
      – Explore results via properties
      – Expand or refine the search
  • 12. Facet-based result filtering (cont')
    • No metadata? Facets can still be derived by:
      – Categorization
      – Clustering
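A minimal sketch of facet counting over a result set, assuming documents carry a metadata field (the documents and the `genre` field are hypothetical):

```python
from collections import Counter

def facet_counts(results, field):
    """Count how many results carry each value of a metadata field."""
    return Counter(doc[field] for doc in results if field in doc)

docs = [{"title": "A", "genre": "drama"},
        {"title": "B", "genre": "drama"},
        {"title": "C", "genre": "action"}]
counts = facet_counts(docs, "genre")          # e.g. drama: 2, action: 1
# Refining the search = keeping only results with the chosen facet value
drama_only = [d for d in docs if d.get("genre") == "drama"]
```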
  • 13–15. Result visualization
    • Provide overviews of the collection and search results
      – To support understanding and analysis
    • Applications
      – manyEyes [Viégas 2007]
      – Stuff I've seen [Dumais 2003]
      – TimeExplorer [Matthews 2010]
  • 16. Support learning and understanding
    • Provide facilities for deriving meaning from search results
    • Examples
      – Wikify!: linking documents to encyclopedic knowledge [Mihalcea 2007]
      – Learning to link with Wikipedia [Milne 2008]
      – Generating links to background knowledge [He 2011]
  • 17. Evaluation of exploratory search
    • Evaluation metrics for exploratory search [White 2006b]
      1. Engagement and enjoyment: the degree to which users are engaged and enjoying the experience
      2. Information novelty: the amount of new information encountered
      3. Task success
      4. Task time: time spent to reach task completion
      5. Learning and cognition: the number of topics covered and the number of insights users acquire
  • 18. Future direction
    • Collaborative and social search
      – Support task division and knowledge sharing
      – Allow the team to move rapidly toward the task
      – Provide already-encountered information
  • 19. PART II – SERENDIPITOUS IR
  • 20. Serendipitous IR
    • Serendipity [Andel 1994]
      – The act of encountering relevant information unexpectedly
    • Task: predict and suggest relevant information, e.g., recommender systems
  • 21. Recommender systems
    • Motivation [Adomavicius 2005, Jannach 2010]
      – Ease information overload
      – Business intelligence
        • Increase the number of products sold
        • Sell products from the long tail
        • Improve users' experience
    • Real-world applications
      – Books: Amazon.com
      – Movies: Netflix, IMDb
      – News: Yahoo, New York Times
      – Video & music: YouTube, Last.fm
  • 22–23. Problem statements
    • Given:
      – A set of items (e.g., products, movies, or news)
      – User information (e.g., ratings or user preferences)
    • Goal:
      – Predict a relevance score for each item (example scores: I1 = 0.8, I2 = 0.6, I3 = 0.5)
      – Recommend the top-k items based on the scores
    • A non-personalized recommender uses the item collection alone; a personalized recommender additionally uses the user's information
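The score-then-recommend goal can be sketched directly; the item scores are the illustrative ones from the slide:

```python
import heapq

def recommend(scores, k):
    """Return the k items with the highest predicted relevance scores."""
    return heapq.nlargest(k, scores, key=scores.get)

scores = {"I1": 0.8, "I2": 0.6, "I3": 0.5}   # illustrative scores from the slide
top = recommend(scores, k=2)                  # ["I1", "I2"]
```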
  • 24–25. Personalized recommendation
    • Two main approaches
      – Content-based: scores items using product features (title, genre, actors, ...) and the user's information
      – Collaborative filtering: scores items using community data (other users' ratings) and the user's information
  • 26. Content-based recommendation
    • Basic idea
      – Give me "more like this"
      – Exploit item descriptions (contents) and user preferences
      – No rating data is needed
    (Example features: a movie's genre, director, writers, and stars)
  • 27. Content-based recommendation (cont')
    • Approach
      1. Represent information as bag-of-words
      2. Compute the similarity between the user's preferences and an unseen item, e.g., with the Dice coefficient or the cosine similarity [Manning 2008]
    • Example user profile (contents of rated items):
      – The Twilight Saga: Eclipse | Adventure, Drama, Fantasy | dir. David Slade | writers Melissa Rosenberg, Stephenie Meyer | stars Kristen Stewart, Robert Pattinson
      – Harry Potter and the Deathly Hallows: Part 1 | Adventure, Drama, Fantasy | dir. David Yates | writers Steve Kloves, J.K. Rowling | stars Daniel Radcliffe, Emma Watson
    • Unseen item:
      – The Lord of the Rings: The Return of the King | Action, Adventure, Drama | dir. Peter Jackson | writers J.R.R. Tolkien, Fran Walsh | stars Elijah Wood, Viggo Mortensen
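The bag-of-words cosine similarity step can be sketched as follows; tokenization is deliberately simplified, and the metadata strings are abridged from the example above:

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words over a metadata string."""
    return Counter(text.lower().replace(",", " ").split())

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# User profile = sum of the rated items' metadata (abridged)
profile = bow("Adventure Drama Fantasy David Slade Kristen Stewart") + \
          bow("Adventure Drama Fantasy David Yates Emma Watson")
item = bow("Action Adventure Drama Peter Jackson Elijah Wood")
score = cosine(profile, item)   # > 0 thanks to the shared genre terms
```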
  • 28–30. Collaborative filtering (CF)
    • Basic idea [Balabanovic 1997]
      – Give me "popular items among my friends"
      – Users with similar tastes in the past tend to have similar tastes in the future
    • Basic approach
      – Use a matrix of user-item ratings, either explicit or implicit (e.g., clicks, page views, time spent on a page)
      – Predict a rating for an unseen item
  • 31. User-based nearest-neighbor CF
    • Given the active user and a matrix of user-item ratings
    • Goal: predict a rating for an unseen item by
      1. Finding a set of users (neighbors) with similar ratings
      2. Estimating John's rating of Item5 from the neighbors' ratings
      3. Repeating for all unseen items and recommending the top-N items
    • Rating matrix (Item1–Item5):
      John:  5, 3, 4, 4, ?
      User1: 3, 1, 2, 3, 3
      User2: 4, 3, 4, 3, 5
      User3: 1, 5, 5, 2, 1
  • 32. Find neighbors
    • Measure user similarity, e.g., with the Pearson correlation:
      sim(a,b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / ( √(Σ_{p∈P} (r_{a,p} − r̄_a)²) · √(Σ_{p∈P} (r_{b,p} − r̄_b)²) )
      – a, b: users
      – r_{a,p}: rating of user a for item p; r̄_a, r̄_b: the users' average ratings
      – P: set of items rated by both a and b
    • On the example matrix: sim(John, User1) = 0.85, sim(John, User2) = 0.70, sim(John, User3) = −0.79
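The Pearson similarities on the slide can be reproduced with a short script; note that the means are taken over the co-rated items, which is one common convention and matches the slide's numbers:

```python
import math

def pearson(ra, rb):
    """Pearson correlation over items rated by both users (None = unrated)."""
    common = [p for p in range(len(ra)) if ra[p] is not None and rb[p] is not None]
    ma = sum(ra[p] for p in common) / len(common)
    mb = sum(rb[p] for p in common) / len(common)
    num = sum((ra[p] - ma) * (rb[p] - mb) for p in common)
    den = math.sqrt(sum((ra[p] - ma) ** 2 for p in common)) * \
          math.sqrt(sum((rb[p] - mb) ** 2 for p in common))
    return num / den

john  = [5, 3, 4, 4, None]
user1 = [3, 1, 2, 3, 3]
user2 = [4, 3, 4, 3, 5]
user3 = [1, 5, 5, 2, 1]
sims = [round(pearson(john, u), 2) for u in (user1, user2, user3)]
# ≈ 0.85, 0.71 and −0.79, matching the slide (its 0.70 is truncated)
```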
  • 33. Estimate a rating
    • Prediction function
      – Combine the neighbors' rating deviations from their averages
      – Use the user similarity as a weight
      pred(a,p) = r̄_a + Σ_{b∈N} sim(a,b)·(r_{b,p} − r̄_b) / Σ_{b∈N} |sim(a,b)|
    • On the example: John's predicted rating of Item5 is 4.87, using neighbors User1 (sim = 0.85) and User2 (sim = 0.70)
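A sketch of the prediction step. Taking each neighbor's average over all of that neighbor's ratings (one common convention) reproduces the slide's 4.87:

```python
import math

def pearson(ra, rb):
    """Pearson correlation over co-rated items (None = unrated)."""
    common = [p for p in range(len(ra)) if ra[p] is not None and rb[p] is not None]
    ma = sum(ra[p] for p in common) / len(common)
    mb = sum(rb[p] for p in common) / len(common)
    num = sum((ra[p] - ma) * (rb[p] - mb) for p in common)
    den = math.sqrt(sum((ra[p] - ma) ** 2 for p in common)) * \
          math.sqrt(sum((rb[p] - mb) ** 2 for p in common))
    return num / den

def predict(ra, neighbors, p):
    """Similarity-weighted deviation-from-mean prediction for item p."""
    ma = sum(r for r in ra if r is not None) / sum(r is not None for r in ra)
    num = den = 0.0
    for rb in neighbors:            # neighbors are assumed fully rated here
        sim = pearson(ra, rb)
        mb = sum(rb) / len(rb)      # neighbor's mean over all of its ratings
        num += sim * (rb[p] - mb)
        den += abs(sim)
    return ma + num / den

john  = [5, 3, 4, 4, None]
user1 = [3, 1, 2, 3, 3]
user2 = [4, 3, 4, 3, 5]
rating = predict(john, [user1, user2], 4)   # ≈ 4.87, as on the slide
```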
  • 34. Item-based nearest-neighbor CF
    • Basic idea
      – Use the similarity between items (instead of users)
      – Item-item similarity can be computed offline
    • Example (same rating matrix as above)
      – Look for items that are similar to Item5 (its neighbors)
      – Predict John's rating of Item5 from his ratings of those neighbors
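An item-based sketch on the same data: cosine similarity between item columns (over the users who rated everything), then a similarity-weighted average of John's own ratings. This is a minimal illustration of the idea, not necessarily the exact variant behind the slide:

```python
import math

# Rows = User1..User3 (fully rated), columns = Item1..Item5
ratings = [[3, 1, 2, 3, 3],
           [4, 3, 4, 3, 5],
           [1, 5, 5, 2, 1]]
john = [5, 3, 4, 4, None]

def col(i):
    return [row[i] for row in ratings]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

# Similarity of every item John rated to Item5, then a weighted average
sims = {i: cosine(col(i), col(4)) for i in range(4)}
pred = sum(sims[i] * john[i] for i in sims) / sum(sims.values())
```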
  • 35. Problems of CF
    • Sparse data
      – Users do not rate many items
    • Cold start
      – No ratings for new users or new items
    • Scaling problem
      – Millions of users and thousands of items; let m = #users and n = #items
      – User-based CF
        • Space complexity O(m²) when pre-computed
        • Time complexity for computing Pearson O(m²·n)
      – Item-based CF
        • Space complexity is reduced to O(n²)
  • 36. Possible solutions
    • How to solve the sparse-data problem?
      – Ask users to rate a set of items
      – Use other methods in the beginning, e.g., content-based or non-personalized recommendation
    • How to solve the scaling problem?
      – Apply dimensionality reduction, e.g., matrix factorization
  • 37. Matrix factorization
    • Basic idea [Koren 2008]
      – Determine latent factors from ratings, e.g., types of movies (drama or action)
      – Recommend items from the determined factors
    • Approach
      – Apply dimensionality reduction, e.g., singular value decomposition (SVD) [Deerwester 1990]
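A toy sketch of the latent-factor idea, using stochastic gradient descent on the observed entries rather than a full SVD (a common practical substitute [Koren 2008]); the rank, learning rate, and iteration count are arbitrary choices:

```python
import random

# Observed ratings from the running example; None marks the entry to predict
R = [[5, 3, 4, 4, None],
     [3, 1, 2, 3, 3],
     [4, 3, 4, 3, 5],
     [1, 5, 5, 2, 1]]

random.seed(0)
k, lr, reg = 2, 0.01, 0.02        # latent factors, learning rate, regularization
P = [[random.random() for _ in range(k)] for _ in range(len(R))]
Q = [[random.random() for _ in range(k)] for _ in range(len(R[0]))]

for _ in range(5000):             # SGD over all observed entries
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            if r is None:
                continue
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

# Read the missing rating off the reconstructed entry (John, Item5)
pred = sum(P[0][f] * Q[4][f] for f in range(k))
```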
  • 38–39. Hybrid recommendation
    • Basic idea
      – Different approaches have their own shortcomings
      – Hybrid: combine different approaches
    • Approach
      1. Pipelined hybridization: use content-based recommendation to fill up rating entries, then apply CF [Melville 2002]
      2. Parallel hybridization: feature combination of ratings, user preferences, and constraints
  • 40. Future directions
    • Temporal dynamics of recommender systems
      – Items have short lifetimes, i.e., a dynamic set of items
      – User behaviors depend on moods or time periods
      – Attention to breaking news stories decays over time
      – Challenge: how to capture/model temporal dynamics?
        • TimeSVD++ [Koren 2009]
        • Tensor factorization [Xiong 2010]
        • Temporal diversity [Lathia 2010]
  • 41. Future directions (cont')
    • Group recommendations [McCarthy 2006]
      – Recommendations for a group of users or friends
      – Challenge: how to model a group preference?
    • Context-aware recommendations [Adomavicius 2011]
      – Context: demographics, interests, time and place, moods, weather, and so on
      – Challenge: how to combine different contexts?
  • 42. Conclusions
    1. Exploratory search
       – Users perform information seeking, e.g., collection browsing or visualization
       – Human-computer interaction
    2. Serendipitous IR
       – Systems predict/suggest interesting information, e.g., recommender systems
       – Asynchronous manner
  • 43. References
    – [Dumais 2003] S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I've seen: A system for personal information retrieval and re-use. In Proceedings of SIGIR, pp. 72–79, 2003.
    – [Dumais 2004] S. T. Dumais, E. Cutrell, R. Sarin and E. Horvitz. Implicit queries (IQ) for contextualized search. In Proceedings of SIGIR, p. 594, 2004.
    – [Ingwersen 2005] P. Ingwersen and K. Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. The Information Retrieval Series, Springer-Verlag, New York, 2005.
    – [He 2011] J. He, M. de Rijke, M. Sevenster, R. C. van Ommering and Y. Qian. Generating links to background knowledge: a case study using narrative radiology reports. In Proceedings of CIKM, pp. 1867–1876, 2011.
    – [Kelly 2004] D. Kelly and N. J. Belkin. Display time as implicit feedback: understanding task effects. In Proceedings of SIGIR, pp. 377–384, 2004.
    – [Manning 2008] C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.
    – [Matthews 2010] M. Matthews, P. Tolchinsky, P. Mika, R. Blanco and H. Zaragoza. Searching through time in the New York Times. In HCIR Workshop, 2010.
    – [Marchionini 2006] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.
    – [Mihalcea 2007] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of CIKM, pp. 233–242, 2007.
    – [Milne 2008] D. Milne and I. H. Witten. Learning to link with Wikipedia. In Proceedings of CIKM, pp. 509–518, 2008.
    – [Tunkelang 2009] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.
    – [Viégas 2007] F. B. Viégas, M. Wattenberg, F. van Ham, J. Kriss and M. M. McKeon. Many Eyes: A site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1121–1128, 2007.
    – [White 2006a] R. W. White, B. Kules, S. M. Drucker and m. c. schraefel. Supporting exploratory search: Introduction to special section. Communications of the ACM, 49(4), pp. 36–39, 2006.
    – [White 2006b] R. W. White, G. Muresan and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating exploratory search systems. SIGIR Forum, 40(2), pp. 52–60, 2006.
    – [White 2009] R. W. White and R. A. Roth. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool Publishers, 2009.
  • 44. References (cont')
    – [Agarwal 2010] D. Agarwal and B. C. Chen. Recommender Systems Tutorial. In ACM SIGKDD, 2010.
    – [Adomavicius 2005] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng., 17(6), pp. 734–749, 2005.
    – [Adomavicius 2011] G. Adomavicius and A. Tuzhilin. Context-Aware Recommender Systems. In Recommender Systems Handbook, pp. 217–253, 2011.
    – [Andel 1994] P. V. Andel. Anatomy of the Unsought Finding. Serendipity: Origin, history, domains, traditions, appearances, patterns and programmability. The British Journal for the Philosophy of Science, 45(2), pp. 631–648, 1994.
    – [Balabanovic 1997] M. Balabanovic and Y. Shoham. Content-based, collaborative recommendation. Communications of the ACM, 40(3), pp. 66–72, 1997.
    – [Deerwester 1990] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman. Indexing by Latent Semantic Analysis. JASIS, 41(6), pp. 391–407, 1990.
    – [Jannach 2010] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
    – [Koren 2008] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of KDD, pp. 426–434, 2008.
    – [Koren 2009] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of KDD, pp. 447–456, 2009.
    – [Lathia 2010] N. Lathia, S. Hailes, L. Capra and X. Amatriain. Temporal Diversity in Recommender Systems. In Proceedings of SIGIR, pp. 210–217, 2010.
    – [McCarthy 2006] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth and P. Nixon. Group recommender systems: a critiquing based approach. In Proceedings of IUI, pp. 267–269, 2006.
    – [Melville 2002] P. Melville, R. J. Mooney and R. Nagarajan. Content-Boosted Collaborative Filtering for Improved Recommendations. In Proceedings of AAAI, pp. 187–192, 2002.
    – [Xiong 2010] L. Xiong, X. Chen, T. K. Huang, J. G. Schneider and J. G. Carbonell. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. In Proceedings of SDM, pp. 211–222, 2010.