Supporting Exploration and Serendipity in Information Retrieval

1. Supporting Exploration and Serendipity in Information Retrieval
Nattiya Kanhabua
Department of Computer and Information Science
Norwegian University of Science and Technology
Trial lecture, 24 February 2012
2. Motivation
• Typical search engines
– Lookup-based paradigm
– Known-item search
[Diagram: a query is issued against a document index built from the World Wide Web, and a ranked result list is returned]
Does this paradigm satisfy all types of information needs?
3. Beyond the lookup-based paradigm
Two tasks when searching for the unknown:
1. Exploratory search
– Users actively perform information seeking
• E.g., collection browsing or visualization
– Emphasis on human–computer interaction
2. Serendipitous IR
– Systems predict and suggest interesting information
• E.g., recommender systems
– Operates in an asynchronous manner
4. The next generation of search
[Still from the movie Minority Report (2002)]
5. PART I – EXPLORATORY SEARCH
6. Exploratory search
• An information-seeking task [Marchionini 2006, White 2006a]
– Searching for the unknown, or an open-ended problem
– Complex information needs
– No prior knowledge about the contents
[Diagram: a query is issued against a document index and results are returned, with question marks indicating uncertain information needs]
7. Exploratory search activities
G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.
8. Features of exploratory search
• Query (re)formulation in real time
• Exploiting search context
• Facet-based and metadata-based result filtering
• Learning and understanding support
• Result visualization
9. Query (re)formulation
• Help users formulate information needs at an early stage [Manning 2008]
• Query suggestion
– Supported by major search engines
– Based on query-log analysis
• Query-by-example
– Search using example documents
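As a toy illustration of log-based query suggestion (hypothetical data and function names, not any particular engine's implementation), candidate completions can be ranked by how often logged queries extend the user's prefix:

```python
# Toy log-based query suggestion: rank logged queries that start with the
# user's prefix by their frequency in the query log.
from collections import Counter

query_log = Counter({
    "information retrieval": 120,
    "information overload": 45,
    "information retrieval evaluation": 30,
    "exploratory search": 25,
})

def suggest(prefix: str, k: int = 3) -> list[str]:
    matches = [(q, c) for q, c in query_log.items() if q.startswith(prefix)]
    return [q for q, _ in sorted(matches, key=lambda x: -x[1])[:k]]

print(suggest("information"))
# ['information retrieval', 'information overload', 'information retrieval evaluation']
```

Real engines additionally exploit session context and query co-occurrence, but frequency-ranked prefix matching is the core of the idea.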
10. Leveraging search context
• Effective systems must adapt to contextual constraints [Ingwersen 2005]
– Time, place, history of interaction, task at hand, etc.
• Types of context
1. Explicitly provided feedback
• E.g., selecting relevant documents
2. Implicitly obtained user information
• E.g., mining users' interaction behaviors [Dumais 2004, Kelly 2004]
12. Facet-based result filtering
• Facets are properties of a document [Tunkelang 2009]
– Usually obtained from metadata
• Faceted search provides the ability to:
– Explore results via their properties
– Expand or refine the search
• No metadata available?
– Categorization
– Clustering
13. Result visualization
• Provide overviews of the collection and search results
– To support understanding and analysis
• Applications
– Many Eyes [Viégas 2007]
– Stuff I've Seen [Dumais 2003]
– TimeExplorer [Matthews 2010]
16. Support learning and understanding
• Provide facilities for deriving meaning from search results
• Examples
– Wikify!: linking documents to encyclopedic knowledge [Mihalcea 2007]
– Learning to link with Wikipedia [Milne 2008]
– Generating links to background knowledge [He 2011]
17. Evaluation of exploratory search
• Evaluation metrics for exploratory search [White 2006b]
1. Engagement and enjoyment
• The degree to which users are engaged and enjoy the experience
2. Information novelty
• The amount of new information encountered
3. Task success
4. Task time
• Time spent reaching a state of task completeness
5. Learning and cognition
• The number of topics covered and the number of insights users acquire
18. Future direction
• Collaborative and social search
– Support task division and knowledge sharing
– Allow a team to move rapidly toward task completion
– Provide access to already-encountered information
19. PART II – SERENDIPITOUS IR
20. Serendipitous IR
• Serendipity [Andel 1994]
– The act of encountering relevant information unexpectedly
• Task: predict and suggest relevant information
– E.g., recommender systems
21. Recommender systems
• Motivation [Adomavicius 2005, Jannach 2010]
– Ease information overload
– Business intelligence
• Increase the number of products sold
• Sell products from the long tail
• Improve the user experience
• Real-world applications
– Books: Amazon.com
– Movies: Netflix, IMDb
– News: Yahoo, The New York Times
– Video & music: YouTube, Last.fm
22. Problem statements
• Given:
– A set of items (e.g., products, movies, or news)
– User information (e.g., ratings or user preferences)
• Goal:
– Predict a relevance score for each item
– Recommend the top-k items based on the scores
[Diagram: non-personalized recommendation — the recommender system scores the item collection, e.g., I1 = 0.8, I2 = 0.6, I3 = 0.5]
23. Problem statements (cont.)
[Diagram: personalized recommendation — the same setting as above, but the recommender system additionally takes user information as input]
24. Personalized recommendation
• Two main approaches
– Content-based
– Collaborative filtering
[Diagram: content-based recommendation — the recommender system combines user information with product features (title, genre, actor, …) to score the item collection]
25. Personalized recommendation (cont.)
[Diagram: collaborative filtering recommendation — the recommender system combines user information with community data to score the item collection]
27. Content-based recommendation
• Basic idea
– Give me "more like this"
– Exploit item descriptions (contents) and user preferences
• No rating data is needed
• Approach
1. Represent information as a bag of words
2. Compute the similarity between the preferences and an unseen item, e.g., with the Dice coefficient or the cosine similarity [Manning 2008]

User profile (Title | Genre | Director | Writers | Stars):
– The Twilight Saga: Eclipse | Adventure, Drama, Fantasy | David Slade | Melissa Rosenberg, Stephenie Meyer | Kristen Stewart, Robert Pattinson
– Harry Potter and the Deathly Hallows: Part 1 | Adventure, Drama, Fantasy | David Yates | Steve Kloves, J.K. Rowling | Daniel Radcliffe, Emma Watson

Unseen item (Title | Genre | Director | Writers | Stars):
– The Lord of the Rings: The Return of the King | Action, Adventure, Drama | Peter Jackson | J.R.R. Tolkien, Fran Walsh | Elijah Wood, Viggo Mortensen
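The two-step approach above can be sketched in a few lines. The feature sets come from the tables on this slide, and the Dice coefficient is one of the two measures the slide names (a minimal sketch, not the lecture's code):

```python
# Content-based scoring sketch: represent each item as a set of features
# (a bag of words over genre and people) and compare an unseen item to a
# profile item with the Dice coefficient: 2|A ∩ B| / (|A| + |B|).

def dice(a: set, b: set) -> float:
    return 2 * len(a & b) / (len(a) + len(b))

harry_potter = {"adventure", "drama", "fantasy", "david yates",
                "steve kloves", "j.k. rowling", "daniel radcliffe", "emma watson"}
lotr = {"action", "adventure", "drama", "peter jackson",
        "j.r.r. tolkien", "fran walsh", "elijah wood", "viggo mortensen"}

score = dice(harry_potter, lotr)  # shared features: adventure, drama
print(score)  # 2*2 / (8+8) = 0.25
```

Items whose score against the profile exceeds some threshold would then be recommended; cosine similarity over term-frequency vectors works the same way with weighted features.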
28–30. Collaborative filtering (CF)
• Basic idea [Balabanovic 1997]
– Give me "popular items among my friends"
– Users who had similar tastes in the past tend to have similar tastes in the future
• Basic approach
– Use a matrix of user–item ratings (explicit or implicit)
• Implicit ratings: clicks, page views, time spent on a page
– Predict a rating for each unseen item
31. User-based nearest-neighbor CF
• Given the active user and a matrix of user–item ratings
• Goal: predict a rating for an unseen item by
1. Finding a set of users (neighbors) with similar ratings
2. Estimating John's rating of Item5 from the neighbors' ratings
3. Repeating for all unseen items and recommending the top-N items

Ratings (Item1 | Item2 | Item3 | Item4 | Item5):
– John: 5 | 3 | 4 | 4 | ?
– User1: 3 | 1 | 2 | 3 | 3
– User2: 4 | 3 | 4 | 3 | 5
– User3: 1 | 5 | 5 | 2 | 1
32. Find neighbors
• Measure user similarity, e.g., with the Pearson correlation:

sim(a, b) = Σ_{p ∈ P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / [ √(Σ_{p ∈ P} (r_{a,p} − r̄_a)²) · √(Σ_{p ∈ P} (r_{b,p} − r̄_b)²) ]

– a, b: users
– r_{a,p}: rating of user a for item p; r̄_a, r̄_b: the users' average ratings
– P: the set of items rated by both a and b
• On the matrix above: sim(John, User1) = 0.85, sim(John, User2) = 0.70, sim(John, User3) = −0.79
33. Estimate a rating
• Prediction function
– Combine the neighbors' rating differences from their own averages
– Use the user similarity as a weight:

pred(a, p) = r̄_a + Σ_{b ∈ N} sim(a, b) · (r_{b,p} − r̄_b) / Σ_{b ∈ N} |sim(a, b)|

• With neighbors User1 (sim = 0.85) and User2 (sim = 0.70), John's predicted rating for Item5 is 4.87
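The three-step walkthrough of slides 31–33 can be reproduced in a few lines. This follows the standard textbook formulation [Jannach 2010]: the Pearson similarity is computed over the co-rated items, while the prediction uses each neighbor's overall mean rating; both conventions are assumptions needed to match the numbers on the slides.

```python
# User-based nearest-neighbor CF on the rating matrix from slide 31.
from math import sqrt

ratings = {                       # None = unseen item
    "John":  [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [1, 5, 5, 2, 1],
}

def pearson(a, b):
    # P = items rated by both users; means are taken over P
    P = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    ma = sum(a[i] for i in P) / len(P)
    mb = sum(b[i] for i in P) / len(P)
    num = sum((a[i] - ma) * (b[i] - mb) for i in P)
    den = (sqrt(sum((a[i] - ma) ** 2 for i in P))
           * sqrt(sum((b[i] - mb) ** 2 for i in P)))
    return num / den

def predict(active, neighbors, item):
    rated = [r for r in ratings[active] if r is not None]
    mean_active = sum(rated) / len(rated)
    sims = {n: pearson(ratings[active], ratings[n]) for n in neighbors}
    # similarity-weighted deviations from each neighbor's overall mean
    num = sum(s * (ratings[n][item] - sum(ratings[n]) / len(ratings[n]))
              for n, s in sims.items())
    return mean_active + num / sum(abs(s) for s in sims.values())

print(round(pearson(ratings["John"], ratings["User1"]), 2))  # 0.85
print(round(pearson(ratings["John"], ratings["User2"]), 2))  # 0.71 (0.70 on the slide)
print(round(pearson(ratings["John"], ratings["User3"]), 2))  # -0.79
print(round(predict("John", ["User1", "User2"], 4), 2))      # 4.87
```

The slide's rounded 0.70 corresponds to the exact value 0.7071; the final prediction of 4.87 is unchanged.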
34. Item-based nearest-neighbor CF
• Basic idea
– Use the similarity between items (instead of users)
– Item–item similarities can be computed offline
• Example (using the rating matrix from slide 31)
– Look for items that are similar to Item5 (its neighbors)
– Predict the rating of Item5 from John's ratings of those neighbors
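A minimal sketch of the item-based variant on the same matrix, assuming plain cosine similarity between item rating columns (adjusted cosine, which first subtracts each user's mean, is also common) and a neighborhood of the k = 2 most similar items:

```python
# Item-based nearest-neighbor CF sketch on the rating matrix from slide 31.
from math import sqrt

# item -> ratings by [John, User1, User2, User3]
items = {
    1: [5, 3, 4, 1],
    2: [3, 1, 3, 5],
    3: [4, 2, 4, 5],
    4: [4, 3, 3, 2],
    5: [None, 3, 5, 1],   # John has not rated Item5
}

def cosine(x, y):
    # compare only over users who rated both items
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    num = sum(a * b for a, b in pairs)
    return num / (sqrt(sum(a * a for a, _ in pairs))
                  * sqrt(sum(b * b for _, b in pairs)))

# offline step: similarity of Item5 to every other item
sims = {i: cosine(items[5], items[i]) for i in (1, 2, 3, 4)}

# online step: weighted average of John's ratings on the k=2 nearest items
neighbors = sorted(sims, key=sims.get, reverse=True)[:2]
john = {1: 5, 2: 3, 3: 4, 4: 4}
pred = sum(sims[i] * john[i] for i in neighbors) / sum(sims[i] for i in neighbors)
print(neighbors, round(pred, 2))  # [1, 4] 4.51
```

The "offline" comment marks the part that scales as O(n²) in the number of items, which is the space saving over user-based CF noted on slide 35.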
35. Problems of CF
• Sparse data
– Users do not rate many items
• Cold start
– No ratings for new users or new items
• Scaling problem
– Millions of users and thousands of items
– Let m = #users and n = #items
– User-based CF
• Space complexity O(m²) when similarities are pre-computed
• Time complexity for computing Pearson correlations O(m²·n)
– Item-based CF
• Space complexity is reduced to O(n²)
36. Possible solutions
• How to solve the sparse-data problem?
– Ask users to rate a seed set of items
– Use other methods in the beginning
• E.g., content-based or non-personalized recommendation
• How to solve the scaling problem?
– Apply dimensionality reduction
• E.g., matrix factorization
37. Matrix factorization
• Basic idea [Koren 2008]
– Determine latent factors from the ratings
• E.g., types of movies (drama or action)
– Recommend items based on the determined factors
• Approach
– Apply dimensionality reduction
• E.g., singular value decomposition (SVD) [Deerwester 1990]
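A minimal sketch of the SVD step on the rating matrix from slide 31, assuming the unseen entry is imputed with the user's mean before factorization; production systems instead learn the factors from the observed ratings only, with regularization [Koren 2008].

```python
# SVD-based dimensionality reduction: factorize the rating matrix and keep
# only the k strongest latent factors; the rank-k reconstruction gives a
# smoothed score for every user-item pair, including imputed ones.
import numpy as np

R = np.array([
    [5, 3, 4, 4, 4],   # John (unseen Item5 imputed with his mean, 4)
    [3, 1, 2, 3, 3],   # User1
    [4, 3, 4, 3, 5],   # User2
    [1, 5, 5, 2, 1],   # User3
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2  # number of latent factors to keep (e.g., "drama-ness", "action-ness")
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(R_k.round(2))
print("John's smoothed Item5 score:", round(R_k[0, 4], 2))
```

By the Eckart–Young theorem, R_k is the best rank-k approximation of R in the Frobenius norm, which is why truncating to a few factors both compresses the matrix and denoises the ratings.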
39. Hybrid recommendation
• Basic idea
– Different approaches have different shortcomings
– Hybrid: combine several approaches
• Approaches
1. Pipelined hybridization
• Use content-based prediction to fill in missing ratings, then apply CF [Melville 2002]
2. Parallel hybridization
• Feature combination: ratings, user preferences, and constraints
40. Future directions
• Temporal dynamics of recommender systems
– Items have short lifetimes, i.e., a dynamic set of items
– User behavior depends on moods or time periods
– Attention to breaking news stories decays over time
– Challenge: how to capture/model temporal dynamics?
• TimeSVD++ [Koren 2009]
• Tensor factorization [Xiong 2010]
• Temporal diversity [Lathia 2010]
41. Future directions (cont.)
• Group recommendations [McCarthy 2006]
– Recommendations for a group of users or friends
– Challenge: how to model group preferences?
• Context-aware recommendations [Adomavicius 2011]
– Context: e.g., demographics, interests, time and place, moods, weather, and so on
– Challenge: how to combine different kinds of context?
42. Conclusions
1. Exploratory search
– Users actively perform information seeking
• E.g., collection browsing or visualization
– Emphasis on human–computer interaction
2. Serendipitous IR
– Systems predict and suggest interesting information
• E.g., recommender systems
– Operates in an asynchronous manner
43. References
• [Dumais 2003] S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. In Proceedings of SIGIR, pp. 72-79, 2003.
• [Dumais 2004] S. T. Dumais, E. Cutrell, R. Sarin and E. Horvitz. Implicit queries (IQ) for contextualized search. In Proceedings of SIGIR, p. 594, 2004.
• [He 2011] J. He, M. de Rijke, M. Sevenster, R. C. van Ommering and Y. Qian. Generating links to background knowledge: a case study using narrative radiology reports. In Proceedings of CIKM, pp. 1867-1876, 2011.
• [Ingwersen 2005] P. Ingwersen and K. Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. The Information Retrieval Series, Springer-Verlag, New York, 2005.
• [Kelly 2004] D. Kelly and N. J. Belkin. Display time as implicit feedback: understanding task effects. In Proceedings of SIGIR, pp. 377-384, 2004.
• [Manning 2008] C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.
• [Marchionini 2006] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41-46, 2006.
• [Matthews 2010] M. Matthews, P. Tolchinsky, P. Mika, R. Blanco and H. Zaragoza. Searching through time in the New York Times. In HCIR Workshop, 2010.
• [Mihalcea 2007] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of CIKM, pp. 233-242, 2007.
• [Milne 2008] D. Milne and I. H. Witten. Learning to link with Wikipedia. In Proceedings of CIKM, pp. 509-518, 2008.
• [Tunkelang 2009] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.
• [Viégas 2007] F. B. Viégas, M. Wattenberg, F. van Ham, J. Kriss and M. M. McKeon. Many Eyes: A site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1121-1128, 2007.
• [White 2006a] R. W. White, B. Kules, S. M. Drucker and m. c. schraefel. Supporting exploratory search: Introduction to special section. Communications of the ACM, 49(4), pp. 36-39, 2006.
• [White 2006b] R. W. White, G. Muresan and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating exploratory search systems. SIGIR Forum, 40(2), pp. 52-60, 2006.
• [White 2009] R. W. White and R. A. Roth. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool Publishers, 2009.
44. References (cont.)
• [Adomavicius 2005] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), pp. 734-749, 2005.
• [Adomavicius 2011] G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In Recommender Systems Handbook, pp. 217-253, 2011.
• [Agarwal 2010] D. Agarwal and B.-C. Chen. Recommender systems tutorial. In ACM SIGKDD, 2010.
• [Andel 1994] P. van Andel. Anatomy of the unsought finding. Serendipity: origin, history, domains, traditions, appearances, patterns and programmability. The British Journal for the Philosophy of Science, 45(2), pp. 631-648, 1994.
• [Balabanovic 1997] M. Balabanovic and Y. Shoham. Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3), pp. 66-72, 1997.
• [Deerwester 1990] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6), pp. 391-407, 1990.
• [Jannach 2010] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
• [Koren 2008] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of KDD, pp. 426-434, 2008.
• [Koren 2009] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of KDD, pp. 447-456, 2009.
• [Lathia 2010] N. Lathia, S. Hailes, L. Capra and X. Amatriain. Temporal diversity in recommender systems. In Proceedings of SIGIR, pp. 210-217, 2010.
• [McCarthy 2006] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth and P. Nixon. Group recommender systems: a critiquing based approach. In Proceedings of IUI, pp. 267-269, 2006.
• [Melville 2002] P. Melville, R. J. Mooney and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of AAAI, pp. 187-192, 2002.
• [Xiong 2010] L. Xiong, X. Chen, T. K. Huang, J. G. Schneider and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of SDM, pp. 211-222, 2010.