The User at the Wheel of the Online Video Search Engine

The recent increase in the volume and variety of video content available online presents growing challenges for video search. Users face increased difficulty in formulating effective queries, and search engines must deploy highly effective algorithms to provide relevant results. This talk addresses these challenges by introducing two novel frameworks and approaches. First, we discuss a principled framework for multimedia retrieval that moves beyond 'what' users are searching for to also encompass 'why' they search. This 'why' is understood as the reason, purpose or immediate goal behind a user's information need, which is identified as the underlying 'user intent'. We identify useful intent categories for online video search, present validation experiments showing that these categories display enough invariance to be successfully modeled by a video search engine, and demonstrate the potential of these categories to improve video retrieval with a large crowdsourcing user study. Second, we present a novel approach that predicts for which queries results optimization is most useful, i.e., which queries will fail in a user's search session on a video search engine. Being able to predict when a video search query will fail is likely to make video search result optimization more efficient and to allow optimization techniques to be deployed more effectively. This approach combines features derived from the search log of a video search engine (capturing users' behavior) with features derived from the video search results list (capturing the visual variance of search results), with the objective of predicting whether a particular query is likely to fail in the context of a particular search session.

  • Approaches: or combinations of these
  • We don’t know which method is good when, or when to deploy a particular method. The user does not get proper search results (as heard yesterday in BNI), and the engine has to do unnecessary computation, which might not influence the user and is therefore senseless.
  • Extreme cases are well predicted by QPP because in these cases no context is necessary to make a relatively good prediction. In (almost) all cases, when the particular query string is submitted in a search session, independently of where in the session, it will either succeed or fail. So QPP would do a good job here. However…
  • One source from which to infer context is the transaction log…
  • We looked into queries that fall in the middle category on the previous plot, i.e., queries that have many successful and many failed instances across different search sessions. We then manually investigated these search sessions in order to infer characteristics of the user that point to success or failure of these queries, depending on the session context.
  • In the paper we came up with five observations pointing to query failure. These are related to the iterative development of the search goal throughout the session, the user's satisfaction with the results thus far in the session, and so on. Due to time limitations, I refer you to the paper at this point and just want to mention some features that we extracted from these observations and that are indicative of query failure: (i) general Internet browser session and search session statistics, (ii) query (re)formulation behavior and clarity of search goal expressiveness, and (iii) click-through data in the video search results lists generated by the queries in the search session. Two types of pre-query session histories are used: the session query history and the query-specific reformulation history. Features are extracted from these local search session histories relative to the current query. We do not learn user profiles or global search patterns.
  • We heard yesterday in the CBIR session that a more specific query does not necessarily yield more visually consistent search results. So visual features give additional information beyond the text-based search results, which could be exploited w.r.t. query failure.
  • High consistency should then indicate that the search engine has achieved good performance on the query that generated the results list.
  • Both, NSCQ and QC baselines achieve a good balance between correctly classified instances of -qif and +qif, however QC outperforms NCSQ. The relatively strong performance of the conventional QPP baseline demonstrates the potential and the strength of the text-retrieval methods to transfer to video retrieval problems. For the remainder of the experiments we compare performance against the best-performing conventional QPP baseline achieved by the query clarity score.
  • Our user indicator-based query failure prediction methods statistically significantly outperform the conventional QPP baseline (QC in Table 2) and achieve an 8% improvement in absolute performance solely by taking local search context into account. The best-performing method is the classifier built on features derived from ‘User familiarity’. Another strong performer is ‘Previous dissatisfaction’, reflecting previous failures in the session. For the observation ‘Query iterations’, using local features from the query-specific reformulation region of the search session increases performance compared to using the entire query history, suggesting the value of narrow local context. The relatively poor performance achieved by the observation ‘Goal-directedness’ suggests that search goal clarity does not evolve consistently over a search session. Early and late fusion perform well but do not outperform the individual well-performing observations. The F-measure values of the individual classes show that the proposed classifiers are more conservative in classifying +qif than –qif instances. The observations clearly achieve a much better result for –qif than for +qif, most likely reflecting the relatively greater stability of the characteristics of successful queries.
  • is a clear sign that the visual component of video search results should not be ignored, but rather potentially makes an important contribution to query failure prediction

    1. 1. The User at the Wheel of the Online Video Search Engine. Christoph Kofler (c.kofler@tudelft.nl), Delft University of Technology, Delft, The Netherlands 1
    2. 2. In this talk… Two of our approaches presented at ACM Multimedia 2012, Nara, Japan: (1) User intent in video search; (2) Query failure prediction in video search sessions 2
    3. 3. I. Intent and its Discontent. ACM Multimedia 2012, Brave New Ideas. Work with Alan Hanjalic and Martha Larson. Slide credit: Martha Larson 3
    4. 4. An Example Information Need 4
    5. 5. Many results, but no satisfaction. Top-ranked results are about koi ponds, but we are discontent: there is no information specifically about the significance of koi ponds. 5
    6. 6. Many queries, no satisfaction. Query suggestions; query reformulation. Refinement strategies don’t always work. 6
    7. 7. Video Search Engine Workflow [Diagram: Information need → Query “koi pond” → Video search engine → Results list] • So what went wrong? (Image credit: Openclipart.org: samukunai) 7
    8. 8. Improving the video search engine Flickr: Sherlock77 (James) 8
    9. 9. Improving the video search engine vehicle Flickr: Martyn @ Negaro 9
    10. 10. Video Search Engine Workflow [Diagram: Information need → Query “koi pond” → Video search engine → Results list] • So what went wrong? 10
    11. 11. Video Search Engine Workflow [Diagram: Information need → Query “koi pond” → Video search engine → Results list] • So what went wrong? We neglect the goal that the user is trying to reach… …our video search is “blind” to user intent. 11
    12. 12. User information need [Diagram: User information need (What, Why) → Query “koi pond” → Video search engine → Results list] User information need has two parts: • Topic = What the user is searching for. • Intent = Why the user is searching for it. 12
    13. 13. Removing the Intent Roadblock. The main research roadblock has been the question: Which intent categories are both useful to users and technically within reach? 1. Categories of Intent: Which ones are useful to users? 2. Indexing Intent: Is intent technically feasible? 3. Impact of Intent: Could intent prevent discontent? 13
    14. 14. 1. Categories of User Intent 14
    15. 15. Mining the Social Web: Yahoo! Answers 15
    16. 16. Natural Language Information Needs • We harvested natural language information needs related to video search from Yahoo! Answers. • We analyzed 281 cases in which the user has clearly stated the goal behind the information need. 16
    17. 17. User Search Intent Categories • In an iterative process, we manually clustered the information needs to identify the dominant user search intent categories (using a card-sorting methodology).
        Intent category            | Description
        I. Information             | Obtain knowledge and/or gather information.
        II. Experience: Learning   | Learn something practically by experience.
        III. Experience: Exposure  | Experience a person, place, entity or event.
        IV. Affect                 | Change mood or affective state.
        V. Object                  | Video is its own goal.
    18. 18. 2. Indexing Intent 18
    19. 19. Wider View on Video Intent [Diagram labels: Search Intent; Creation Intent; Video Intent] 19
    20. 20. Is intent within our reach? • We carry out a feasibility experiment using simple features from: shot patterns; speech recognition transcripts; user-contributed metadata (title, description, tags) • Example contrast: Information intent versus Affect intent 20
    21. 21. Evaluating Classifiers for Intent • Evaluate with two large sets of Internet video (from blip.tv) • Train a classifier that assigns intent categories to videos. • See paper for the experiment details; here selected results are reported for the smaller, 350-hour set. 21
    22. 22. Features from shot patterns • Shot patterns show promise. • Weighted F-measure 0.53 • They are especially good at distinguishing “Information” vs. “Affect” [Figures: shot pattern from an “Information” video (correctly classified); shot pattern from an “Affect” video (correctly classified)] 22
    23. 23. Features from ASR transcripts • Speech recognition transcripts perform better (WFM 0.67) • They don’t reach the performance of tags (WFM 0.77) • Transcript excerpt from an “Experience: Exposure” video (correctly classified): “Egon comes packaged on a really nice looking blister cover that features some great super natural colors and images from the films. The back of the package features a really cool bio…” • Transcript excerpt from an “Information” video (correctly classified): “It’s Thursday, April 10 2008. I am Robert Ellis, and this is your Thursday snack. Welcome back to political lunch. Barack Obama has painted himself in some ways,…” 23
    24. 24. 3. Impact of Intent 24
    25. 25. Experiment on User Perception of Intent • Workers were presented with a set of three videos returned by YouTube in response to a query. • The videos are about the same topic, i.e., “what”. • We ask if the videos have the same intent, i.e., “why”. (A short excerpt of the user study survey was shown on the slide.) 25
    26. 26. User Agreement on Video Intent • Setup: For each of the 883 queries, three workers filled in the survey (294 workers in total). • Results: For 55% of the queries, two out of three workers agreed that the set contained videos representing at least two different intent categories. • Conclusion: If online video search engines become “intent-aware”, users will indeed notice the difference. 26
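The agreement figure on slide 26 is a share of queries for which a majority of the three workers gave the same answer. A minimal sketch of that computation follows. The function names, the example judgments, and the reading of "2/3" as "at least two of three" are assumptions for illustration; the study's data and exact procedure are in the paper.

```python
def majority_agrees(votes, min_votes=2):
    """True if at least `min_votes` workers answered True.

    Assumption: a vote of True means 'this result set contains videos
    with at least two different intent categories'.
    """
    return sum(1 for v in votes if v) >= min_votes


def share_with_agreement(judgments):
    """Fraction of queries whose workers reach the majority threshold."""
    agreed = sum(1 for votes in judgments.values() if majority_agrees(votes))
    return agreed / len(judgments)


# Hypothetical judgments: three workers per query.
judgments = {
    "koi pond": [True, True, False],
    "motorcycle": [True, True, True],
    "free movies": [False, False, True],
    "koi history": [False, False, False],
}
share = share_with_agreement(judgments)
print(f"{share:.0%} of queries reach two-of-three agreement")
```

With real survey data, `judgments` would hold one entry per query, and the printed share would correspond to the percentage reported on the slide.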
    27. 27. Examples of Agreement on Intent [Figure: example videos for the queries ‘human metabolism glycolosis’ and ‘motorcycle’, each labeled with the intent category that workers agreed on: “Experience: Learning”, “Information”, or “Affect”.] 27
    28. 28. ∞. Conclusion and Outlook 28
    29. 29. Take-home message • Intent can help us develop video search engines that get users where they want to go. • We have removed the video search intent roadblock: We have shown which intent categories are important and that they are in reach. More challenges lie in the road ahead. 29
    30. 30. Challenge 1: Evaluating Intent • Quantifying the ability of intent to prevent discontent. “My search engine finds topics, but is it getting me where I want to go?” (Image credit: Flickr: sean dreilinger) 30
    31. 31. Challenge 2: Isolating Intent • Addressing videos that fit multiple intents. “I’m not relaxing, I’m a biologist studying fish feeding habits.” 31
    32. 32. Challenge 3: Implementing Intent [Diagram: Query “koi pond” → Video search engine → Results list] • Implementing intent into the video search engine workflow. “Intent fits anywhere and everywhere” 32
    33. 33. II. When Video Search Goes Wrong. ACM Multimedia 2012, Multimedia Search and Retrieval. Work with Linjun Yang, Martha Larson, Tao Mei, Alan Hanjalic, Shipeng Li. Delft University of Technology, Delft, The Netherlands; Microsoft Research Asia, Beijing, China 33
    34. 34. Searching gets complex! • Searching for videos on the Internet becomes increasingly complex • Users face increased difficulty in formulating effective and successful text-based video search queries 34
    35. 35. Searching gets complex! 35
    36. 36. Searching gets complex! Queries fail A LOT of times! 36
    37. 37. Deployment of existing algorithms. Algorithms improving the performance of video search engines have been developed for the whole search pipeline: 1. Not effectively deployed 2. “Expensive” for both user and search engine 37
    38. 38. How can we improve? Predicting when users will fail in their search session… (focus of this contribution!) …can help to more effectively deploy these algorithms (concept-based retrieval, …, particular query suggestion). Result: better search results for the user and “cheaper” for the engine 38
    39. 39. Approach and Motivation • Context-aware Query Failure Prediction • Prediction of success or failure of a query at query time… • …within a user’s search session with the video search engine • Sources: patterns of users’ interaction with the search engine; visual features from the search results list produced by the query • When does a query ‘fail’? No search results click 39
    40. 40. Terminology: Query performance prediction (QPP) • Predict retrieval performance of query: correlates with precision; how topically coherent are search results? (clear vs. ambiguous) • Statistics involve: query string; background collection; search results • No search session context 40
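Slide 40 describes QPP statistics computed from the query's search results and a background collection. One common statistic of this kind is the clarity score, which compares a word distribution estimated from the results with the collection distribution by KL divergence. The sketch below is a simplified illustration only: it uses unsmoothed unigram counts over result titles, and the function names and example texts are assumptions, not the configuration used in the paper.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram distribution over whitespace tokens."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def clarity_score(result_texts, collection_texts):
    """KL divergence (in bits) between result-list and collection models.

    Simplification: no smoothing; words missing from the collection
    model are skipped.
    """
    p_results = unigram_model(result_texts)
    p_coll = unigram_model(collection_texts)
    score = 0.0
    for word, p in p_results.items():
        q = p_coll.get(word)
        if q:
            score += p * math.log2(p / q)
    return score


# Hypothetical background collection of video titles.
collection = ["koi pond history", "koi fish care",
              "motorcycle race", "free movies online"]
focused = clarity_score(["koi pond history", "koi fish care"], collection)
unfocused = clarity_score(collection, collection)
print(focused, unfocused)
```

A topically focused results list diverges from the collection and scores higher; a results list that looks like the collection as a whole scores zero.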
    41. 41. Queries in Session Context 41
    42. 42. Queries in Session Context 42
    43. 43. Why QPP in Video Search is not enough: User Perspective [Figure: histogram of queries by success rate. x-axis: proportion of success rate for queries, in bins from 0% to 90-100%; y-axis: frequency, 0 to 0.5; series: all engines, YouTube, Google video, Bing video, Yahoo! video; annotations: “(Almost) all fail”, “(Almost) all successful”.] Example: koi history: 100K submitted, 60K successful, i.e., a 60% success rate 43
    44. 44. Why QPP in Video Search is not enough: User Perspective [Figure: histogram of queries by success rate. x-axis: proportion of success rate for queries, in bins from 0% to 90-100%; y-axis: frequency, 0 to 0.5; series: all engines, YouTube, Google video, Bing video, Yahoo! video; annotations: “(Almost) all fail”, “(Almost) all successful”.] Example: koi history: 100K submitted, 60K successful, i.e., a 60% success rate. Query performance prediction is not trivial in the majority of the cases, since query success highly depends on the query’s context. 44
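The "koi history" example on slides 43 and 44 can be reproduced with a few lines. The sketch computes a query's success rate and maps it to the histogram bins named on the slide's x-axis. The function names are hypothetical, and integer arithmetic is used for binning to avoid floating-point edge cases.

```python
def success_rate(submitted, successful):
    """Fraction of a query's submissions that were successful."""
    return successful / submitted


def success_bin(submitted, successful):
    """Map a query to one of the bins 0%, 1-9%, 10-19%, ..., 90-100%."""
    if successful == 0:
        return "0%"
    pct = (100 * successful) // submitted  # integer percent, rounded down
    if pct < 10:
        return "1-9%"
    if pct >= 90:
        return "90-100%"
    low = (pct // 10) * 10
    return f"{low}-{low + 9}%"


# Figures from the slide: 100K submissions, 60K of them successful.
rate = success_rate(100_000, 60_000)
bin_label = success_bin(100_000, 60_000)
print(rate, bin_label)
```

Queries landing in the middle bins are the ones for which a context-free prediction is hardest, which is the point the slide makes.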
    45. 45. Video Search Transaction Logs
        Time     | Current URL | Previous URL | Query/Action | Vertical
        10:46:12 | …search?q=koi+documentary | - | koi documentary | video
        10:46:20 | …search?q=koi+history | …search?q=koi+documentary | koi history | video
        10:46:25 | …q=koi+history&view=detail&mid=E9589097DCE1DDD7D17DE9589097DCE1DDD7D17 | …search?q=koi+history | <results click> | video
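Combining the log excerpt on slide 45 with the failure definition from slide 39 (a query fails when no search result is clicked) gives a simple labeling rule. The sketch below applies it to the Query/Action column of one session. The function name and the list-based session representation are assumptions; only the `<results click>` marker is taken from the log excerpt.

```python
RESULT_CLICK = "<results click>"


def label_queries(actions):
    """Label each query in a session as failed (True) or successful (False).

    `actions` is the Query/Action column of one session in time order.
    A query counts as failed if no results click follows it before the
    next query or the end of the session.
    """
    labels = []
    current, clicked = None, False
    for action in actions:
        if action == RESULT_CLICK:
            if current is not None:
                clicked = True
        else:
            if current is not None:
                labels.append((current, not clicked))
            current, clicked = action, False
    if current is not None:
        labels.append((current, not clicked))
    return labels


# The session from the slide.
session = ["koi documentary", "koi history", RESULT_CLICK]
labels = label_queries(session)
print(labels)
```

For the session shown on the slide, "koi documentary" is labeled as failed and "koi history" as successful.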
    46. 46. Context-aware Query Failure Prediction • Exploratory investigation of users’ search sessions, stored in the transaction log, to find characteristics indicative of query failure • Context is derived from the query’s context within a user’s search session 46
    47. 47. Context-aware Query Failure Prediction • Exploratory investigation of users’ search sessions, stored in the transaction log, to find characteristics indicative of query failure • Context is derived from the query’s context within a user’s search session • USER FEATURES: QPP + Session Context 47
    48. 48. User Features (excerpt) • General search session statistics: duration; number of interactions; search engine vertical switches • Query formulation strategies and clarity: query reformulation types; differences between clarity of queries within session; overlapping query terms; mutually exclusive query topics • Click-through data: click behavior in search results; dwell time on search results 48
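A few of the user features listed on slide 48 can be sketched directly from a session's queries and timestamps. The example below computes session duration, number of interactions, and term overlap between the current and the previous query. The feature names, the Jaccard overlap measure, and the input format are assumptions for illustration; the paper defines the actual feature set.

```python
def session_features(queries, timestamps):
    """Features for the current (last) query of a session.

    `queries` are query strings in time order; `timestamps` are the
    matching times in seconds since the session started.
    """
    current = set(queries[-1].lower().split())
    previous = set(queries[-2].lower().split()) if len(queries) > 1 else set()
    union = current | previous
    return {
        "duration_s": timestamps[-1] - timestamps[0],
        "num_interactions": len(queries),
        # Jaccard overlap of query terms with the previous query.
        "term_overlap": len(current & previous) / len(union) if union else 0.0,
    }


# Two queries from the log excerpt, issued 8 seconds apart.
features = session_features(["koi documentary", "koi history"], [0, 8])
print(features)
```

All three values depend only on the local session history up to the current query, in line with the talk's statement that no user profiles or global search patterns are learned.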
    49. 49. Why QPP in Video Search is not enough: Engine Perspective 49
    50. 50. Context-aware Query Failure Prediction • Exploit visual information of thumbnails of the produced search results list • Consistency of visual content of search results on a conceptual level reflects the topical focus of the results list 50
    51. 51. Context-aware Query Failure Prediction • Exploit visual information of thumbnails of the produced search results list • Consistency of visual content of search results on a conceptual level reflects the topical focus of the results list • ENGINE FEATURES: QPP + Visual Search Results 51
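The idea of visual consistency on slides 50 and 51 can be illustrated with a mean pairwise similarity over thumbnail feature vectors. This is a sketch under stated assumptions: the cosine measure, the function names, and the toy two-dimensional vectors are illustrative choices, and the talk does not specify the measure or the features used.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def visual_consistency(thumbnail_features):
    """Mean pairwise cosine similarity over a results list's thumbnails."""
    pairs = list(combinations(thumbnail_features, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 2-D features; real features would come from the thumbnails.
consistent = visual_consistency([[1.0, 0.0], [1.0, 0.0], [0.9, 0.1]])
mixed = visual_consistency([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(consistent, mixed)
```

A results list whose thumbnails look alike scores close to 1, while a visually mixed list scores lower, which is the kind of signal an engine feature could pass to the failure classifier.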
    52. 52. Engine Features (excerpt) • Show the potential of the visual information to be helpful for query failure prediction • Light-weight features to be: deployed during query time; covering the whole query space • Higher-level representations are not scalable • Video search results are represented by standard local and global features 52
    53. 53. Model Training and Prediction • Supervised learning trains generic classifiers on the development set using the extracted features • One binary classifier for feature sets representing user and engine features 53
    54. 54. [Diagram: Offline: feature extraction (user features, engine features) → training → model. Online: session queries Q1, Q2, Q3, Q4 → feature extraction (user features, engine features) → context-aware prediction for the current query.] 54
    55. 55. Experiments 55
    56. 56. Dataset • Development set: 24K search sessions; 108K queries • Test set: 150K search sessions; 1.1M queries; 392K unique queries exclusively occur in the test set • For each query, we collected information from the 25 most-relevant search results: textual information (titles of videos); visual information (static visual thumbnails) 56
    57. 57. Baselines, Training, Evaluation • Compare against a set of query performance prediction baselines and the dominant class baseline • Ground truth from clicks in search session (from transaction log) 57
    58. 58. Performance
        Features                                          | WF     | F (q. i. success) | F (q. i. failure)
        Best QPP baseline                                 | 0.6862 | 0.748             | 0.593
        Feature combination from engine features          | 0.7356 | 0.788             | 0.656
        Feature combination from user features            | 0.7678 | 0.820             | 0.688
        Feature combination from user and engine features | 0.7744 | 0.830             | 0.690
        • Engine features: +4% improvement
        • User features: +8% improvement
        • Combined features: +9% improvement
    59. 59. Conclusion and Outlook 59
    60. 60. Discussion & Take home messages. 1. Simple visual features from search results help to extend query performance prediction • Able to outperform conventional text-only query performance prediction • Performance increase (+4%) is quite modest, but promising • Consistent with our expectations for our relatively simple visual representations • Can positively influence wrong predictions by user features-only classifiers 60
    61. 61. Discussion & Take home messages. 2. Features from the user context help the most for query failure prediction. Three classes of query types benefited from our user features (+8%): 1. User presumably wants recommendations over general results, e.g., ‘youtube’ 2. Particular type of requested content is not available, e.g., ‘free movies’ 3. Wrong video search engine usage (wrong vertical) or misspellings, e.g., ‘yahoo mail’, ‘micheal jackon’ 61
    62. 62. Discussion & Take home messages. 2. Features from the user context help the most for query failure prediction • ‘Long tail’ queries: 36% of video queries in the test set were submitted once; contribution of session context features is independent of the frequency of query submission • Challenge: ‘Cold start’ queries do not have enough session context; only very little information is needed to address the cold start issue 62
    63. 63. Discussion & Take home messages. 3. Context-aware Query Failure Prediction approach is applicable using little session data • Solely focuses on local search sessions • No user profiles or global search patterns were involved in the learning process 63
    64. 64. Future Work. 1. Improvement of engine features using visual information from the video search results list: higher-level representation of thumbnails; additional sources of visual information 2. Enhancing the performance of an entire range of video search engine optimization techniques 3. Experimenting with additional definitions of query failure (e.g., dwell time on search results) 64
    65. 65. The User at the Wheel of the Online Video Search Engine. Christoph Kofler (c.kofler@tudelft.nl), Delft University of Technology, Delft, The Netherlands. THANK YOU FOR YOUR ATTENTION! 65
