
Part 4: WWW 2018 tutorial on Understanding User Needs & Tasks



  1. 1. Inferring User Tasks and Needs Rishabh Mehrotra (Spotify, London), Emine Yilmaz (University College London), Ahmed Hassan Awadallah (Microsoft Research)
  2. 2. Outline of the Tutorial • Section 1: Introduction • Section 2: Characterizing Tasks • Section 3: Tasks Extraction Algorithms • Section 4: Task based Evaluation • Section 5: Applications
  3. 3. Section 4: Task Based Evaluation • User behavior signals • Predictive Models of SAT • Explicit Satisfaction Signals
  4. 4. Web Search is Interactive
  5. 5. Web Search is Interactive
  6. 6. Web Search is Interactive
  7. 7. What should we measure? • From Queries to Tasks – People do not come to search engines to submit queries; they come to accomplish tasks "If you cannot measure it, you cannot improve it." (Lord Kelvin) "You get what you measure."
  8. 8. User Behavior Signals Predictive Models Explicit Satisfaction Signals
  9. 9. User Behavior • Behavioral logs are traces of human behavior seen through the lens of a sensor • In Web search: • Queries, clicks, mouse movements, etc. • In mobile and other devices: • Voice (acoustics) • Attention (viewport)
  10. 10. Change in Acoustics • Slower speech rate is more prevalent when ASR quality is bad • Loudness is the perception of the strength or weakness of a sound wave resulting from the amount of pressure produced • Pitch represents how high or low a sound is perceived by the human ear and is determined by a sound's frequency [Figure: fraction of requests vs. slower ratio r, comparing SAT and DSAT ASR quality] [Kulkarni et al., ICASSP 2017]
  11. 11. Attention Modelling • Viewport is the portion of the page that is visible on the screen • There is a high correlation between gaze time and viewport time on Mobile devices [Lagun et al., SIGIR’14]
  12. 12. Tasks as a Trail G = ⟨START, a1, t1, …, an, tn, END⟩ Goal 1: Q 4s RL 1s SR 53s SR 118s END Goal 2: Q 3s Q 5s SR 10s AD 44s END Goal 3: Q 4s RL 1s SR 53s SR 118s END • A user search task can be represented by: • An ordered sequence of actions • Time between actions
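The trail representation above (an ordered sequence of actions with the time between them) can be sketched in a few lines of code. This is a minimal illustration, not code from the tutorial; the `Action` type and the action codes Q (query), RL (result link click), SR (SERP view), AD (answer/ad) are assumptions taken from the slide's examples:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # e.g. "Q", "RL", "SR", "AD"
    dwell: int   # seconds spent before the next action

def make_trail(steps):
    """Represent one search goal as START -> (a1, t1) ... (an, tn) -> END."""
    return ["START"] + [Action(kind, dwell) for kind, dwell in steps] + ["END"]

# Goal 1 from the slide: Q 4s, RL 1s, SR 53s, SR 118s
goal1 = make_trail([("Q", 4), ("RL", 1), ("SR", 53), ("SR", 118)])
```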
  13. 13. User Behavior Signals Predictive Models Explicit Satisfaction Signals
  14. 14. Modeling action sequences: Markov Model • Learn patterns of action sequences that lead to satisfaction/dissatisfaction • A mixture model for generating behavior trails with two mixture components corresponding to satisfaction and dissatisfaction [Hassan et al., WSDM 2010]
  15. 15. • Accuracy: Much better than baselines on labeled data • Sensitivity: Much better than existing metrics for A/B testing [Hassan et al., WSDM 2010] Modeling action sequences: Markov Model
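The two-component idea can be sketched as follows: fit one first-order Markov transition model on satisfied trails and one on dissatisfied trails, then classify a new trail by which model assigns it higher likelihood. This is an illustrative sketch, not the WSDM 2010 implementation; the toy trails and the Laplace smoothing choice are assumptions:

```python
from collections import defaultdict
import math

def fit_markov(trails, smoothing=1.0):
    """Estimate first-order transition probabilities with Laplace smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for trail in trails:
        for a, b in zip(trail, trail[1:]):
            counts[a][b] += 1
    states = {s for t in trails for s in t}
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs

def log_likelihood(trail, probs, floor=1e-9):
    """Log-probability of a trail; unseen transitions get a small floor."""
    return sum(math.log(probs.get(a, {}).get(b, floor))
               for a, b in zip(trail, trail[1:]))

# Toy labeled trails (hypothetical): SAT users click a result and leave,
# DSAT users keep reformulating queries.
sat_trails = [["START", "Q", "SR", "RL", "END"]] * 3
dsat_trails = [["START", "Q", "SR", "Q", "SR", "Q", "END"]] * 3
sat_m, dsat_m = fit_markov(sat_trails), fit_markov(dsat_trails)

trail = ["START", "Q", "SR", "RL", "END"]
is_sat = log_likelihood(trail, sat_m) > log_likelihood(trail, dsat_m)
```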
  16. 16. Modeling action sequences: CRF [Ageev et al., SIGIR 2011]
  17. 17. Modeling action sequences: Markov Model [Ageev et al., SIGIR 2011]
  18. 18. Semi-supervised Model • Can we learn from both labeled and unlabeled data? – Labeled data is typically limited – Unlabeled data is available at a larger scale • Generative Model + EM – E Step: Use the current classifier to estimate class probabilities for unlabeled data – M Step: Re-estimate model parameters using the labeled data and the component membership of the unlabeled data Model Expectation Step Maximization Step [Hassan, SIGIR 2012]
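The E/M loop above can be sketched with a Bernoulli naive-Bayes generative model over binary behavior features: labeled examples keep hard class memberships, unlabeled examples get soft memberships in the E step, and the M step re-estimates parameters from both. This is an illustrative stand-in; the actual features and generative model in [Hassan, SIGIR 2012] differ:

```python
import math

def em_semi_supervised(labeled, unlabeled, n_iters=20):
    """Semi-supervised EM for a two-class Bernoulli naive-Bayes SAT model.

    labeled:   list of (binary feature vector, label in {0, 1})
    unlabeled: list of binary feature vectors
    Returns (prior, theta) with theta[c][j] = P(feature j = 1 | class c).
    """
    d = len(labeled[0][0])
    xs = [x for x, _ in labeled] + list(unlabeled)
    # Hard memberships for labeled data; uniform soft memberships to start.
    resp = [[1.0 - y, float(y)] for _, y in labeled] + [[0.5, 0.5] for _ in unlabeled]
    for _ in range(n_iters):
        # M step: re-estimate parameters from labeled + soft unlabeled counts.
        n = [sum(r[c] for r in resp) for c in (0, 1)]
        prior = [n[c] / len(xs) for c in (0, 1)]
        theta = [[(sum(r[c] * x[j] for r, x in zip(resp, xs)) + 1.0) / (n[c] + 2.0)
                  for j in range(d)] for c in (0, 1)]
        # E step: update class probabilities for the unlabeled examples only.
        for i in range(len(labeled), len(xs)):
            ll = [math.log(prior[c]) + sum(
                      math.log(theta[c][j] if xs[i][j] else 1.0 - theta[c][j])
                      for j in range(d))
                  for c in (0, 1)]
            z = max(ll)
            w = [math.exp(l - z) for l in ll]
            resp[i] = [w[0] / (w[0] + w[1]), w[1] / (w[0] + w[1])]
    return prior, theta

# Hypothetical data: two labeled sessions plus four unlabeled ones.
labeled = [([1, 1], 1), ([0, 0], 0)]
unlabeled = [[1, 1], [1, 1], [0, 0], [0, 0]]
prior, theta = em_semi_supervised(labeled, unlabeled)
```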
  19. 19. • Learning from both labeled and unlabeled data significantly improves the performance [Hassan, SIGIR 2012] Semi-supervised Model
  20. 20. Personalized Model • There are large differences between users • A one-size-fits-all model of user behavior cannot capture the variance in behavior associated with satisfaction • Making generalizations about particular behaviors is risky [Hassan and White, CIKM 2013]
  21. 21. Dialog Models: From Search to Intelligent Assistants • In intelligent assistants, we have a dialog between the user and the system • Classify every action/response to one of a set of predefined types • Extend the model to cover multiple user/system responses User Request System Response “Where is the nearest pharmacy” “Here are 8 pharmacies near you.” [show options on the screen] “Send me the directions to block sponsee” (Show me the directions to Clark’s pharmacy) “Sorry, I couldn’t find anything for ‘Send me the directions to block sponsee.’ Do you wanna search the web for it?” “No” “Here are 8 pharmacies near you.” [show options on the screen] “Directions to Clark’s pharmacy” “OK, getting you directions to Clark’s Pharmacy.” [navigation] [Jiang et al., WWW 2015]
  22. 22. • Joint training of unified Bi-LSTMs & CNN • Interaction layer: between components of intermediate representation • Softmax layer at the end for prediction Unified Multi-View Model [Mehrotra et al., CIKM 2017]
  23. 23. • Adding the auxiliary SERP-level features helps • Proposed unified model performs best across the board Deep multi-view model performs better than traditional sequential models! [Mehrotra et al., CIKM 2017] Unified Multi-View Model
  24. 24. User Behavior Signals Predictive Models Explicit Satisfaction Signals
  25. 25. Learn from the users • We need models that can predict user satisfaction using implicit signals from user interactions • Find correlation between implicit behavior signals and some explicit satisfaction signal
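Measuring that relationship can be as simple as a Pearson correlation between an implicit behavior signal and an explicit satisfaction label. A minimal sketch; the dwell times and SAT votes below are hypothetical data made up for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Implicit signal: click dwell time (seconds).
# Explicit signal: user's SAT vote (1 = satisfied, 0 = not).
dwell = [120, 95, 12, 8, 60, 5]
sat_votes = [1, 1, 0, 0, 1, 0]
r = pearson(dwell, sat_votes)  # strong positive correlation in this toy data
```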
  26. 26. Explicit Satisfaction Signals • Using Judges/Annotators – Recreate the user experience and ask an annotator to assess user satisfaction [Jiang et al., WSDM 2015]
  27. 27. • Lab Studies – A group of people who are asked to perform certain tasks and can be subsequently interviewed to get feedback on satisfaction – Gamification [Ageev et al., SIGIR 2011] Explicit Satisfaction Signals
  28. 28. • Field Studies – Richer client instrumentation (e.g. toolbars) – Users install special software that monitors their tasks and collects feedback from them at specific points – Example: Curious Browser, Search TrailBlazer, SearchVote, etc. [Fox et al., TOIS 2005; Hassan et al., CIKM 2011] Explicit Satisfaction Signals
  29. 29. • Data gathering A/B tests – Run an A/B where the control group is subjected to a degraded experience 50% Users 50% Users [Machmouchi et al., CIKM 2017] Explicit Satisfaction Signals
  30. 30. Summary: Section IV • User behavior signals – Acoustics for voice interactions – Attention modeling in viewport – Action sequences • Predictive Models of SAT – Markov model for action sequences – CRF models – Semi-supervised model – Deep sequential model for task SAT • Explicit Satisfaction Signals – Judges/annotators – Lab studies – Field studies – Data gathering A/B tests