
Beyond Ranking Optimization, Toward Humane Search - Talk at SNU Graduate School of Convergence Science and Technology



Published in: Technology, Education


  2. Jin Young Kim • Graduate of SNU EE / Business • 5th-year Ph.D. student in UMass Computer Science • Starting as an Applied Researcher at Microsoft Bing
  3. Today's Agenda • A brief introduction of IR as a research area • An example of how we design a retrieval model • Other research projects and recent trends in IR
  4. BACKGROUND: An Information Retrieval Primer
  5. Information Retrieval? • The study of how an automated system can enable its users to access, interact with, and make sense of information. (Diagram: the User issues a Query and visits a Document; the system surfaces results)
  6. IR Research in Context • Situated between human-interface and system/analytics research • Aims at satisfying users' information needs • Based on large-scale system infrastructure & analytics • Need for convergence research! (Diagram: Information Retrieval sits between the End-user Interface (UX / HCI / InfoViz), Large-scale (Text) Analytics, and Large-scale System Infrastructure)
  7. Major Problems in IR • Matching: (keyword) search: query–document; personalized search: (user+query)–document; contextual advertising: (user+context)–advertisement • Quality: authority / spam / freshness, and various ways to capture them • Relevance Scoring: a combination of matching and quality features; evaluation is critical for optimal performance (Diagram: the User issues a Query and visits a Document; the system surfaces results)
  9. Information seeking requires communication. You need the freedom of expression. You need someone who understands.
  10. Information Seeking circa 2012 • The search engine accepts keywords only. • The search engine doesn't understand you.
  11. Toward Humane Information Seeking (Diagram: Rich User Modeling draws on profile, context, and behavior; Rich Interactions span search, browsing, and filtering)
  12. From User Modeling to Session Interaction • IR Way: rich query • HCIR Way: session interaction • HCIR = HCI + IR (Diagram: the USER and SYSTEM exchange actions and responses over a session; the user model draws on interaction history, profile, context, and behavior; responses include filtering/browsing conditions, relevance feedback, and related items)
  13. The Rest of the Talk… • Personal Search: improving search and browsing for known-item finding; evaluating interactions combining search and browsing • Web Search: user modeling based on reading level and topic; providing non-intrusive recommendations for browsing • Book Search: analyzing interactions combining search and filtering
  14. PERSONAL SEARCH: Retrieval and Evaluation Techniques for Personal Information [Thesis]
  15. Examples • Desktop Search: Ranking using Multiple Document Types for Desktop Search [SIGIR10] • Search over Social Media: Evaluating Search in Personal Social Media Collections [WSDM12]
  16. Structured Document Retrieval: Background • Field operator / advanced search interface • A user's search terms are found in multiple fields (See: Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR11])
  17. Structured Document Retrieval: Models • Document-based Retrieval Model: score each document as a whole • Field-based Retrieval Model: combine evidence from each field (Diagram: query terms q1…qm are matched against fields f1…fn and combined with field weights w1…wn, in document-based vs. field-based scoring)
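The contrast between the two models on this slide can be sketched in a few lines of Python: the document-based model merges all fields into one bag of words before scoring, while the field-based model scores each field separately and mixes the per-field evidence with fixed field weights. The add-one smoothing, the example document, and the weight values below are illustrative assumptions, not the exact models from the talk.

```python
from collections import Counter

def document_score(query_terms, doc_fields):
    """Document-based scoring: treat all fields as one bag of words."""
    bag = Counter()
    for text in doc_fields.values():
        bag.update(text.lower().split())
    total = sum(bag.values())
    score = 1.0
    for q in query_terms:
        score *= (bag[q] + 1) / (total + 1)  # add-one smoothing
    return score

def field_score(query_terms, doc_fields, field_weights):
    """Field-based scoring: estimate P(q|field) per field, then combine
    the per-field evidence with fixed field weights (BM25F/MFLM style)."""
    score = 1.0
    for q in query_terms:
        term_score = 0.0
        for field, text in doc_fields.items():
            tokens = text.lower().split()
            p = (tokens.count(q) + 1) / (len(tokens) + 1)
            term_score += field_weights[field] * p
        score *= term_score
    return score

# Hypothetical email-style document and field weights.
doc = {"subject": "registration deadline", "to": "james", "body": "please register soon"}
weights = {"subject": 0.5, "to": 0.3, "body": 0.2}
```

Note that in the field-based model the mixture weights are the same for every query term; the field relevance model later in the talk relaxes exactly this assumption.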
  18. Improved Matching for Structured Documents [CIKM09, ECIR09, ECIR12] • Field Relevance: a different field matters for a different query term • Example (email search): 'registration' is relevant when it occurs in <subject>; 'james' is relevant when it occurs in <to>
  19. Estimating the Field Relevance • If the user provides feedback: the relevant document provides sufficient information • If no feedback is available: combine field-level term statistics from multiple sources (Diagram: per-field statistics over from/to, title, and content from the Collection plus the Top-k Docs approximate those of the Relevant Docs)
  20. Retrieval Using the Field Relevance • Comparison with previous work (Diagram: fixed per-field weights w1…wn vs. per-term field relevance P(F1|q1)…P(Fn|qm); weighted field scores are summed per term and multiplied across terms) • Ranking in the Field Relevance Model: per-term field score × per-term field weight
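A minimal sketch of the per-term weighting idea: instead of one fixed weight per field, each query term carries its own distribution over fields, the weighted field scores are summed per term, and the per-term scores are multiplied. The smoothed field language models and the `field_relevance` table below are hypothetical stand-ins for the estimates described on the previous slide.

```python
def frm_score(query_terms, doc_fields, field_relevance):
    """Field Relevance Model sketch: per-TERM field weights P(F|q)
    replace the single fixed weight per field. Sum over fields per
    term, then multiply across query terms."""
    score = 1.0
    for q in query_terms:
        term_score = 0.0
        for field, text in doc_fields.items():
            tokens = text.lower().split()
            p_q_given_field = (tokens.count(q) + 1) / (len(tokens) + 1)
            term_score += field_relevance[q].get(field, 0.0) * p_q_given_field
        score *= term_score
    return score

# Hypothetical per-term field relevance matching the slide's example:
# 'james' points mostly to <to>, 'registration' to <subject>.
field_relevance = {
    "james": {"to": 0.8, "subject": 0.1, "body": 0.1},
    "registration": {"subject": 0.7, "to": 0.1, "body": 0.2},
}
```

With this scoring, a document containing "james" in its <to> field outranks one that does not, because the term's weight mass sits on that field.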
  21. Evaluating the Field Relevance Model • Retrieval effectiveness (metric: Mean Reciprocal Rank); DQL, BM25F, and MFLM use fixed field weights, while FRM-C, FRM-T, and FRM-R use per-term field weights:
      Collection | DQL | BM25F | MFLM | FRM-C | FRM-T | FRM-R
      TREC | 54.2% | 59.7% | 60.1% | 62.4% | 66.8% | 79.4%
      IMDB | 40.8% | 52.4% | 61.2% | 63.7% | 65.7% | 70.4%
      Monster | 42.9% | 27.9% | 46.0% | 54.2% | 55.8% | 71.6%
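Mean Reciprocal Rank, the metric in this table, is straightforward to compute: each query contributes the reciprocal of the rank of its first relevant result (0 if none is retrieved), averaged over all queries.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: average of 1/rank of the first relevant result per query.

    ranked_lists  -- one ranked result list per query
    relevant_sets -- one set of relevant items per query
    """
    rr_sum = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        rr = 0.0
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                rr = 1.0 / rank
                break  # only the FIRST relevant result counts
        rr_sum += rr
    return rr_sum / len(ranked_lists)
```

MRR suits known-item tasks like the personal-search experiments here, where each query has essentially one target document.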
  22. Evaluation Challenges for Personal Search [CIKM09, SIGIR10, CIKM11] • Evaluation of personal search: each prior study is based on its own user study; no comparative evaluation has been performed yet • Solution: simulated collections: crawl CS department webpages, docs, and calendars; recruit department members for the user study • Collecting user logs: DocTrack, a human-computation search game; Probabilistic User Model, a method for user simulation
  23. DocTrack Game (Screenshot: players are shown a target item and asked to find it)
  24. Summary so far… • Query modeling for structured documents: using the estimated field relevance improves retrieval; the user's feedback can help personalize the field relevance • Evaluation challenges in personal search: simulation of the search task using game-like structures; related work: 'Find It If You Can' [SIGIR11]
  25. WEB SEARCH: Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12]
  26. Reading-level distribution varies across major topical categories
  27. User Modeling by Reading Level and Topic • Reading Level: proficiency (comprehensibility) • Topic: topical areas of interest • Profile Construction: aggregate the per-document distributions P(R|d) and P(T|d) over a user's documents into a profile P(R,T|u) • Profile Applications: improving personalized search ranking; enabling expert content recommendation
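One simple way to realize the profile-construction step is to average per-document distributions over the documents a user has interacted with. The sketch below assumes each document already comes with a joint reading-level/topic distribution P(R,T|d); the paper's actual estimation procedure may differ.

```python
from collections import defaultdict

def build_user_profile(doc_distributions):
    """Average per-document joint distributions P(R,T|d) over a user's
    documents to form the user profile P(R,T|u). Keys are
    (reading_level, topic) pairs; values are probabilities."""
    profile = defaultdict(float)
    n = len(doc_distributions)
    for dist in doc_distributions:
        for (level, topic), p in dist.items():
            profile[(level, topic)] += p / n  # uniform doc weighting
    return dict(profile)
```

Because each input sums to 1 and documents are weighted uniformly, the resulting profile is itself a proper distribution over (reading level, topic) pairs.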
  28. Profile matching can predict a user's preference over search results • Metric: % of the user's preferences predicted by profile matching; matching is measured by the KL-divergence of reading-level/topic (RT) profiles • Results, by degree of focus in the user profile and by the distance metric between user u and website s:
      User Group | #Clicks | KLR(u,s) | KLT(u,s) | KLRT(u,s)
      ↑ Focused | 5,960 | 59.23% | 60.79% | 65.27%
      | 147,195 | 52.25% | 54.20% | 54.41%
      ↓ Diverse | 197,733 | 52.75% | 53.36% | 53.63%
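The distance in this table is a KL-divergence between the user's and the website's profiles. A smoothed version can be computed as below; the epsilon smoothing for zero-probability entries is an illustrative choice, not necessarily the paper's.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two distributions stored as dicts over a shared
    event space (e.g. (reading_level, topic) pairs). eps guards against
    zero probabilities on either side."""
    keys = set(p) | set(q)
    return sum(
        p.get(k, 0.0) * math.log((p.get(k, 0.0) + eps) / (q.get(k, 0.0) + eps))
        for k in keys
        if p.get(k, 0.0) > 0  # terms with p(k)=0 contribute nothing
    )
```

KL-divergence is asymmetric, so KL(user || site) and KL(site || user) generally differ; the table reports one fixed direction per metric.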
  29. Comparing Expert vs. Non-expert URLs • Expert vs. non-expert URLs taken from [White'09] (Figure: expert URLs show lower topic diversity and a higher reading level)
  30. Enabling Browsing for Web Search [Work in progress] • SurfCanyon®: recommends results based on clicks • Initial results indicate that recommendations are useful in the shopping domain.
  31. BOOK SEARCH: Understanding Book Search Behavior on the Web [Submitted to SIGIR12]
  32. Understanding Book Search on the Web • OpenLibrary: a user-contributed online digital library • Dataset: 8M records from the web server log
  33. Comparison of Navigational Behavior • Users entering the site directly show different behaviors from users entering via web search engines (Figure: navigation patterns of users entering directly vs. via Google)
  34. Comparison of Search Behavior • Rich interaction reduces query lengths • Filtering induces more interactions than search
  35. LOOKING ONWARD
  36. Where's the Future? – Social Search • The new Bing sidebar makes search a social activity.
  37. Where's the Future? – Semantic Search • The new Google serves 'knowledge' as well as docs.
  38. Where's the Future? – Siri-like Agent
  39. An Exciting Future Awaits Us! • Recommended readings in IR • Any questions?
  40. Selected Publications (more at @lifidea) • Structured Document Retrieval: A Probabilistic Retrieval Model for Semi-structured Data [ECIR09]; A Field Relevance Model for Structured Document Retrieval [ECIR11] • Personal Search: Retrieval Experiments using Pseudo-Desktop Collections [CIKM09]; Ranking using Multiple Document Types in Desktop Search [SIGIR10]; Building a Semantic Representation for Personal Information [CIKM10]; Evaluating an Associative Browsing Model for Personal Info. [CIKM11]; Evaluating Search in Personal Social Media Collections [WSDM12] • Web / Book Search: Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12]; Understanding Book Search Behavior on the Web [In submission to SIGIR12]
  41. My Self-tracking Efforts • Life-optimization Project (2002-2006) • LiFiDeA Project (2011-2012)
  43. The Great Divide: IR vs. HCI
      IR | HCI
      Query / Document | User / System
      Relevant Results | User Value / Satisfaction
      Ranking / Suggestions | Interface / Visualization
      Feature Engineering | Human-centered Design
      Batch Evaluation (TREC) | User Study
      SIGIR / CIKM / WSDM | CHI / UIST / CSCW
      Can we learn from each other?
  44. The Great Divide: IR vs. RecSys
      IR | RecSys
      Query / Document | User / Item
      Reactive (given query) | Proactive (push item)
      SIGIR / CIKM / WSDM | RecSys / KDD / UMAP
  45. The Great Divide: IR in CS vs. IR in LIS
      IR in CS | IR in LIS
      Focus on ranking & relevance optimization | Focus on behavioral study & understanding
      Batch & quantitative evaluation | User study & qualitative evaluation
      SIGIR / CIKM / WSDM | ASIS&T / JCDL
      UMass / CMU / Glasgow | UNC / Rutgers / UW
  46. Problems & Techniques in IR • What: Data (format: documents, records, linked data; size; dynamics: static, dynamic, streaming); User & Domain (end user: web, library; business user: legal, medical, patent; system component, e.g., IBM Watson); Needs (known-item vs. exploratory search; recommendation) • How: System (indexing and retrieval; platforms for big-data handling); Analytics (feature extraction; retrieval model; tuning & evaluation); Presentation (user interface; information visualization)
  47. More about the Matching Problem • Finding Representations: term vector vs. term distribution; topical category, reading level, … • Estimating Representations: by counting terms; using automatic classifiers • Calculating Matching Scores: cosine similarity vs. KL-divergence; combining multiple representations (Diagram: the User issues a Query and visits a Document; the system surfaces results)
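For the matching-score step above, cosine similarity over sparse representations and a weighted combination across multiple representations can be sketched as follows (the dict-based representation and the weighting scheme are illustrative assumptions):

```python
import math

def cosine_similarity(p, q):
    """Cosine between two sparse term-weight vectors stored as dicts."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def combined_score(reps_a, reps_b, weights):
    """Combine matching scores from multiple representations
    (e.g. term vector, topic distribution, reading level) with a
    weighted sum over representations present in `weights`."""
    return sum(w * cosine_similarity(reps_a[name], reps_b[name])
               for name, w in weights.items())
```

Cosine similarity is symmetric and treats representations as vectors, whereas KL-divergence is an asymmetric distance between probability distributions; which to use depends on how the representation was estimated.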