Personalization in Information Retrieval,
Extraction and Access
Workshop On Ontology, NLP, Personalization And IE/IR - IIT Bombay, Mumbai 15-17 July 2008

                       Vasudeva Varma
                       www.iiit.ac.in/~vasu
Search Engine Heat is On!


         Applications of Search Technologies
          Web search
          Product search
          Service search
          Domain Search
         Already a BIG Market
         HUGE Opportunity


             IR and IE Technologies and Personalization (C) Vasudeva Varma IIIT-H   5/30/2008
Agenda


         Evolution of Search Engines
         Information Retrieval Vs. Extraction Vs. Access
         Personalization in IR, IE and IA
         Applications in Personalized IA
         Conclusions




Evolution of Search Engines


         Crawling and Indexing
         Topic directories
         Clustering and Classification
         Hyperlink analysis
         Resource discovery and vertical portals
         Semantic Web
         ???


Current IR engines fail – why?


          Wide variation in retrieval results
            User topic
            Retrieval system
          Different approaches work for different systems.
          No way to determine which approach will work for
          a particular query.

        Solution:
          Deeper analysis of the content and Query

Motivation for Deeper Analysis


         Texts are one of the major sources of
         information and knowledge.
         However, they are not transparent.

         They have to be systematically integrated with
         other sources such as databases, numerical data,
         etc.


                    NLP/IR/IE for better analysis
                     IA for better presentation
Agenda


         Evolution of Search Engines
         Information Retrieval Vs. Extraction Vs. Access
         Personalization in IR, IE and IA
         Applications in Personalized IA
         Conclusions




IR vs. IE vs. IA

        IR: To search and retrieve documents in response to queries for
        information

Vs.

        IE: To extract information that fits pre-defined database schemas or
        templates, specifying the output formats

Vs.

        IA: To make the required information accessible to the user in their
        choice of language, mode, level of detail and format
[Diagram built up over five slides. Labels, in order of appearance:
 Collection of Texts, Characterization of Texts, IR System, Queries;
 then Knowledge and Interpretation;
 then Passage;
 then IE System, NLP, Structures of Sentences, Texts, Templates;
 finally Information Access Technologies: Machine Translation,
 Summarization, Snippet Generation, NL Generation, Visualization Tools.]
Agenda


       Evolution of Search Engines
       Information Retrieval Vs. Extraction Vs. Access
       Personalization in IR, IE and IA
       Applications in Personalized IA
       Conclusions




Limitations of Current IR Systems


       All users get the same results for a given query,
       independent of:
         Previous search history
         Current search context
       All users are treated the same
       Does one size fit all?




Personalized Web Search


       Automatic adjustment of information content, structure, and
       presentation tailored to an individual user.
       Characteristics: Age, Gender, Special Interest Groups, Topic
       Personalize Search Results using
         Personal content
         Past Activities (long term and short term)
       Variations:
         Explicit or Implicit profile setup
         Explicit or Implicit relevance feedback
         Client side or server side storage of information (privacy implications)
         User control over amount of personalization


Overview of Personalized Search


      Typically a 3-step process:
      1. Obtain results (n >> 10)
      2. Compute similarity(results, user)
      3. Re-rank the results
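
The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the talk: the bag-of-words cosine similarity, the `rerank` helper, and the toy result snippets are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts (an assumed, simplistic tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(results, user_profile_text):
    """Steps 2 and 3: score each result against the user profile, then sort."""
    profile = bag_of_words(user_profile_text)
    scored = [(cosine(bag_of_words(r), profile), r) for r in results]
    # sorted() is stable, so tied results keep the engine's original order.
    return [r for _, r in sorted(scored, key=lambda p: p[0], reverse=True)]


# Step 1: results as returned by the underlying engine (hypothetical snippets).
results = [
    "jaguar car dealership prices",
    "jaguar habitat in the amazon rainforest",
    "jaguar operating system release notes",
]
profile = "wildlife rainforest habitat conservation animals"
reranked = rerank(results, profile)
```

Only the result that shares terms with the profile moves; the other two keep their original relative order, which is usually the desired behavior when the profile has nothing to say about a result.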




Techniques


       Co-active Techniques
       Pro-active Techniques
       Collaborative Filtering
       User Profile based Result Pruning
       User Profile based Query Expansion




Problem Description
      Personalized Search - Issues
        What to use to Personalize?
        How to Personalize?
        When not to Personalize?
        How to know Personalization helped?




Problem Description
      We focus on the issue “How to personalize?”
      Problem Statement
        How to learn to personalize future searches using
        past search history
          How to model and represent past search contexts
          How to use them to improve search results




Solution - Outline

      Model and Represent past user feedback – Learning
      user profile
        Use implicit feedback
        Long term learning
        User contexts – triples
              {user,query,{relevant documents}}
      Improve Search Results – Reranking
        Get Initial Search results
        Take the top few, rescore them using the user profile, and reorder


Contributions
      I Search : A suite of approaches for Personalized
      Web Search
        Proposed Personalized search approaches
        Baseline
        Basic Retrieval methods
        Automatic Evaluation
      Analysis of Query Log



Review of Personalized Search

      Personalized Search
        Query logs
        Machine learning
        Language modeling
        Community based
        Others




I Search : A suite of Techniques for
     Personalized IR

      Suite of Approaches???
        Statistical Language modeling based approaches
          Simple N-gram based methods
          Noisy Channel Model based method
        Machine learning based approach
          Ranking SVM based method
        Personalization without relevance feedback
          Simple N-gram based method




Statistical Language Modeling based
     Approaches: Overview
        From user contexts, capture statistical properties
        of texts
        Use the same to improve search results
        Different Contexts
          Unigram and Bigrams
             Simple N-gram based approaches
          Relationship between query and document words

             Noisy Channel based approach



Simple N-gram based approaches
      N-gram: general term for a sequence of N adjacent words
        1-gram: unigram, 2-gram: bigram
      Capture statistical properties of text
        Single words (unigrams)
        Two adjacent words (bigrams)
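
As a small illustration of the definitions above, unigrams and bigrams can be extracted and counted like this. The `ngrams` helper and the sample sentence are made up for the example, and tokenization is assumed to be plain whitespace splitting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "personalized web search improves web search".split()
unigram_counts = Counter(ngrams(tokens, 1))  # single words
bigram_counts = Counter(ngrams(tokens, 2))   # two adjacent words
```

The resulting counts are the raw statistics from which an N-gram based profile can be estimated.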




Learning user profile
     Given the past search history of user u
     Hu = {(q1, rf1), (q2, rf2), …, (qn, rfn)}
       (qi: a past query; rfi: the relevance feedback for it)
       rfall = concatenation of all rfi
       For each unigram wi



       User profile
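
A minimal sketch of learning a unigram profile from such a history is given below. It assumes that each word is weighted by its relative frequency in rfall; that weighting, the `learn_profile` name, and the sample history are assumptions for illustration and are not taken from the talk.

```python
from collections import Counter


def learn_profile(history):
    """Build a unigram profile from a history [(q_i, rf_i), ...].

    Assumption: a word's profile weight is its relative frequency in
    rf_all, the concatenation of all relevance-feedback texts.
    """
    rf_all = " ".join(rf for _, rf in history)  # concatenation of all rf_i
    counts = Counter(rf_all.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


# Hypothetical history: (query, text of the feedback for that query).
history = [
    ("python tutorial", "python tutorial for beginners"),
    ("list sort", "sorting a python list"),
]
profile = learn_profile(history)
```

Because the weights form a probability distribution over words, the profile can be plugged directly into a language-modeling style scorer.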


Sample user profile




Reranking
      In general LM for IR
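
As background, the standard query-likelihood formulation of language modeling for IR ranks a document $D$ for a query $Q = q_1 \dots q_m$ as shown below. This is the textbook model with Jelinek-Mercer smoothing as one common choice; it is given for reference only and is not necessarily the exact formula used in the talk.

```latex
P(Q \mid D) = \prod_{i=1}^{m} P(q_i \mid D),
\qquad
P(q_i \mid D) = (1 - \lambda)\,\frac{c(q_i, D)}{|D|} + \lambda\, P(q_i \mid C)
```

Here $c(q_i, D)$ is the count of $q_i$ in $D$, $|D|$ is the document length, $C$ is the whole collection, and $\lambda \in [0, 1]$ is a smoothing weight.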




      Our Approach




Noisy Channel based Approach
      Documents and queries belong to different information spaces
      Queries: short, concise
      Documents: more descriptive
      Most approaches to retrieval or personalized web
      search do not model this difference
      Capture the relationship between query and document
      words
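
One way to capture that relationship is a translation-style score, P(q | d) = prod_i sum_w t(q_i | w) P(w | d), where t(q | w) is the probability that document word w surfaces as query word q. The sketch below assumes this form; the translation table values, the `floor` smoothing constant, and the function names are hypothetical, and in practice the table would be learned from (query, relevant document) pairs.

```python
def doc_unigram_model(doc_tokens):
    """Maximum-likelihood unigram distribution P(w | d)."""
    total = len(doc_tokens)
    model = {}
    for w in doc_tokens:
        model[w] = model.get(w, 0.0) + 1.0 / total
    return model


def noisy_channel_score(query_tokens, doc_tokens, translation, floor=1e-6):
    """P(q | d) = prod_i sum_w t(q_i | w) * P(w | d).

    translation[w][q] holds t(q | w). `floor` is an assumed smoothing
    constant that keeps an unmatched query word from zeroing the product.
    """
    p_w_given_d = doc_unigram_model(doc_tokens)
    score = 1.0
    for q in query_tokens:
        p_q = sum(translation.get(w, {}).get(q, 0.0) * p
                  for w, p in p_w_given_d.items())
        score *= max(p_q, floor)
    return score


# Hypothetical translation table (document word -> query word probabilities).
translation = {
    "automobile": {"car": 0.6, "automobile": 0.4},
    "price": {"cheap": 0.3, "price": 0.7},
    "recipe": {"recipe": 1.0},
}

query = ["cheap", "car"]
doc_a = ["automobile", "price", "price", "automobile"]
doc_b = ["recipe", "recipe", "price", "recipe"]
score_a = noisy_channel_score(query, doc_a, translation)
score_b = noisy_channel_score(query, doc_b, translation)
```

Note that `doc_a` never contains the literal words "cheap" or "car", yet it outscores `doc_b` because the table links its vocabulary to the query's vocabulary.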



Machine Learning based
     Approaches: Introduction
      Most machine learning for IR treats ranking as a binary
      classification problem: “relevant” vs. “non-relevant”
      Click-through data
        A click signals relative relevance, not absolute relevance
          i.e., assuming clicked = relevant and unclicked = irrelevant
          is wrong.
        Clicks are biased
        Partial relative relevance: clicked documents are more
        relevant than the unclicked documents.
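
Read as pairwise preferences, click-through data can be turned into training pairs for a pairwise ranker such as a Ranking SVM. The sketch below assumes the simplest reading of the last bullet (every clicked document is preferred over every unclicked document in the same result list); the `preference_pairs` helper and the document ids are hypothetical.

```python
def preference_pairs(ranked_docs, clicked):
    """Return (preferred, less_preferred) pairs for one result list.

    Each clicked document is taken to be more relevant than each
    unclicked one; no absolute relevance label is assigned.
    """
    clicked_set = set(clicked)
    clicked_docs = [d for d in ranked_docs if d in clicked_set]
    unclicked_docs = [d for d in ranked_docs if d not in clicked_set]
    return [(c, u) for c in clicked_docs for u in unclicked_docs]


ranked = ["d1", "d2", "d3", "d4"]
pairs = preference_pairs(ranked, clicked=["d3"])
```

A stricter variant would only pair a clicked document with unclicked documents ranked above it, which reduces the effect of position bias.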



Personalized Search without Relevance
     Feedback: Introduction

      Can personalization be done without relevance
      feedback about which documents are relevant?
      How informative are the queries posed by
      users?
      Is the information contained in the queries enough to
      personalize?




Approach
      Past queries of the user are available
      Make effective use of past queries
      Simple N-gram based approach




Experiment Results
      Language Modeling – Best Results!
        Interesting framework for personalized search
        Simple N-gram based approaches also worked well
        Noisy Channel model worked best
          Extracting Synthetic Queries helped
          Different Training schemes
               IBM Model1 Vs GIZA++
               Snippet Vs Document
      Machine Learning – competitive results
        Different Features and weights
      Without Relevance Feedback – Very encouraging results
        Simple Approach worked well
        Sparsity – Query log was useful


Agenda


       Evolution of Search Engines
       Information Retrieval Vs. Extraction Vs. Access
       Personalization in IR, IE and IA
       Applications in Personalized IA
       Conclusions


            Personalized Search Engine for Mobile Phones
            Personalized Summarization (for Mobile Devices)

“Personalized” Search Engine for
     mobile devices
      Goal: develop a “personalized” search engine for mobile devices
      that produces more relevant results based on the query
      and the “context”
      What do we mean by “personalized” search?
         The user will be able to configure the search interfaces (explicit feedback)
         The system will observe user behavior and customize itself to suit the user’s
         needs (implicit feedback)
      What do we mean by “context”?
         User, time, location, …
      The aim is to make search accessible on Nokia mobile devices and to make use of
      the mobile aspects for personalization.

                                                      (C) Vasudeva Varma, IIIT Hyderabad, India
Scope of the Application
       Client Side        Server Side




Problem Re-Definition
      Dynamic user behavior tracking
        An observer that keeps track of all “relevant” user actions
        Client module
      Analysis of user actions
        Interpret the user actions to derive user interests (categories of interests)
        so that more relevant results are displayed
      Construction of user profile implicitly
        Implicit Supervised learning
      Personalization
        Based on Query
        Based on User Profile
        Based on other parameters such as time, location

Solution Overview




Personalized Summarization:
     Motivation
      The success that search engine providers have found on the PC
      has failed to translate to the mobile phone. Why?

        Because trying to force a PC-based search experience inside a mobile
        device falls short on a key area of usability

        Search queries typically return hundreds of potential hits.
           Making sense of such output is difficult.
          The results may or may not be of user interest.


      We are looking for a faster and easier way to access
      precise information on our mobile devices.

Challenges
      Can we offer users a simpler, friendlier and more
      intuitive experience?
      We aim to provide more information with less
      payload, in the form of a summary that takes
      into account:
        context
        history
        preferences
        device capabilities
        social network
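
To make the "more information with less payload" idea concrete, here is a rough sketch of a profile-aware extractive summarizer with a device-dependent size budget. Everything in it is an assumption for illustration: the `personalized_summary` function, the overlap-based sentence score, and the use of a character budget to stand in for device capabilities. It does not model context or the social network, and it is not the system described in the talk.

```python
import re


def personalized_summary(text, profile_terms, max_chars):
    """Greedy extractive summary under a character budget.

    Sentences sharing more terms with the user profile are picked first;
    a sentence is skipped if adding it would exceed max_chars.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    profile = {t.lower() for t in profile_terms}

    def score(sentence):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        return len(words & profile)

    # Rank by profile overlap; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))


article = (
    "The city council met on Monday. "
    "Cricket fans celebrated the team's win. "
    "Rain is expected later this week."
)
summary = personalized_summary(article, ["cricket", "team"], max_chars=45)
```

With a 45-character budget only the sentence matching the user's interests fits, so a small-screen device would receive that sentence alone.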

System Model




      [Diagram: system model. The only recoverable label is "Search Engine".]
Summary


       Current search engines are inadequate, and current
       know-how is only the tip of the iceberg
       The IR, IE and IA areas have enjoyed huge commercial
       success and have huge growth potential
       Personalization is perhaps the next big wave
       Various personalization techniques are available,
       yet this is still a very fertile research field
       The two personalization applications shown are just
       examples of many possibilities.
Thank You – Questions?
     Vasudeva Varma, IIIT Hyderabad
     vv@iiit.ac.in or www.iiit.ac.in/~vasu
