Is This Entity Relevant to Your Needs - CIKM2012
 

    Presentation Transcript

    • Is This Entity Relevant to Your Needs? David Carmel, IBM Research - Haifa, Israel. © 2012 IBM Corporation
    • Outline
      • Some open questions in Entity Oriented Search (EoS)
        • What makes an entity relevant to the user’s needs?
        • Is it the same relevance that the IR community deals with?
        • Can we adopt existing IR models into this new area?
      • The classical model of relevance in IR
        • User-based relevance
        • Topical relevance (aboutness)
        • Similarity-based relevance measurements
      • Supportive evidence as an indication of relevance
        • For Q&A
        • For EoS
      • Relevance estimation approaches for EoS
      • Exploration and discovery in EoS
      • Summary
    • Entity Oriented Search (EoS)
      • When people use retrieval systems, they are often not searching for documents or text passages
      • Named entities often play a central role in answering such information needs: persons, organizations, locations, products…
      • At least 20-30% of the queries submitted to Web search engines are simply named entities
      • ~71% of Web search queries contain named entities (Named entity recognition in query, Guo et al., SIGIR09)
    • Popular Entity Oriented Search tools
      • Product search
        • Online shopping (books, movies, electronic devices…): Amazon, eBay…
        • Travel (places, hotels, flights…): Yahoo! Travel, Kayak…
        • Multimedia (music, video, images…): Last.fm, YouTube, Flickr…
      • People search
        • Expert search (for a specific topic): LinkedIn, ArnetMiner…
        • Friends (colleagues, other people with mutual interests, lost friends…): Facebook…
      • Location search
        • Addresses
        • Businesses
        • Proximity search (find sites close to the searcher’s current location)
    • Expert Search
      • The task: identify people who are knowledgeable on a specific topic
        • Find people who have skills and experience on a given topic
      • How can a person’s knowledge be measured?
      • How should persons be ranked in response to a query, such that those with relevant expertise are ranked first?
    • Do those entities satisfy our needs?
      • What makes an entity relevant to the user’s need?
      • What is the meaning of relevance in this context?
      • Is it the same relevance that the IR community has dealt with for many decades in the context of document retrieval?
      • Can we adopt existing IR models into this new area of entity oriented search in a straightforward manner?
      • In this talk I’ll try to deal with some of those questions
        • I’ll overview how the same questions are handled in related areas (especially in Q&A)
        • I’ll raise some research directions that might lead to a better understanding of the concept of relevance in EoS
    • What is an Entity?
      • Entity: an object or a “thing” that can be uniquely identified in the world
        • An entity must be distinguishable from other entities
        • Can be anything (including an abstract thing!)
      • Attributes: used to describe entities
        • An attribute contains a single piece of information
      • Key: a minimal set of attributes that uniquely identifies an entity
      • Entity set: a set of entities of the same type and attributes
      • [Diagram: entity Actor with attributes id, name, birthday, address]
    • What is a Relationship?
      • Relationship: an association among two or more entities
        • A relationship may also have attributes
      • Relationship set: a set of relationships of the same type
      • [Diagram: a Prescription relationship connecting Patient, Physician and Medication; attribute labels shown: code, name, id, date]
    • Example: ERD for Social Search in the Enterprise
      • [Figure: entity-relationship diagram; only the label “Creator” survives in the transcript]
    • Entity Relationship Graph (ERG)
      • Represents entity instances as graph nodes
      • Binary relationships are represented as (weighted) edges
      • N-ary relations are broken into binary ones
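
The graph construction above can be sketched in a few lines of Python. This is only an illustration: the slide does not say how n-ary relations are decomposed, so the sketch assumes the simplest option, one binary edge per pair of participants, and the entity identifiers and weights are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_erg(relations):
    """Build an undirected weighted graph: node -> {neighbour: weight}.

    Each relation is (participants, weight). A relation with more than two
    participants is broken into one binary edge per pair of participants
    (an assumption; other decompositions are possible).
    """
    graph = defaultdict(dict)
    for participants, weight in relations:
        for u, v in combinations(participants, 2):
            # Accumulate weight when the same pair is related more than once.
            graph[u][v] = graph[u].get(v, 0.0) + weight
            graph[v][u] = graph[v].get(u, 0.0) + weight
    return dict(graph)


# Hypothetical data: one ternary prescription relation and one binary relation.
relations = [
    (("patient:ann", "physician:bob", "medication:aspirin"), 1.0),
    (("patient:ann", "physician:bob"), 0.5),
]
erg = build_erg(relations)
```

Here the ternary relation contributes three edges, and the patient-physician edge accumulates weight from both relations.
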
    • Entity Oriented Search (EoS)
      • Indexing: entity-relationship data is indexed as entities and relations
      • Runtime: a query (free text, entity, or hybrid) retrieves related entities and relationships, followed by ranking, navigation, and exploration
      • Query examples:
        • Nikon D40
        • Teammates of Michael Schumacher
        • “Data mining”
    • The Concept of Relevance in IR
    • The Classical Concept of Relevance in IR (Saracevic76, Mizzaro96)
      • Problem (P): the user has a problem to solve or an aim to achieve
      • Information need (IN): the user builds a mental, implicit representation of P (which may be incorrect or incomplete)
      • Request (R): the user expresses IN explicitly, usually in natural language (sometimes with the help of an intermediary)
      • Query (Q): formalization; R is translated to a formal query understandable by the search system
      • Judgment (J): the same user judges the relevance of the search results
    • User-based (Subjective) Relevance
      • Relevance is a dynamic concept that depends on the user’s subjective judgment
      • Subjective relevance judgment may depend on:
        • The user’s characteristics and perceptions: gender, age, education, income, occupation…; preferences, interests, state of mind
        • The context of search: the user’s level of expertise (regarding the topic of interest), current time, current location, session status
        • Dependencies between retrieved items, for the specific query and for sequential queries during the session
    • Topical-based relevance judgment
      • How well the topic of the retrieved information matches the topic of the request
      • An object is objectively relevant to a request if it deals with the topic of the request (aboutness)
      • TREC working definition for relevance assessment:
        • If you are writing a report on the topic and would use the information contained in the document in the report, then the document is considered relevant to the topic…
        • A document is judged relevant if any piece of it is relevant, regardless of how small that piece is in relation to the rest of the document
    • Probability Ranking Principle
      • Given a set of documents that “match” the entity-oriented query, how do we rank them for the user?
      • The Probability Ranking Principle (PRP) for document retrieval (Robertson 71): “If a retrieval system’s response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best…”
      • From documents to entities: Pr(R = 1 | d, q) → Pr(R = 1 | e, q)
      • We need a reliable and coherent methodology for measuring the probability of relevance of an entity to a query
    • Relevance estimation in classic Document Retrieval
      • Most relevance approximation approaches for document retrieval are based on measuring some kind of similarity between the user’s query and the retrieved documents
        • Vector space: the cosine of the angle between two vectors
        • Concept space: similarity in the latent concept space (e.g. LDA, LSI, ESA)
        • Language models: similarity between the document and query term distributions
      • Can we use similar approaches for EoS?
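
As a concrete instance of the vector-space case, the following minimal sketch computes the cosine between raw term-frequency vectors. It assumes whitespace tokenization and no term weighting (a real system would typically use TF-IDF or similar); the example strings are illustrative only.

```python
import math
from collections import Counter


def cosine(query, document):
    """Cosine of the angle between two bag-of-words term-frequency vectors."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


overlapping = cosine("data mining", "data mining and text mining")
disjoint = cosine("who killed jfk", "the president was assassinated by lee harvey oswald")
```

The second call returns zero because the two texts share no terms, even though one answers the other. Purely lexical similarity cannot see that connection.
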
    • Entity Similarity
      • While similarity plays a central role in relevance estimation for document retrieval, many relevant entities are not similar to the queried entity
        • At least according to standard definitions of similarity
      • This problem is well known in the Question Answering domain
        • The answer is not necessarily “similar” to the question
        • The supportive passage is not always similar to the question
      • Example: Who killed JFK?
        • John F. Kennedy (JFK), the thirty-fifth President of the United States, was assassinated at 12:30 p.m. Central Standard Time (18:30 UTC) on Friday, November 22, 1963, in Dealey Plaza, Dallas, Texas. The ten-month investigation of the Warren Commission of 1963–1964 concluded that the President was assassinated by Lee Harvey Oswald.
    • Relevance Judgment in Question Answering
      • In QA we usually assume a question that identifies the information need “precisely”
        • Who was the first American in space?
        • How many calories are there in a Big Mac?
        • How many Grand Slam titles did Bjorn Borg win?
      • When will an answer be considered relevant to the question?
        • It must be correct, i.e. it must have supportive evidence (from reliable sources)
      • A prominent factor in answering a question is not so much finding an answer as validating whether the candidate answer is correct
        • Therefore supportive evidence is essential
      • Assessment instructions from TREC’s QA track:
        • Assessors read each candidate answer and make a binary decision as to whether or not the candidate is actually an answer to the question in the context provided by the supportive document
    • What do you mean the answer is correct?
      • As in document retrieval, correctness/relevance in QA might be subjective and user dependent
        • Where is the Taj Mahal?
          • Agra, India? (the famous mausoleum)
          • Atlantic City, NJ? (the casino)
      • In TREC, it is common to consider each candidate answer with (relevant) supportive evidence as a correct one
      • This suggests how various candidate answers can be ranked: relevance judgment is transformed into judging the relevance of the supporting evidence
      • This approach can be applied to entity oriented search
        • Rank retrieved entities according to the amount and quality of their supportive evidence!
      • Entity ranking should be based on the supportive evidence for the entities’ relevance to the query
    • Relevance Estimation Approaches for EoS
    • The Expert Profile based Approach (Craswell et al. 2001)
      • Represent each person by a virtual document (a profile)
        • Employee directory (in the enterprise)
        • Concatenating all existing passages mentioning the person
      • Rank those profiles according to their relevance to the query
        • Using standard IR ranking techniques
      • The user profile can naturally be used as supportive evidence for the user’s expertise
      • Difficulties:
        • Co-reference resolution and name disambiguation
        • Privacy concerns
    • EoS: Voting approach (Balog06, MacDonald09)
      • Any relevant document is a “voter” for the entities it mentions / relates to
      • [Diagram: query q linked to documents d1, d2, d3, which are linked to persons p1, p2, p3]
      • $Score(p, q) = \sum_{d} Score(d, q) \cdot Score(p, d)$
      • What is the rationale behind it? An entity mentioned many times in relevant (top retrieved) documents is more likely to be relevant to the given topic
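
A minimal sketch of the voting formula, Score(p, q) = Σ_d Score(d, q) · Score(p, d), follows. The document scores and person-document association weights are invented for illustration; in practice they would come from a retrieval run and from some measure of how strongly a document mentions a person.

```python
from collections import defaultdict


def vote(doc_scores, mentions):
    """Rank entities by the sum over documents of Score(d, q) * Score(p, d).

    doc_scores: document -> Score(d, q)
    mentions:   document -> {entity: Score(p, d)}
    """
    scores = defaultdict(float)
    for doc, doc_score in doc_scores.items():
        for person, assoc in mentions.get(doc, {}).items():
            scores[person] += doc_score * assoc
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retrieval scores and mention weights.
doc_scores = {"d1": 0.5, "d2": 0.25, "d3": 0.25}
mentions = {"d1": {"p1": 1.0, "p2": 1.0}, "d2": {"p2": 1.0}, "d3": {"p3": 0.5}}
ranking = vote(doc_scores, mentions)
```

In this toy data p2 is ranked first because it collects votes from two retrieved documents.
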
    • Relevance Propagation (Serdyukov 2008)
      • We should also consider entities that are indirectly related to the query
      • Relevance is propagated through the entity relationship graph
      • [Diagram: query q linked to documents d1–d4 and persons p1–p4]
      • How should relevance be propagated in the graph?
    • Proximity in the Entity Relationship Graph: Random walks
      • Random walk approach: the relationship strength between two nodes is reflected by the probability that a random surfer who starts at one node will visit the second one during the walk
      • Justification: the more paths that connect the two entities in the graph, the higher the probability that the surfer will visit the target entity, and the higher the relationship strength between the two
      • Popular random walk approaches:
        • SimRank(u,v): how soon two random surfers (starting at u and v) are expected to meet at the same node
        • Random walk with restart (RWR): the surfer has a fixed restart probability of returning to the source
        • Lazy random walk: the surfer has a fixed probability of halting the walk at each step
        • Effective conductance: only simple (cycle-free) paths, treating edges as resistors
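
Of the approaches listed, random walk with restart is the easiest to sketch. The version below uses plain power iteration and assumes every node has at least one outgoing edge; the restart probability, iteration count, and toy graph are illustrative choices, not values from the talk.

```python
def random_walk_with_restart(graph, source, restart=0.15, iterations=100):
    """Approximate visit probabilities of a surfer that restarts at `source`.

    graph maps node -> {neighbour: edge weight}. Every node must appear as a
    key and have at least one outgoing edge (assumption of this sketch).
    """
    prob = {node: 0.0 for node in graph}
    prob[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[source] = restart  # mass that jumps back to the source
        for node, mass in prob.items():
            total = sum(graph[node].values())
            for neighbour, weight in graph[node].items():
                nxt[neighbour] += (1.0 - restart) * mass * weight / total
        prob = nxt
    return prob


# Hypothetical graph: p2 is reachable from q through two documents, p1 through one.
graph = {
    "q": {"d1": 1.0, "d2": 1.0},
    "d1": {"q": 1.0, "p1": 1.0, "p2": 1.0},
    "d2": {"q": 1.0, "p2": 1.0},
    "p1": {"d1": 1.0},
    "p2": {"d1": 1.0, "d2": 1.0},
}
proximity = random_walk_with_restart(graph, "q")
```

Because two paths lead from q to p2 and only one to p1, p2 receives the higher visit probability, which matches the justification given on the slide.
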
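    • Markov Random Fields for EoS (Raviv, Carmel, Kurland, 2012)
      • The query consists of terms and a target type: $Q = \langle \{q_1, \ldots, q_n\}, T \rangle$
      • The entity score combines the entity’s document, type, and name components: $P(E \mid Q) \stackrel{rank}{=} \sum_{P \in \{D, T, N\}} \lambda_{E_P} P(E_P \mid Q)$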
    • MRF based Entity Document Scoring P(E_D|Q)
      • We consider three types of cliques: full independence, sequential dependence, full dependence
      • The feature function over cliques measures how well the clique’s terms represent the entity document
        • Based on a Dirichlet-smoothed language model: $f^{T}_{D}(q_i, E_D) \triangleq \log\left[\frac{tf(q_i, E_D) + \mu \cdot cf(q_i)/|C|}{|E_D| + \mu}\right]$
        • For the dependent models we replace $q_i$ with #1(q_i..q_{i+k}) and #uwN({q_i,..q_j}) respectively
      • The entity document scoring function aggregates the feature functions over all clique types: $P(E_D \mid Q) \stackrel{rank}{=} \sum_{I \in \{T, O, U\}} \lambda^{I}_{E_D} \sum_{c \in I_{E_D}} f^{I}_{D}(c)$
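
The single-term (full independence) feature, log[(tf(q_i, E_D) + μ · cf(q_i)/|C|) / (|E_D| + μ)], can be sketched as follows. The token lists and the small μ are made up so the arithmetic is easy to follow; the sketch also assumes the term occurs at least once in the collection, since otherwise the logarithm is undefined.

```python
import math


def dirichlet_feature(term, entity_doc, collection, mu=2000.0):
    """Dirichlet-smoothed log-likelihood of one query term in an entity document.

    entity_doc and collection are lists of tokens; |C| is taken to be the
    number of tokens in the collection. Assumes `term` occurs in `collection`.
    """
    tf = entity_doc.count(term)   # tf(q_i, E_D)
    cf = collection.count(term)   # cf(q_i)
    return math.log((tf + mu * cf / len(collection)) / (len(entity_doc) + mu))


# Hypothetical entity document and collection.
entity_doc = ["nikon", "d40", "camera"]
collection = entity_doc + ["canon", "camera", "lens"]
present = dirichlet_feature("camera", entity_doc, collection, mu=3.0)
absent = dirichlet_feature("lens", entity_doc, collection, mu=3.0)
```

With these numbers the feature for "camera" is log((1 + 3·2/6) / (3 + 3)) = log(1/3). A term missing from the entity document still gets a finite, lower score thanks to the smoothing.
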
    • Entity Type Scoring P(E_T|Q)
      • We measure the “similarity” between the query type and the entity type: $P(E_T \mid Q) = f_T(c) \triangleq \log\left[\frac{e^{-\alpha d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha d(Q_T, E'_T)}}\right]$
      • $d(Q_T, E_T)$, the type distance, is domain dependent
        • In our experiments we measured the distance in the Wikipedia category graph
        • The minimal path length between all pairs of the query’s and the entity’s page categories
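
The type score is a log-softmax over negated type distances. A minimal sketch, assuming the distances d(Q_T, E_T) have already been computed for every candidate entity (the values below are invented, as is the choice of α):

```python
import math


def type_scores(distances, alpha=1.0):
    """log[ exp(-alpha * d_E) / sum over E' of exp(-alpha * d_E') ] per entity.

    distances maps entity -> type distance d(Q_T, E_T) for all candidates.
    """
    log_norm = math.log(sum(math.exp(-alpha * d) for d in distances.values()))
    return {entity: -alpha * d - log_norm for entity, d in distances.items()}


# Hypothetical category-graph distances for three candidate entities.
scores = type_scores({"e1": 0, "e2": 1, "e3": 3})
total_probability = sum(math.exp(s) for s in scores.values())
```

Entities whose type is closer to the query type get a higher score, and exponentiating the scores gives a probability distribution over the candidates.
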
    • Entity Name Scoring P(E_N|Q)
      • We measure the dependency between the query term(s) and the entity name
      • Globally: measure the proximity between the query term(s) and the entity name in the whole collection
        • We use pointwise mutual information (PMI): the likelihood of finding one term in proximity to another term
      • Locally: measure the proximity between the query terms and the entity name in the top retrieved documents
      • $P(E_N \mid Q) = \sum_{X \in A} \lambda^{X}_{E_N} \sum_{c \in X_{E_N}} f^{X}_{N}(c)$, where $A = \{S, T, O, U, PMI_T, PMI_O, PMI_U\}$
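
For the global component, the standard PMI definition log[P(x, y) / (P(x) P(y))] can be estimated from co-occurrence counts. The sketch below treats each text window as a set of tokens and counts co-occurrence within a window as "proximity". The window representation and the data are assumptions for illustration, not the estimator used in the talk.

```python
import math


def pmi(term, name, windows):
    """Pointwise mutual information of `term` and `name` over text windows.

    windows is a non-empty list of token sets; two tokens are considered to be
    in proximity when they fall in the same window.
    """
    n = len(windows)
    p_term = sum(term in w for w in windows) / n
    p_name = sum(name in w for w in windows) / n
    p_both = sum(term in w and name in w for w in windows) / n
    if p_both == 0:
        return float("-inf")  # never seen together
    return math.log(p_both / (p_term * p_name))


# Hypothetical windows extracted from a collection.
windows = [
    {"ferrari", "schumacher"},
    {"ferrari", "schumacher"},
    {"tennis", "borg"},
    {"tennis", "ferrari"},
]
related = pmi("ferrari", "schumacher", windows)
unrelated = pmi("tennis", "schumacher", windows)
```

A positive value means the query term and the entity name co-occur more often than chance would predict.
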
    • Experimental Results over the INEX Entity track (2007-2009)
      • [Figure: three bar charts (full independence, sequential dependence, full dependence) comparing S(ED), S(ED,ET), S(ED,ET,EN) and the INEX top result for 2007, 2008 and 2009; y-axis from 0 to 0.4]
      • Results improved significantly when type and name scoring were added
      • Final results are superior to the top INEX results for 2007 and 2008, and comparable for 2009
      • Dependence models have not improved over the independence model??
    • Exploratory EoS
      • When only an entity is given as input, the information need is quite fuzzy
        • Any related entity has the potential to be relevant
        • Therefore any related entity should be retrieved!
        • High diversity in search results (entity types, relationship types)
      • How can we make it easier for the user to find the most relevant answers?
        • Iterative IR: let the user navigate and explore the ER graph
      • Facet search: categorize the search results according to their facets (entity types/attributes…)
        • Let the user drill down: restrict retrieved entities to a specific facet
        • NOTE: we still need to rank the search results in each of the facets!
      • Graph navigation: let the user explore the graph by using a retrieved entity as a pivot for a new search
        • Query reformulation
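
Facet search with per-facet ranking can be sketched as a group-then-sort step. The result tuples, entity names, and function names below are hypothetical; the facet used is the entity type.

```python
from collections import defaultdict


def facet_results(results):
    """Group (entity, entity_type, score) results by type and rank each facet."""
    facets = defaultdict(list)
    for entity, entity_type, score in results:
        facets[entity_type].append((entity, score))
    return {
        entity_type: sorted(items, key=lambda item: item[1], reverse=True)
        for entity_type, items in facets.items()
    }


def drill_down(facets, entity_type):
    """Restrict the result list to one facet, keeping the within-facet ranking."""
    return [entity for entity, _ in facets.get(entity_type, [])]


# Hypothetical mixed-type search results.
results = [
    ("person_a", "person", 0.9),
    ("blog_x", "blog", 0.4),
    ("person_b", "person", 0.95),
    ("community_y", "community", 0.7),
]
facets = facet_results(results)
people = drill_down(facets, "person")
```

Sorting inside each facet reflects the note on the slide: drilling down narrows the result set but does not remove the need for ranking.
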
    • Search over Social Media Data (SaND) (Carmel 2009, Guy 2010)
      • SaND provides social aggregation over social data
      • SaND builds an entity-entity relationship matrix that maps a given entity to all related entities, weighted by their relationship strength
      • Direct relations of a user to:
        • a document: as an author, tagger, or commenter
        • another user: as a friend or as a manager/employee
        • a tag: one she used, or one she was tagged with by others
        • a group: as a member/owner
      • Indirect relations: two entities are indirectly related if both are directly related to the same entity
      • The overall relationship strength between two entities is determined by a linear combination of their direct and indirect relationship strengths
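
The linear combination of direct and indirect strengths can be illustrated as below. The slide does not define how the indirect strength is computed, so this sketch assumes it is the sum, over shared neighbours, of the product of the two direct strengths. The mixing weight `beta`, the function name, and the data are likewise hypothetical and do not describe SaND’s actual implementation.

```python
def combined_strength(direct, a, b, beta=0.5):
    """beta * direct(a, b) + (1 - beta) * indirect(a, b).

    direct maps entity -> {related entity: direct relationship strength}.
    indirect(a, b) is assumed to be the sum over entities c directly related
    to both a and b of direct[a][c] * direct[b][c].
    """
    d = direct.get(a, {}).get(b, 0.0)
    shared = set(direct.get(a, {})) & set(direct.get(b, {}))
    indirect = sum(direct[a][c] * direct[b][c] for c in shared)
    return beta * d + (1.0 - beta) * indirect


# Hypothetical direct relations: users related to a document and to each other.
direct = {
    "alice": {"doc1": 1.0, "bob": 0.5},
    "bob": {"doc1": 1.0, "alice": 0.5},
    "carol": {"doc1": 0.5},
}
alice_bob = combined_strength(direct, "alice", "bob")
alice_carol = combined_strength(direct, "alice", "carol")
```

Alice and Carol have no direct relation, yet they obtain a non-zero strength through the document they are both related to.
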
    • Search for the term ‘social’
      • Related People: ranked list of people that are related to the topic and to the result set, in one or more relationship types (author, commenter, tagger, etc.)
      • Results contain different types of entities: blogs, communities, bookmarked documents, etc.
      • Popular, higher-ranked results appear higher in the result set
      • Related Tags: ranked tag cloud for this result set
    • Narrowing the search to Luis Suarez’ related results
      • Hovering over a result highlights the related people and tags
    • Viewing results for query ‘social’ and person ‘Luis Suarez’
      • Viewing Luis’ business card, and results related to him
    • Summary
      • In this talk we raised several questions related to the concept of relevance in EoS:
        • What makes an entity relevant to the user’s need?
        • What is the meaning of relevance in this context?
        • Is it the same notion of relevance used in document retrieval?
      • We argued that the relevance of an entity can be estimated according to the supportive evidence provided by the search system
      • We discussed common EoS retrieval techniques:
        • The profile based approach
        • The voting approach
        • Relevance propagation
      • We discussed several examples of EoS systems and how relevance estimation can be applied in these domains
      • We claimed that the scale and diversity of EoS search results demand exploratory search techniques such as facet search and graph navigation
    • Open Questions and Challenges
      • Entity similarity
        • While similarity plays a central role in relevance judgment in document retrieval, entity similarity measurement still needs to be better understood
        • Attribute based similarity, evidence based similarity, graph proximity, hybrid approaches
      • The clustering hypothesis: are two “similar” entities likely to be relevant to the same information need?
        • The challenge: to what extent are relevant entities indeed similar to each other, and according to which similarity measurement?
      • Relevance propagation: what relationship types provide effective relevance propagation channels?
        • Do your friends inherit your own expertise?
        • Which relationship types contribute to relevance propagation?
    • Thank You! Questions?