Google Kernel Function

Presentation for my UML CNIS talk Feb 13, 2009

Google Kernel Function: Presentation Transcript

  • Mehran Sahami, Timothy D. Heilman, Google Inc. Presented by Beibei Yang. With credits to: Mehran Sahami, Stanford University, and Ellen Spertus, Google Inc.
  • Mehran Sahami: Associate Professor, Stanford Univ., since 2007; Google Inc., 2002-2007. Timothy D. Heilman: Sr. Software Engineer, Google Inc. Presented By: Beibei Yang 2/19/2009
  • Semantic Web: It’s all about understanding!
    Semantic similarity: a concept whereby a set of documents, or terms within term lists, are assigned a metric based on the likeness of their meaning / semantic content.
    Semantic relatedness: publicly available means for approximating the relative meaning of words/documents. These have been used for essay grading by the Educational Testing Service, search engine technology, predicting which links people are likely to click on, etc.
  • What are they? Example: Amazon
  • “What to do when your TiVo thinks you’re gay”, Wall Street Journal, Nov. 26, 2002 http://tinyurl.com/2qyepg
  • Wal-Mart DVD recommendations http://tinyurl.com/2gp2hm
  • Semantic similarity is the degree to which text passages have the same meaning. Quite often we want to find how similar two short text snippets are:
    Search engine queries
    Course descriptions
    Policies of two insurance companies
    You name it!
  • The simplest way to calculate the similarity of two words is to find the length of the shortest path connecting them in the hierarchy; a shorter path means the words are more similar. (This has its limitations.) For example:
    Similarity(boy, girl) = 4
    Similarity(boy, teacher) = 6
    Fig 1: An ISA hierarchical semantic knowledge base
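    A minimal sketch of the path-length measure, assuming a small hypothetical ISA hierarchy. The figure itself is not reproduced in this transcript, so the edges below are illustrative assumptions, chosen only so that the two path lengths match the numbers on the slide. The names `ISA_EDGES`, `build_graph`, and `path_length` are likewise made up for this example.

```python
from collections import deque

# Hypothetical (child, parent) ISA edges. This toy hierarchy is an assumption;
# it is not taken from the slide's Fig 1.
ISA_EDGES = [
    ("boy", "male"), ("girl", "female"),
    ("male", "person"), ("female", "person"),
    ("adult", "person"), ("professional", "adult"),
    ("educator", "professional"), ("teacher", "educator"),
]

def build_graph(edges):
    """Treat ISA links as undirected so a path may go up and then down."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph

def path_length(graph, source, target):
    """Breadth-first search: number of edges on a shortest path, or None."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

GRAPH = build_graph(ISA_EDGES)
print(path_length(GRAPH, "boy", "girl"))     # 4
print(path_length(GRAPH, "boy", "teacher"))  # 6
```

    Note that the value is really a distance: "boy" and "girl" score 4 and "boy" and "teacher" score 6, so the smaller number marks the more similar pair.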
  • Search engine:
      Sahami and Heilman's web-based kernel function.
      Bollegala, Matsuo, and Ishizuka's algorithm using page counts and text snippets.
      Iosif and Potamianos's two-metric approach.
      Liu and Birnbaum's approach using Google Directory.
    WordNet:
      Varelas et al.'s ontology mapping approach.
      Yang and Powers's two-variant approach: bidirectional depth-limit search (BDLS) and unidirectional breadth-first search (UBFS).
    Text corpus:
      Islam and Inkpen's modified LCS (Longest Common Subsequence) string-matching algorithm.
    Others:
      Li et al.'s approach using multiple information sources.
  • Query expansion: Chris Buckley, 1994. Buckley, C., Salton, G., Allan, J., and Singhal, A. Automatic query expansion using smart: Trec 3. In TREC (1994), pp. 0-.
    The definition and emphasis changed along the way.
    The process of reformulating a seed query to improve retrieval performance in information retrieval operations.
    Involves:
      Finding synonyms of words, and searching for the synonyms
      Finding all the various morphological forms of words by stemming each word in the search query
      Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
      Re-weighting the terms in the original query
  • Let x represent a short text snippet. We calculate the query expansion of x, QE(x), in this way:
    1. Issue x as a query to a search engine S.
    2. Let R(x) be the set of (at most) n retrieved documents d_1, d_2, …, d_n.
    3. Compute the TFIDF term vector v_i for each document d_i ∈ R(x).
    4. Truncate each vector v_i to include its m highest weighted terms.
    5. Let C(x) be the centroid of the L2 normalized vectors v_i:
       $C(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{v_i}{\|v_i\|_2}$
    6. Let QE(x) be the L2 normalization of the centroid C(x):
       $QE(x) = \frac{C(x)}{\|C(x)\|_2}$
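    Steps 4 to 6 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the search and TFIDF steps (1 to 3) are skipped, the document vectors are assumed to arrive as plain dicts mapping term to weight, and the function names and sample weights are invented for the example.

```python
import math

def l2_normalize(vec):
    """Scale a sparse vector (term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in vec.items()}

def truncate(vec, m):
    """Step 4: keep only the m highest-weighted terms."""
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:m]
    return dict(top)

def query_expansion(doc_vectors, m):
    """Steps 4-6: truncate, normalize, average into a centroid, normalize."""
    n = len(doc_vectors)
    if n == 0:
        return {}
    centroid = {}
    for vec in doc_vectors:
        for term, w in l2_normalize(truncate(vec, m)).items():
            centroid[term] = centroid.get(term, 0.0) + w / n
    return l2_normalize(centroid)

# Hypothetical TFIDF vectors for two retrieved documents.
DOC_VECTORS = [
    {"kernel": 3.0, "function": 4.0, "page": 0.1},
    {"kernel": 1.0},
]
QE_X = query_expansion(DOC_VECTORS, m=2)
print(sorted(QE_X))  # terms that survive truncation
```

    With m = 2 the low-weight term "page" is dropped before the centroid is formed, and the resulting vector has unit length.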
  • TFIDF, by G. Salton and C. Buckley, 1988. The weight w_{i,j} associated with term t_i in document d_j is defined to be:
       $w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right)$
    where tf_{i,j} is the frequency of t_i in d_j, N is the total number of documents in the corpus, and df_i is the total number of documents that contain t_i.
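    A small sketch of this weighting, assuming the common form w_{i,j} = tf_{i,j} × log(N / df_i) with a natural logarithm (the slide does not state the base). The three-document corpus and the helper names are hypothetical and exist only to make the example runnable.

```python
import math
from collections import Counter

def document_frequencies(corpus):
    """df_i: number of documents in which each term appears at least once."""
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    return df

def tfidf_vector(tokens, doc_freq, num_docs):
    """w_ij = tf_ij * log(N / df_i) for every term of one document."""
    tf = Counter(tokens)
    return {
        term: count * math.log(num_docs / doc_freq[term])
        for term, count in tf.items()
        if doc_freq.get(term, 0) > 0
    }

# Hypothetical tokenized corpus (assumption, not from the slides).
CORPUS = [
    ["kernel", "function", "kernel", "web"],
    ["web", "search", "engine"],
    ["web", "query", "expansion"],
]
DF = document_frequencies(CORPUS)
VEC = tfidf_vector(CORPUS[0], DF, len(CORPUS))
print(VEC)
```

    A term that occurs in every document ("web" here) gets weight 0 because log(N / df_i) = log(1), which is why common words contribute little to the term vectors.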
  • Define the semantic kernel function K as the inner product of the query expansions for two text snippets. Given two short text snippets x and y, we define the semantic similarity kernel between them as:
    K(x, y) = QE(x)·QE(y)
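    As a rough illustration, the kernel is just a sparse dot product once the two query expansions are available. The vectors below are hypothetical unit-length stand-ins for QE(x) and QE(y); in practice they would come from the query-expansion procedure.

```python
import math

def kernel(qe_x, qe_y):
    """K(x, y) = QE(x) . QE(y) for sparse vectors stored as term -> weight."""
    # Iterate over the shorter vector; terms missing from the other count as 0.
    if len(qe_x) > len(qe_y):
        qe_x, qe_y = qe_y, qe_x
    return sum(w * qe_y.get(term, 0.0) for term, w in qe_x.items())

# Hypothetical unit-length query expansions (assumed values).
QE_A = {"support": 0.6, "vector": 0.8}
QE_B = {"vector": 1.0}
QE_C = {"tivo": 1.0}

print(kernel(QE_A, QE_B))  # shared term "vector" only
print(kernel(QE_A, QE_C))  # no shared terms
```

    Because each query expansion has unit L2 length and TFIDF weights are non-negative, K(x, y) equals the cosine of the angle between the two vectors and falls between 0 (no shared terms) and 1 (identical expansions).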
  • Acronyms:
  • Individuals and their positions:
  • Multi-faceted terms:
  • Search engine: this approach could be used to generate related query suggestions in a large-scale system.
    Question-answering system: the question could be matched against a list of candidate answers to determine which is the most similar semantically.
    Since this kernel is not limited to use on the web, it can also be computed using query expansions generated over domain-specific corpora in order to better capture contextual semantics in particular domains.
  • Berners-Lee, T., Hendler, J., and Lassila, O. The Semantic Web. Scientific American 284, 5 (2001), 34-43.
    Sahami, M., and Heilman, T. D. A web-based kernel function for measuring the similarity of short text snippets. In WWW '06: Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006), ACM, New York, NY.
    Buckley, C., Salton, G., Allan, J., and Singhal, A. Automatic query expansion using smart: Trec 3. In TREC (1994), pp. 0-.
    Abhishek, V., and Hosanagar, K. Keyword generation for search engine advertising using semantic similarity between terms. In ICEC '07: Proceedings of the ninth international conference on electronic commerce (New York, NY, USA, 2007), ACM, pp. 89-94.
    Bollegala, D., Matsuo, Y., and Ishizuka, M. Measuring semantic similarity between words using web search engines. In WWW '07: Proceedings of the 16th international conference on World Wide Web (New York, NY, USA, 2007), ACM, pp. 757-766.
    Iosif, E., and Potamianos, A. Unsupervised semantic similarity computation using web search engines. IEEE/WIC/ACM international conference on web intelligence (Nov. 2007), 381-387.
    Li, Y., Bandar, Z., and Mclean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE transactions on knowledge and data engineering 15, 4 (July-Aug. 2003), 871-882.