Google Kernel Function


Presentation for my UML CNIS talk Feb 13, 2009


  1. Mehran Sahami and Timothy D. Heilman, Google Inc. Presented by Beibei Yang. With credits to: Mehran Sahami, Stanford University, and Ellen Spertus, Google Inc.
  2. Mehran Sahami: Associate Professor, Stanford Univ., since 2007; Google Inc., 2002-2007. Timothy D. Heilman: Sr. Software Engineer, Google Inc. Presented by: Beibei Yang, 2/19/2009
  4. Semantic Web: It’s all about understanding!
       • Semantic similarity: a concept whereby a set of documents or terms within term lists are assigned a metric based on the likeness of their meaning / semantic content.
       • Semantic relatedness: publicly available means for approximating the relative meaning of words/documents. Have been used for essay-grading by the Educational Testing Service, search engine technology, predicting which links people are likely to click on, etc.
  5. What are they? Example: Amazon
  6. “What to do when your TiVo thinks you’re gay”, Wall Street Journal, Nov. 26, 2002. http://tinyurl.com/2qyepg
  9. Wal-Mart DVD recommendations: http://tinyurl.com/2gp2hm
  12. It’s the degree to which text passages have the same meaning. Quite often we want to find how similar two short text snippets are:
       • Search engine queries
       • Course descriptions
       • Policies of two insurance companies
       • You name it!
  14. The simplest way to calculate the similarity of two words is to find the minimum length of the path connecting the two. (Has its limitations.) For example:
       • Similarity(boy, girl) = 4
       • Similarity(boy, teacher) = 6
      Fig 1: An ISA hierarchical semantic knowledge base
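The minimum path length described on this slide can be sketched with a breadth-first search. The hierarchy below is a hypothetical fragment, not the figure from the slide; its nodes were chosen only so that the two path lengths quoted above come out the same. The name `path_length` is likewise an illustrative assumption.

```python
from collections import deque

# Hypothetical fragment of an ISA hierarchy, stored as child -> parent.
# It is an assumption made for illustration, not the slide's actual figure.
ISA = {
    "boy": "male",
    "girl": "female",
    "male": "person",
    "female": "person",
    "adult": "person",
    "professional": "adult",
    "educator": "professional",
    "teacher": "educator",
}


def path_length(a, b):
    """Return the number of edges on the shortest path between a and b, or None."""
    # Treat ISA links as undirected edges so a path may go up and then down.
    graph = {}
    for child, parent in ISA.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    if a not in graph or b not in graph:
        return None
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


print(path_length("boy", "girl"))     # 4
print(path_length("boy", "teacher"))  # 6
```

Note that a smaller value means the words are closer, so this quantity behaves as a distance rather than a similarity score.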
  15. Search Engine
       • Sahami and Heilman's web-based kernel function.
       • Bollegala, Matsuo, and Ishizuka's algorithm using page counts and text snippets.
       • Iosif and Potamianos's two-metric approach.
       • Liu and Birnbaum's approach using Google Directory.
      WordNet
       • Varelas et al.'s ontology mapping approach.
       • Yang and Powers's two-variant approach: bidirectional depth-limit search (BDLS) and unidirectional breadth-first search (UBFS).
      Text Corpus
       • Islam and Inkpen's modified LCS (Longest Common Subsequence) string-matching algorithm.
      Others
       • Li et al.'s approach using multiple information sources.
  16. Chris Buckley, 1994: Buckley, C., Salton, G., Allan, J., and Singhal, A. Automatic query expansion using SMART: TREC 3. In TREC (1994), pp. 0-. The definition and emphasis changed along the way. The process of reformulating a seed query to improve retrieval performance in information retrieval operations. Involves:
       • Finding synonyms of words, and searching for the synonyms
       • Finding all the various morphological forms of words by stemming each word in the search query
       • Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
       • Re-weighting the terms in the original query
  18. Let x represent a short text snippet. We calculate the query expansion of x, QE(x), in this way:
       1. Issue x as a query to a search engine S.
       2. Let R(x) be the set of (at most) n retrieved documents d1, d2, …, dn.
       3. Compute the TFIDF term vector vi for each document di ∈ R(x).
       4. Truncate each vector vi to include its m highest weighted terms.
       5. Let C(x) be the centroid of the L2 normalized vectors vi: $C(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{v_i}{\|v_i\|_2}$
       6. Let QE(x) be the L2 normalization of the centroid C(x): $QE(x) = \frac{C(x)}{\|C(x)\|_2}$
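The six steps above can be sketched as follows. This is a minimal illustration under loud assumptions: the search engine S is replaced by a hypothetical in-memory corpus with a naive word-overlap `search` function, TFIDF is taken to be tf × ln(N/df), and every function name and the default values of n and m are invented for the example.

```python
import math
from collections import Counter

# Hypothetical stand-in for the documents a real search engine S would index.
CORPUS = [
    "support vector machine kernel methods for classification",
    "kernel function inner product in feature space",
    "machine learning with support vector machine classifiers",
    "stanford university computer science department",
    "google search engine query suggestions",
]


def search(query, n):
    """Toy retrieval (assumption): up to n documents sharing a word with the query."""
    words = set(query.lower().split())
    return [doc for doc in CORPUS if words & set(doc.split())][:n]


def tfidf_vector(doc, corpus):
    """Step 3: sparse TFIDF vector, assuming weight = tf * ln(N / df)."""
    vec = {}
    for term, tf in Counter(doc.split()).items():
        df = sum(1 for d in corpus if term in d.split())
        vec[term] = tf * math.log(len(corpus) / df)
    return vec


def truncate(vec, m):
    """Step 4: keep only the m highest weighted terms."""
    return dict(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:m])


def l2_normalize(vec):
    """Scale a sparse vector to unit L2 norm (all-zero vectors are left as is)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def query_expansion(x, n=3, m=5):
    """Steps 1-6: return QE(x) as a sparse term -> weight dictionary."""
    docs = search(x, n)                      # steps 1-2
    if not docs:
        return {}
    centroid = Counter()
    for doc in docs:
        v = l2_normalize(truncate(tfidf_vector(doc, CORPUS), m))
        for term, weight in v.items():       # step 5: average the unit vectors
            centroid[term] += weight / len(docs)
    return l2_normalize(centroid)            # step 6


qe = query_expansion("support vector machine")
print(sorted(qe, key=qe.get, reverse=True))
```

With a real search engine, `search` would return the top-ranked result pages and the document frequencies would come from a large corpus rather than from five sentences.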
  19. TFIDF weighting, by G. Salton and C. Buckley, 1988. The weight wi,j associated with term ti in document dj is defined to be: $w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right)$, where:
       • tfi,j is the frequency of ti in dj
       • N is the total number of documents in the corpus
       • dfi is the total number of documents that contain ti
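As a worked instance of this weight, assume the standard form w = tf × log(N/df) with the natural logarithm (the slide does not state the base) and made-up counts: a term that occurs 3 times in a document, in a corpus of 1000 documents of which 10 contain the term.

```latex
w_{i,j} = tf_{i,j} \times \ln\left(\frac{N}{df_i}\right)
        = 3 \times \ln\left(\frac{1000}{10}\right)
        = 3 \ln 100
        \approx 13.82
```

A term that appears in every document gets weight 0, since ln(N/N) = 0, which is why very common words contribute nothing to the term vector.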
  20. Define the semantic kernel function K as the inner product of the query expansions for two text snippets. Given two short text snippets x and y, we define the semantic similarity kernel between them as: K(x, y) = QE(x)·QE(y)
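The inner product above is straightforward once the query expansions are stored as sparse term-to-weight dictionaries. The sketch below uses hand-written, hypothetical expansion vectors in place of real QE(x) output; the terms, weights, and function names are assumptions for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a sparse vector to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}


def kernel(qe_x, qe_y):
    """K(x, y) = QE(x) . QE(y) for sparse term -> weight dictionaries."""
    # Iterate over the shorter vector; terms missing from the other contribute 0.
    if len(qe_x) > len(qe_y):
        qe_x, qe_y = qe_y, qe_x
    return sum(w * qe_y.get(t, 0.0) for t, w in qe_x.items())


# Hypothetical query expansions (made-up terms and weights).
qe_ai = l2_normalize({"artificial": 2.0, "intelligence": 2.0, "learning": 1.0})
qe_artificial_intelligence = l2_normalize(
    {"artificial": 1.0, "intelligence": 2.0, "research": 2.0}
)
qe_pizza = l2_normalize({"pizza": 1.0, "delivery": 1.0})

print(kernel(qe_ai, qe_artificial_intelligence))  # shared terms, so above 0
print(kernel(qe_ai, qe_pizza))                    # no shared terms, so 0
print(kernel(qe_ai, qe_ai))                       # unit vector with itself
```

Because each QE vector is L2-normalized and TFIDF weights are non-negative, K(x, y) is a cosine similarity that falls between 0 and 1, and K(x, x) = 1 for any snippet with a non-empty expansion.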
  21. Acronyms:
  22. Individuals and their positions:
  23. Multi-faceted terms:
  24. Applications:
       • Search engine: this approach could be used to generate the related query suggestions in a large-scale system.
       • Question-answering system: the question could be matched against a list of candidate answers to determine which is the most similar semantically.
       • Since this kernel is not limited to use on the web, it can also be computed using query expansions generated over domain-specific corpora in order to better capture contextual semantics in particular domains.
  26. References:
       • Berners-Lee, T., Hendler, J., and Lassila, O. The Semantic Web. Scientific American 284, 5 (2001), 34-43.
       • Sahami, M. and Heilman, T. D. 2006. A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006). WWW '06. ACM, New York, NY.
       • Buckley, C., Salton, G., Allan, J., and Singhal, A. Automatic query expansion using SMART: TREC 3. In TREC (1994), pp. 0-.
       • Abhishek, V., and Hosanagar, K. Keyword generation for search engine advertising using semantic similarity between terms. In ICEC '07: Proceedings of the ninth international conference on electronic commerce (New York, NY, USA, 2007), ACM, pp. 89-94.
       • Bollegala, D., Matsuo, Y., and Ishizuka, M. Measuring semantic similarity between words using web search engines. In WWW '07: Proceedings of the 16th international conference on World Wide Web (New York, NY, USA, 2007), ACM, pp. 757-766.
       • Iosif, E., and Potamianos, A. Unsupervised semantic similarity computation using web search engines. IEEE/WIC/ACM international conference on web intelligence (Nov. 2007), 381-387.
       • Li, Y., Bandar, Z., and Mclean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE transactions on knowledge and data engineering 15, 4 (July-Aug. 2003), 871-882.