Peters matthew periodictableseo

Published in: Technology, Design

  1. Modern On-Page Factors. SMX Advanced. Matthew Peters, PhD, matt@moz.com, @mattthemathman
  2. “philadelphia phillies”
  6. “Relevance” vs. “ranking”: conceptually, determining “relevance” and “ranking” can be thought of as two different steps, even if a search engine implements them as one: (1) relevance, (2) ranking.
  9. Is this page relevant to “philadelphia phillies”? Query-body similarity: 0.74; query-title similarity: 0.8; query-H1 similarity: 1.0; etc.
  11. Measuring query-document similarity. Goal: given a query string and a document string, compute their “similarity”. See “Introduction to Information Retrieval” by Manning et al. (http://nlp.stanford.edu/IR-book/); > 700 papers.
  15. Measuring query-document similarity. The query (“philadelphia phillies”) goes through a query model: tokenization, normalization (stemming), query expansion and intent. The document goes through a document model: tokenization, normalization (stemming), a vector space representation and a language model. A scoring function then combines the two into a similarity score, e.g. 0.74. In this context, “document” can also refer to the title tag, meta description, H1, etc.
  18. Query representation: language identification; word segmentation (Japanese, Chinese); tokenization + normalization, e.g. {reviews, reviewer, reviewing} -> review; spelling correction; query expansion; user intent (transactional, navigational, informational); local; classification (images, video, news); topic model (LDA); entity extraction.
  21. Document representation: TF-IDF; language model, e.g. P(optimization | search, engine) >> P(walking | search, engine); Probability Ranking Principle: P(R = 1 | d, q) or P(R = 0 | d, q).
  22. Which method performs best? What are the characteristics of sites that rank highly? Data: 14,000+ keywords, top 50 results, 600,000 URLs, Google-US with no personalization, March 2013. Metric: mean Spearman correlation. Remember: “correlation is not causation”.
  23. Which method performs best? We tried a few different types of smoothing for the language model; Dirichlet smoothing worked best (Zhai and Lafferty, SIGIR 2001).
  24. Impact of stemming: the Porter stemmer provided a slight increase in correlations.
  25. These correlations are still relatively low compared to other factors.
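  29. Example query: “movie reviews”. For each query there are 500 pages: the 50 results from the SERP (10%, relevant) plus 450 random pages (90%, irrelevant). The pages are ranked by PA (Page Authority) and by the language model, and the top 50 ranked pages are compared with the SERP.
      URL ID | PA | In SERP?
      86     | 92 | 1
      355    | 90 | 0
      …      | …  | …
      27     | 18 | 0
      URL ID | Language model | In SERP?
      213    | 0.97           | 1
      156    | 0.95           | 1
      …      | …              | …
      355    | 0.06           | 0
      P@50 is the “precision of the top 50 results”: the percentage of the top 50 results by PA or by the language model that are actually in the SERP.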
  34. Takeaways
      Implication: query-document similarity is based on decades of research; it’s immune to algorithm change.
      Action item: with sophisticated query and document models, there is no need to optimize separately for similar words, e.g. “movie reviews” vs. “movie review”.
      Action item: each page is relevant to many different keywords, so optimize each page for a broad set of related keywords instead of a single keyword.
      Use case: content creation. What keywords will this new blog post target? Is it relevant to a set of queries?
  35. Thanks for watching! Matthew Peters, matt@moz.com, @mattthemathman
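Slide 15 describes query-document similarity as a query model, a document model and a scoring function. The Python sketch below shows one of the simplest possible instances of that pipeline. It assumes a lowercase regex tokenizer, no stemming or query expansion, and cosine similarity over raw term counts as the scoring function. It is an illustration only, not the method any search engine or the study actually used. Following slide 9, the same function is applied to each page field (title, H1, body); the page text is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(query: str, document: str) -> float:
    """Cosine similarity between raw term-count vectors of the query and the document."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


# Hypothetical page: the same scoring function is applied to each field separately.
page = {
    "title": "Philadelphia Phillies Official Site",
    "h1": "Philadelphia Phillies",
    "body": "The Phillies play baseball in Philadelphia. Tickets, schedule and news.",
}
query = "philadelphia phillies"
scores = {field: round(cosine_similarity(query, text), 2) for field, text in page.items()}
print(scores)
```

The H1 matches the query exactly and scores highest. The longer title and body dilute the match and score lower, which mirrors the per-field pattern on slide 9 but is not derived from it.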
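Slide 18 lists tokenization, normalization ({reviews, reviewer, reviewing} -> review) and spelling correction among the steps of query representation. The sketch below illustrates those three steps under strong simplifying assumptions:

- The suffix list and VOCABULARY are hypothetical.
- The suffix stripper is a toy, not the Porter stemmer.
- Spelling correction uses Python's standard-library difflib.get_close_matches, not anything a search engine is known to use.

```python
import difflib
import re

# Hypothetical vocabulary of known terms; a real system would use a large lexicon.
VOCABULARY = ["philadelphia", "phillies", "movie", "review", "tickets"]
# Toy suffix list, checked in order; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "er", "s")


def strip_suffix(token: str) -> str:
    """Remove the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def correct_spelling(token: str) -> str:
    """Replace a token with its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


def normalize_query(query: str) -> list[str]:
    """Tokenize, then use the stem if it is known, else the token if it is known,
    else the spelling-corrected token."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", query.lower()):
        stem = strip_suffix(token)
        if stem in VOCABULARY:
            terms.append(stem)
        elif token in VOCABULARY:
            terms.append(token)
        else:
            terms.append(correct_spelling(token))
    return terms


print({strip_suffix(w) for w in ["reviews", "reviewer", "reviewing"]})
print(normalize_query("Philadelpia Phillies"))
print(normalize_query("movie reviewing"))
```

Stemming is tried before spelling correction so that correctly spelled inflections map to their stem rather than being "corrected".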
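Slide 21 names TF-IDF as one document representation. Here is a minimal sketch under three assumptions:

- The weight is raw term frequency times idf = log(N / df).
- A document's score for a query is the sum of the TF-IDF weights of the query terms. This is one of the simple TF-IDF scoring schemes discussed in Manning et al.'s Introduction to Information Retrieval.
- The three toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: tf * idf} dict per tokenized document, with idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return vectors


def tf_idf_score(query_terms: list[str], doc_vector: dict[str, float]) -> float:
    """Score a document as the sum of the TF-IDF weights of the query terms it contains."""
    return sum(doc_vector.get(term, 0.0) for term in query_terms)


docs = [
    "search engine optimization guide".split(),
    "walking guide for beginners".split(),
    "search engine ranking factors search".split(),
]
vectors = tf_idf_vectors(docs)
query = ["search", "engine"]
scores = [tf_idf_score(query, v) for v in vectors]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

With this weighting, a term that appears in every document gets idf = log(1) = 0 and contributes nothing to the score.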
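Slide 22 reports the mean Spearman correlation over 14,000+ keywords. The slide does not state the study's exact conventions, so this minimal sketch of the metric makes three assumptions:

- For each keyword, the factor values are listed in SERP order (position 1 first).
- Spearman's rho is computed as the Pearson correlation of average ranks, which handles ties.
- The sign is chosen so that a positive value means higher factor values go with better positions.

The example SERPs are made up.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block order[i..j]
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman(serps: list[list[float]]) -> float:
    """Average per-keyword Spearman correlation between a factor and SERP position."""
    correlations = []
    for factor_values in serps:
        # Negated positions: position 1 is "largest", so positive rho = higher value ranks better.
        neg_positions = [-p for p in range(1, len(factor_values) + 1)]
        correlations.append(spearman(factor_values, neg_positions))
    return sum(correlations) / len(correlations)


print(mean_spearman([[0.9, 0.5, 0.1], [0.2, 0.4, 0.6]]))
```

Averaging the per-keyword correlations, rather than pooling all URLs, keeps keywords with very different score scales comparable. As slide 22 notes, a high mean correlation is still not evidence of causation.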
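Slide 23 says Dirichlet smoothing worked best for the language model (Zhai and Lafferty, SIGIR 2001). Under Dirichlet smoothing, the document language model is p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu). Here c(w, d) is the count of w in d, p(w | C) is the term's probability in the whole collection, and mu is the smoothing parameter. Documents are then scored by the query log-likelihood.

The sketch below makes these assumptions:

- The collection is tiny and made up.
- mu = 10 is chosen for the toy data; it is not a value from the study.
- Query terms that never occur in the collection are skipped.

```python
import math
from collections import Counter


def dirichlet_score(
    query_terms: list[str],
    doc_terms: list[str],
    collection_counts: Counter,
    collection_length: int,
    mu: float = 10.0,
) -> float:
    """Query log-likelihood log P(q | d) under a Dirichlet-smoothed unigram language model."""
    doc_counts = Counter(doc_terms)
    doc_length = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_collection = collection_counts[term] / collection_length
        if p_collection == 0:
            continue  # assumption: ignore terms that never occur in the collection
        p_doc = (doc_counts[term] + mu * p_collection) / (doc_length + mu)
        score += math.log(p_doc)
    return score


docs = [
    "search engine optimization guide".split(),
    "walking guide for beginners".split(),
    "search engine ranking factors search".split(),
]
collection_counts = Counter(term for doc in docs for term in doc)
collection_length = sum(collection_counts.values())
query = ["search", "engine"]
scores = [dirichlet_score(query, d, collection_counts, collection_length) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Smoothing gives the second document, which contains neither query term, a finite score instead of log(0). It still ranks last.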
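Slide 29 defines P@50 as the percentage of the top 50 pages, ranked by PA or by the language model, that are actually in the SERP. This minimal sketch of precision at k assumes each page is a (URL ID, score, in-SERP flag) tuple. The slide's tables are elided with “…”, so the usage example uses only the visible rows and k = 2. Its numbers illustrate the computation and are not results from the study.

```python
def precision_at_k(pages: list[tuple[int, float, int]], k: int = 50) -> float:
    """Fraction of the k highest-scoring pages whose in-SERP flag is 1."""
    top_k = sorted(pages, key=lambda page: page[1], reverse=True)[:k]
    if not top_k:
        return 0.0
    return sum(in_serp for _, _, in_serp in top_k) / len(top_k)


# Only the rows visible on the slide; the elided rows are not reconstructed.
pa_rows = [(86, 92, 1), (355, 90, 0), (27, 18, 0)]
lm_rows = [(213, 0.97, 1), (156, 0.95, 1), (355, 0.06, 0)]
print(precision_at_k(pa_rows, k=2))  # 0.5
print(precision_at_k(lm_rows, k=2))  # 1.0
```

With 50 SERP pages among 500, a random ordering would give an expected P@50 of 50 / 500 = 10%, which is a natural baseline for comparing PA with the language model.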
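Slide 34 suggests two things: a page need not be optimized separately for near-duplicate keywords such as “movie reviews” vs. “movie review”, and a new draft can be checked against a set of related queries. This minimal sketch of that check assumes a toy stemmer that only strips a trailing “s” (not the Porter stemmer) and cosine similarity over term counts. The draft text and query list are hypothetical.

```python
import math
import re
from collections import Counter


def stem(token: str) -> str:
    """Toy stemmer (NOT Porter): strip a trailing "s" from tokens longer than 3 characters."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def terms(text: str) -> list[str]:
    """Lowercase, tokenize on alphanumeric runs, and stem each token."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between the term-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def keyword_coverage(draft: str, queries: list[str]) -> dict[str, float]:
    """Similarity of a draft to each query in a set of related keywords."""
    draft_terms = terms(draft)
    return {q: round(cosine(terms(q), draft_terms), 3) for q in queries}


draft = "Our movie reviews cover new releases every week."
coverage = keyword_coverage(draft, ["movie reviews", "movie review", "film critic"])
print(coverage)
```

Because both queries normalize to the same terms, they get identical scores. “film critic” scores 0 here, because a bag-of-words model without query expansion cannot see related but different words.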
