Peters matthew periodictableseo
Published in: Technology, Design
Transcript

  • 1. Modern On Page Factors. SMX Advanced. Matthew Peters, PhD, matt@moz.com, @mattthemathman
  • 2-3. “philadelphia phillies”
  • 4-6. “Relevance” vs. “Ranking”: Conceptually, “relevance” determination and “ranking” can be thought of as two different steps (even if they are implemented as one in a search engine). [Figure: Relevance, then Ranking, with results numbered 1, 2]
  • 7-9. Is this page relevant to “philadelphia phillies”? Query-body similarity: 0.74; query-title similarity: 0.8; query-H1 similarity: 1.0; etc.
  • 10-11. Measuring query-document similarity. Goal: given a query and a document string, compute their “similarity”. See “Introduction to Information Retrieval” by Manning et al.: http://nlp.stanford.edu/IR-book/ [Figure: > 700 papers]
  • 12-15. Measuring query-document similarity. [Figure: the query “philadelphia phillies” passes through a Query Model (tokenization, normalization (stemming), query expansion, intent), the document passes through a Document Model (tokenization, normalization (stemming), vector space representation, language model), and a Scoring function combines them into a similarity of 0.74.] In this context, “document” can also refer to the title tag, meta description, H1, etc.
  • 16-18. Query representation: language identification; word segmentation (Japanese, Chinese); tokenization + normalization ({reviews, reviewer, reviewing} -> review); spelling correction; query expansion; user intent (transactional, navigational, informational); local; classification (images, video, news); topic model (LDA); entity extraction.
  • 19-21. Document representation: TF-IDF; language model, e.g. P(optimization | search, engine) >> P(walking | search, engine); Probability Ranking Principle: P(R = 1 | d, q) or P(R = 0 | d, q).
  • 22. Which method performs best? What are the characteristics of sites that rank highly? 14,000+ keywords; top 50 results; 600,000 URLs; Google-US, no personalization; March 2013. Mean Spearman correlation. Remember: “correlation is not causation”.
  • 23. Which method performs best? We tried a few different types of smoothing for the language model; Dirichlet worked best (Zhai and Lafferty, SIGIR 2001).
  • 24. Impact of stemming: the Porter stemmer provided a slight increase in correlations.
  • 25. These correlations are still relatively low compared to other factors
  • 26-30. Experiment (example query: “movie reviews”). For each query: 500 pages, made up of the 50 SERP results plus 450 random pages, so 10% are relevant and 90% irrelevant.
      Top 50 ranked by PA (Page Authority):
        URL ID   PA   In SERP?
        86       92   1
        355      90   0
        …        …    …
        27       18   0
      Top 50 ranked by language model:
        URL ID   Language Model   In SERP?
        213      0.97             1
        156      0.95             1
        …        …                …
        355      0.06             0
      P@50 is the “precision of the top 50 results”: the percentage of the top 50 results, ranked by PA or by the language model, that are actually in the SERP.
  • 31-34. Takeaways.
      Implication: Query-document similarity is based on decades of research. It’s immune to algorithm change.
      Action item: With sophisticated query and document models, there is no need to optimize separately for similar words, e.g. “movie reviews” vs. “movie review”.
      Action item: Each page is relevant to many different keywords, so optimize each page for a broad set of related keywords instead of a single keyword.
      Use case: Content creation. What keywords will this new blog post target? Is it relevant to a set of queries?
  • 35. Thanks for watching! Matthew Peters, matt@moz.com, @mattthemathman
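
Slides 7-15 describe scoring a query against a document field (body, title tag, H1) after tokenization and normalization. As a minimal sketch, assuming a bag-of-words vector space model scored with cosine similarity (one common textbook choice, not necessarily what any search engine uses), this might look as follows. The `normalize` function is a hypothetical crude suffix stripper standing in for a real stemmer such as Porter's, and all names and field texts are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(token: str) -> str:
    """Hypothetical crude suffix stripper; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "er", "s"):
        # Only strip when a stem of at least 3 characters remains.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_vector(text: str) -> Counter:
    """Bag-of-words vector: normalized term -> count."""
    return Counter(normalize(t) for t in tokenize(text))


def cosine_similarity(query: str, document: str) -> float:
    """Cosine of the angle between query and document term vectors (0.0 if either is empty)."""
    q, d = term_vector(query), term_vector(document)
    dot = sum(count * d[term] for term, count in q.items())
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in d.values()))
    return dot / norm if norm else 0.0


# Hypothetical page fields; "document" can be the body, title tag, H1, etc.
fields = {
    "title": "Philadelphia Phillies Official Site",
    "h1": "Philadelphia Phillies",
}
for name, text in fields.items():
    print(name, round(cosine_similarity("philadelphia phillies", text), 2))
```

In this sketch, “movie reviews” and “movie review” normalize to the same terms and therefore score as identical, which mirrors the intuition behind the slide 32 action item.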
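
Slide 19 lists TF-IDF as one document representation. A minimal sketch, assuming raw term frequency multiplied by the unsmoothed inverse document frequency log(N / df) (one of several standard variants in Manning et al.) over already-tokenized documents; the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term at most once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: count * math.log(n_docs / doc_freq[term]) for term, count in tf.items()})
    return vectors


def tf_idf_score(query: list[str], vector: dict[str, float]) -> float:
    """Sum of the document's TF-IDF weights for the query terms (missing terms contribute 0)."""
    return sum(vector.get(term, 0.0) for term in query)


docs = [
    ["the", "search", "engine", "optimization", "guide"],
    ["the", "search", "engine", "for", "walking", "trails"],
    ["the", "movie", "reviews"],
]
vectors = tf_idf_vectors(docs)
for i, vec in enumerate(vectors):
    print(i, round(tf_idf_score(["search", "engine", "optimization"], vec), 3))
```

A term that appears in every document (here “the”) gets weight zero, while rarer terms such as “optimization” dominate the score.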
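
Slide 23 reports that Dirichlet smoothing worked best for the language model. The slide 20 example P(optimization | search, engine) is a conditional, n-gram style probability; this sketch instead uses the simpler unigram query-likelihood model with Dirichlet smoothing studied by Zhai and Lafferty, where p(w | d) = (c(w, d) + μ·p(w | C)) / (|d| + μ) and a document is scored by summing log p(q_i | d) over the query terms. The default μ = 2000, the handling of unseen terms, and all names and data are assumptions for illustration.

```python
import math
from collections import Counter


def dirichlet_score(query: list[str], doc: list[str], collection: Counter, mu: float = 2000.0) -> float:
    """Query log-likelihood under a unigram document model with Dirichlet smoothing.

    p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu)
    """
    collection_size = sum(collection.values())
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_collection = collection[term] / collection_size
        if p_collection == 0.0:
            # Term never seen in the collection: its smoothed probability would be zero
            # for every document, so skip it rather than take log(0) (assumed convention).
            continue
        p_doc = (doc_counts[term] + mu * p_collection) / (len(doc) + mu)
        score += math.log(p_doc)
    return score


# Hypothetical toy collection of two tokenized documents.
docs = {
    "seo-guide": ["search", "engine", "optimization", "search", "engine"],
    "shoe-review": ["walking", "shoes", "review"],
}
collection = Counter(t for doc in docs.values() for t in doc)
query = ["search", "engine", "optimization"]
# A small mu suits this tiny collection; larger collections typically use larger values.
ranked = sorted(docs, key=lambda name: dirichlet_score(query, docs[name], collection, mu=10.0), reverse=True)
print(ranked)
```

Larger μ pulls every document's term probabilities toward the collection model, so short documents are not over-rewarded for a single matching term.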
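
Slide 22 summarizes each method by its mean Spearman correlation across keywords. A minimal sketch, assuming that for each keyword we have a factor's scores for the results in SERP order: Spearman's rho is computed as the Pearson correlation of average ranks (which handles ties), then averaged across keywords. Negating positions so that a positive value means higher scores sit nearer the top is an assumed sign convention, and the function names and input format are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman(serps: dict[str, list[float]]) -> float:
    """serps maps keyword -> factor scores of its results in SERP order (position 1 first)."""
    rhos = []
    for scores in serps.values():
        neg_positions = [-p for p in range(1, len(scores) + 1)]
        rhos.append(spearman(scores, neg_positions))
    return mean(rhos)


# Hypothetical factor scores for two keywords.
serps = {
    "keyword a": [0.9, 0.7, 0.8, 0.2],
    "keyword b": [0.5, 0.5, 0.1, 0.3],
}
print(round(mean_spearman(serps), 3))
```

As slide 22 warns, a positive mean correlation only says that the factor tends to be higher for better-ranked results; it does not show that the factor causes the ranking.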
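
Slides 26-30 compare rankings by P@50. A minimal sketch of precision at k, assuming scores arrive as a mapping from URL ID to score and SERP membership as a set of URL IDs; the function name is hypothetical, and ties are broken by the mapping's insertion order, an arbitrary choice. Because each query's pool holds 50 SERP pages out of 500, a ranking with no signal would be expected to score about 0.10.

```python
def precision_at_k(scores: dict[int, float], in_serp: set[int], k: int = 50) -> float:
    """Fraction of the k highest-scoring URL IDs that are actually in the SERP."""
    # sorted() is stable even with reverse=True, so ties keep the dict's insertion order.
    top_k = sorted(scores, key=lambda url_id: scores[url_id], reverse=True)[:k]
    return sum(1 for url_id in top_k if url_id in in_serp) / k


# Toy check using only the rows visible on the slides (the "…" rows are not
# reproduced), so k=2 here; this does not reproduce the talk's P@50 results.
pa_scores = {86: 92, 355: 90, 27: 18}
lm_scores = {213: 0.97, 156: 0.95, 355: 0.06}
in_serp = {86, 213, 156}  # rows marked "In SERP? = 1"
print(precision_at_k(pa_scores, in_serp, k=2), precision_at_k(lm_scores, in_serp, k=2))
```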
