Keyword Research and Topic Modeling in a Semantic Web

Using Context Vectors and Co-occurrence in creating meaningful website content

  1. #pubcon @bill_slawski Keyword Research and Topic Modeling in a Semantic Web. Presented by: Bill Slawski, Director of SEO Research, Go Fish Digital
  2. Leo Carillo Rancho
  3. A historic, renovated rancho
  4. Be Careful to Read All the Signs
  5. An Entity Audit Uncovers Surprises: Named entities are specific people, places, and things, including products and brands.
  6. Paul Haahr: How Google Works
  7. Schema markup, Google My Business verification, and an entry in Wikipedia can lead to knowledge panels, but they are only the start of adding entity information…
  8. An Elevator Ride from the DC Metro
  9. There is no clear sign telling people
  10. On the DC Metro line, you connect to: • 91 stations in MD, VA, & DC • National Zoo • 19 Smithsonian Museums • National Gallery of Art • Capital One Arena • FedEx Field • Pentagon City Shopping Mall
  11. Identify All Missing Entities
  12. Knowing how Google uses context and semantically related phrases can improve the content you create and how well you optimize pages for particular queries.
  13. Keywords & Context Vectors: “For example, a horse to a rancher is an animal. A horse to a carpenter is an implement of work. A horse to a gymnast is an implement on which to perform certain exercises.” Source: User-context-based search engine
  14. Look to Knowledge Bases
  15. For Other Meanings
  16. See Disambiguation Pages
  17. Context Search Results. Source: Context-based filtering of search results
  18. Map Keywords to Pages, then… • Make sure you add words that indicate context • Look up the top pages that rank for those keywords • Find phrases that co-occur for that meaning • See: Improving semantic topic clustering for search queries with word co-occurrence and bigraph co-clustering
  19. Phrase-Based Indexing • Look for co-occurring phrases on pages that rank highly for a query. • Using these related phrases on a page can boost how it ranks for that query (body hits). • Using those related phrases as anchor text can boost how the targeted page ranks for that query (anchor hits).
  20. Related Words/Phrases. Source: Thematic Modeling Using Related Words in Documents and Anchor Text
  21. Use Complete Phrases • Incomplete phrase: “President of the…” • Complete phrase: “President of the United States.”
  22. Use Meaningful Phrases • Some phrases do not add meaning to a page: “Pay the Piper,” “Out of the Blue,” “Top of the Morning”
  23. Predictive Aspects of Phrases • “Semantically, related phrases will be those that are commonly used to discuss or describe a given topic or concept, such as ‘President of the United States’ and ‘White House.’ For a given phrase, the related phrases can be ordered according to their relevance or significance based on their respective prediction measures.” • Source: Integrated external related phrase information into a phrase-based indexing information retrieval system
  24. Co-occurring Phrases/High Ranking Pages
  25. Clustered Meanings • Jaguars: cats, cars, NFL football team • Java: programming language, island in Indonesia, drink • Bank: a place to store money, a river’s side, to lean to a side
  26. Ranking Documents Based on Contained Phrases (Body Hits): “…a ranking stage in which the documents in the search results are ranked, using the phrase information in each document's related phrase bit vector, and the cluster bit vector for the query phrases. This approach ranks documents according to the phrases that are contained in the document, or informally ‘body hits.’” Source: Integrated external related phrase information into a phrase-based indexing information retrieval system
  27. Anchor Hits: “Sorting the documents on the outbound score component makes documents that have many related phrases to the query as ‘anchor hits,’ rank most highly, thus representing these documents as ‘expert’ documents.” Source: Integrated external related phrase information into a phrase-based indexing information retrieval system
  28. Personalization & Query Classifications • Depending upon the results a searcher selects, the results they see may fall into a specific category from a biased document set. Source: Personalizing Search Results at Google
  29. Which Lincoln?
  30. Look at Knowledge Bases • Abraham Lincoln
  31. Look at Top Search Results • Lincoln Town Car
  32. Look at Other Search Entities • Lincoln, Nebraska
  33. Query Classifications: Search for “Lincoln” and click on the person (Abe), the place (Nebraska), or the thing (Town Car). What you click on may determine what you see in the future on searches for “Lincoln.” “…determining whether to assign the classification to the first query based upon classifications for the identified search entities.” Source: Propagating query classifications
  34. Searches are what we type and what we say, but in the future they will also be based upon what we see and take photos of.
  35. Google Lens Schema Smart Camera User Interface
  36. Further Reading • Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources • A Review of Relational Machine Learning for Knowledge Graphs • Knowledge Curation and Knowledge Fusion: Challenges, Models, and Applications • Improving semantic topic clustering for search queries with word co-occurrence and bigraph co-clustering
  37. Questions? Ask Me At: • Twitter: • LinkedIn: • Facebook: • Google+: • SEO by the Sea: • Go Fish Digital Blog:
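
The workflow on the "Map Keywords to Pages" and "Phrase-Based Indexing" slides (look up the top pages for a keyword, find the phrases that co-occur on them, then use those phrases on your own page) can be sketched roughly as below. This is only an illustration for content research, not how Google or the cited patents implement phrase-based indexing: the function names (`ngrams`, `cooccurring_phrases`, `body_hits`), the 2-to-3-word phrase window, and the sample page texts are all assumptions made for the example. The patents describe prediction measures and bit vectors; this sketch swaps in plain page counts.

```python
import re
from collections import Counter


def ngrams(text, n_min=2, n_max=3):
    """Return the set of lowercase word n-grams (n_min..n_max words) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


def cooccurring_phrases(top_pages, min_pages=2):
    """Map each phrase to the number of top-ranking pages it appears on,
    keeping only phrases found on at least min_pages pages."""
    counts = Counter()
    for page in top_pages:
        # ngrams() returns a set, so a page counts each phrase at most once.
        counts.update(ngrams(page))
    return {phrase: c for phrase, c in counts.items() if c >= min_pages}


def body_hits(page, related):
    """Count how many of the related phrases occur in the page body.
    A crude stand-in for the 'body hits' idea, not the patented scoring."""
    return len(ngrams(page) & set(related))


# Hypothetical text from three pages that rank well for one query.
top_pages = [
    "The President of the United States lives in the White House.",
    "Tours of the White House are arranged through Congress.",
    "The White House press office speaks for the President of the United States.",
]

related = cooccurring_phrases(top_pages)
draft = "Our guide to the White House and its history."

# Phrases shared by the most top pages come first.
for phrase, pages in sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))[:5]:
    print(pages, phrase)
print("body hits in draft:", body_hits(draft, related))
```

A real audit would pull the page text from actual top results and filter out function-word phrases such as "of the", which this sketch keeps. That filtering is the same point the "Use Complete Phrases" and "Use Meaningful Phrases" slides make.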