
Semtech bizsemanticsearchtutorial


Slides for Semantic Search Tutorial at #SemTechBiz 2014 by Barbara Starr and Bill Slawski



  1. 1. • Barbara Starr – Basics of what semantic search is and what tools and techniques are used • Bill Slawski – Strategy for SEO – Case-based examples and analysis
  2. 2. • Pursued a doctorate in Artificial Intelligence from South Africa in the 1980s. • Recruited to build intelligent/predictive trading systems on Wall Street • Migrated to government-based contracts, several of which turned into real-world products like – SIRI (PAL from DARPA) – WATSON (Acquaint - IBM Watson Labs was a team member) • From the vantage of a semantic technologist, I keenly watched the evolution of the Semantic Web. • “Shocked into the real world” when working as a consultant @ Overstock. – RDFa on 900,000 item pages 2 days before Google adopted it – UPC and identifier “miner” • Today – Consultant for companies such as GS1 US, Columnist, Strategist, …
  3. 3. • Primitive UI – Hunt and Peck
  4. 4. Primarily Stochastic in nature
  5. 5. • Based on concept of “citations” and very easily gamed • Probabilistic or Statistical (Not Symbolic) • Keyword Based Search Engine (Not Concept Based or Ontology Based) • “link juice” ? • Other odd vernacular that became standard jargon in the “SEO” community
  6. 6. SIRI “Amazing fact: same amount of computing to answer one Google Search query as all the computing done – in flight and on the ground – for the entire Apollo program!” “Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years” Source: Wikipedia
  7. 7. “A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities” • Tim Berners Lee • James Hendler • Ora Lassila http://www.cs.umd.edu/~golbeck/LBSC690/SemanticWeb.html
  8. 8. What they want When they want it (Now) Accurate (Reliable & Informative) Available Search engines must satisfy consumer needs, else:
  9. 9. “Def. Semantic Search is any retrieval method where – User intent and resources are represented in a semantic model • A set of concepts or topics that generalize over tokens/phrases • Additional structure such as a hierarchy among concepts, relationships among concepts etc. – Semantic representations of the query and the user intent are exploited in some part of the retrieval process” Peter Mika, Sr. Research Scientist, Yahoo Labs ⎪ June 19, 2014
  10. 10. Inevitable passage of Semantic Web adoption (or some version thereof) – culminating in schema.org http://semanticweb.com/semtech-2011-coverage-the-rdfaseo-wave-how-to-catch-it-and-why_b20458
  11. 11. “Things, not strings” – May 16, 2012 • Understanding “things” helps Google understand what things are in the world and what users are searching for • June 2012 – Twitter announces Twitter Cards • Pinterest Rich Pins
  12. 12. • Directly extracting on-page metadata to create enhanced displays • Searching directly on consumed metadata • Provide direct answers to queries by searching on consumed, verified and validated information RICH SNIPPETS 2009 SearchMonkey 2008 • Aggregate answers or deduce them (like a timeline of events) • Expose more relevant answers in the long tail of search • Assist in interpreting a user query • Detect relevancy signals, i.e. what content to show to what audience • Use it in conjunction with machine learning techniques, e.g. to train other components • … tiles Long tail: Peanut Butter and Jelly in stripes?
  13. 13. Search is changing • Semantic, Predictive, Personalised, Conversational – Search over documents – Search over Data • Rise of Answer Engines (Direct answers proliferating) • Data Quality is imperative Becoming Less like a search Engine and more like a personal Assistant
  14. 14. SIRI Google Now Cortana AiAgents (create your own) Runs cross platform
  15. 15. “Answer box” Organic Search Results Search Over Data Knowledge Panel Search Over Documents
  16. 16. Synonymous with the migration to “Answer Engines” & “Search Over Data”
  17. 17. • Crawling & Indexing • Query Interpretation • Indexing and Ranking • Results Presentation • Indexed information
  18. 18. Means of preprocessing documents to speed up search (serving results in real time)
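A minimal sketch of the preprocessing idea on this slide, assuming the simplest possible form of index: an inverted index that maps each token to the documents containing it. The sample documents and the function names `build_inverted_index` and `search` are illustrative assumptions, not something shown in the deck, and real search engines do far more (ranking, stemming, compression).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the postings of the first token, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Toy corpus (hypothetical documents).
docs = {
    1: "peanut butter and jelly",
    2: "jelly in stripes",
    3: "peanut brittle",
}
index = build_inverted_index(docs)
print(search(index, "peanut jelly"))
```

The index is built once, ahead of time, so answering a query only needs a few set lookups instead of a scan over every document.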
  19. 19. • Microsoft has given a fairly concise definition of the entity recognition and disambiguation process: – The objective of an Entity Recognition and Disambiguation system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection or knowledge base. • In Google’s case, that means recognizing entities on web pages or web documents and mapping them back to specific entities in their Knowledge Graph
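The recognize, disambiguate, and map steps in that definition can be sketched with a toy example. Everything here is an assumption for illustration: the tiny `KNOWLEDGE_BASE`, its `kb:` identifiers, and the word-overlap scoring are hypothetical and are not how Google or Microsoft implement entity linking.

```python
# Toy knowledge base: entity id -> aliases and typical context words (all hypothetical).
KNOWLEDGE_BASE = {
    "kb:jaguar_animal": {"aliases": {"jaguar"}, "context": {"cat", "jungle", "prey"}},
    "kb:jaguar_car": {"aliases": {"jaguar"}, "context": {"car", "engine", "dealer"}},
    "kb:amazon_river": {"aliases": {"amazon"}, "context": {"river", "jungle", "brazil"}},
}


def link_entities(text):
    """Recognize alias mentions and map each one to the best-matching entity id."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    token_set = set(tokens)
    links = {}
    for token in tokens:
        # Recognition: which entities list this token as an alias?
        candidates = [eid for eid, e in KNOWLEDGE_BASE.items() if token in e["aliases"]]
        if not candidates:
            continue
        # Disambiguation: prefer the candidate whose context words overlap the text most.
        links[token] = max(
            candidates,
            key=lambda eid: len(KNOWLEDGE_BASE[eid]["context"] & token_set),
        )
    return links


print(link_entities("The jaguar stalked its prey in the jungle."))
print(link_entities("The dealer sold a Jaguar with a new engine."))
```

The same mention ("jaguar") resolves to different entities depending on the surrounding words, which is the disambiguation step the slide describes.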
  20. 20. • Implicit entity graph derived/inferred from the text on a web page (Technology – NLP or IR or …) • Explicit entities obtained from structured markup on a web page (Technology – Semantic Web) • May need to map to external ontologies like schema.org or some other ontology
  21. 21. Make it Search Engine/Machine Friendly & tell them (explicitly) what “things” are on your web page • Make the information on your website available to Google (and the major search and social engines), and make it easy for computers to read and discover your content. • Use schema.org (and/or the preferred vocabulary/ontology of the search or social engine you are optimizing for, e.g. for Facebook use RDFa & Open Graph). Google, Yahoo, Bing, Yandex => Schema.org • Pick a markup format (syntax) and stick with it – Microdata – Microformats – RDFa – RDFa Lite – JSON-LD
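As one possible illustration of telling machines explicitly what "things" are on a page, the sketch below builds a schema.org description in JSON-LD and wraps it in the script tag that JSON-LD uses inside HTML. The organization name and URL are placeholder assumptions, and `to_jsonld_script` is a hypothetical helper, not part of any library.

```python
import json


def to_jsonld_script(thing):
    """Serialize a schema.org description as a JSON-LD <script> block for an HTML page."""
    payload = json.dumps(thing, indent=2)
    # Escape "</" so a string value can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Placeholder entity: name and URL are made up for the example.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Store",
    "url": "https://www.example.com",
}

print(to_jsonld_script(organization))
```

The same description could be expressed in Microdata or RDFa instead; the point from the slide is to pick one syntax and use it consistently.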
  22. 22. • Recall some of Google’s Mission/Objective Statements or goals – “Organizing the world’s information to make it universally accessible and useful” – “To help with that we have built the knowledge graph” – Give an identity to every “thing” in the world • The knowledge graph – Contains information and entities and their relationships – Helps in resolving ambiguities when processing queries • You can explicitly disambiguate your content by providing a Freebase mid (machine identifier) in your markup
  23. 23. Ref: Google I/O 2013
  24. 24. Google Plus in “Enhanced Displays” and the Knowledge Graph • Authorship • Local businesses • Knowledge Carousel • ………
  25. 25. With Schema.org (and JSON-LD in this case) • Note the sameAs statement • mid makes it easier to match or reconcile the “thing” https://www.youtube.com/watch?v=W9pRpSW_KqA&src_vid=0oOwrBEeQss&feature=iv&annotation_id=annotation_1139520055 Ref: Google I/O 2014
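A hedged sketch of the sameAs idea: attach identifier URLs to a schema.org description so a consumer can reconcile the "thing" with an entity it already knows. The Freebase-style URL below uses a placeholder mid (`/m/0example`), the entity is invented, and `with_same_as` is a hypothetical helper; the markup shown in the referenced talk may differ.

```python
import json


def with_same_as(thing, identifier_urls):
    """Return a copy of `thing` whose schema.org sameAs list includes the given URLs."""
    enriched = dict(thing)
    existing = enriched.get("sameAs", [])
    # sameAs may be a single URL string or a list; normalize to a list.
    existing = [existing] if isinstance(existing, str) else list(existing)
    for url in identifier_urls:
        if url not in existing:
            existing.append(url)
    enriched["sameAs"] = existing
    return enriched


# Invented entity; the mid in the Freebase URL is a placeholder, not a real identifier.
band = {
    "@context": "https://schema.org",
    "@type": "MusicGroup",
    "name": "Example Band",
    "sameAs": "https://en.wikipedia.org/wiki/Example_Band",
}

enriched = with_same_as(band, ["http://www.freebase.com/m/0example"])
print(json.dumps(enriched, indent=2))
```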
  26. 26. The Knowledge Graph Powers: • Rich snippets in Events • Event listings in Google Maps • Notifications in Google Now https://www.youtube.com/watch?v=XXw8g-FbemI Ref: Google I/O 2014
  27. 27. https://www.youtube.com/watch?v=XXw8g-FbemI Ref: Google I/O 2014
  28. 28. http://youtu.be/pkrxhefQIBs
  29. 29. Rich snippets make your data more visible in Search Engine Results Pages Which would you rather click on? No Rich Snippets With Rich Snippets Lower Bounce Rate
  30. 30. More Visibility in verticals, recipes & images via markup In Search Engine Results Pages & Search Verticals • Your product is not visible if no “color” attribute is populated
  31. 31. You want peanut butter and jelly in stripes ? Allows unique and interesting content to surface
  32. 32. “Google Plus” Key Point - Corollary: If you don’t exist as an entity you do not exist in the knowledge graph or in “Search Over Data” The cost of that: Anonymity and Irrelevance!
  33. 33. http://www.socialmediaexaminer.com/rich-pins-on-pinterest/ Twitter Cards & Deep Linking Pinterest Pins Facebook Opengraph • Drive Brand awareness • Diversify Revenue Sources (Reduce Dependence on Google) • Increase Lift & Conversions
  34. 34. Google’s Structured Markup Helper • Generates JSON-LD or microdata • E-mail and web page markup Data Highlighter https://support.google.com/webmasters/answer/99170?hl=en&ref_topic=1088472 “Google can present your data more attractively -- and in new ways -- in search results and in other products such as the Google Knowledge Graph.” List provided on schema.rdfs.org WordPress plugin and HTML code http://schema.rdfs.org/tools.html
  35. 35. Make sure to enable Microdata
  36. 36. • Microdata reveal · JSON-LD sniffer · Semantic inspector · META SEO inspector · Green Turtle RDFa List maintained by Aaron Bradley: http://www.seoskeptic.com/structured-data-markup-validation-testing-tools/ Written Explanation of Walkthrough http://searchengineland.com/see-entities-web-page-tools-help-194710 GRUFF
  37. 37. • AlchemyAPI (with Freebase mappings of entities since July 2013) • OpenCalais • Semantic Verses • Aylien, which was launched in Feb 2014, provides mappings to Freebase and schema.org. • Smartlogic • Lexalytics • Text-Processing • Stanford NER • TextRazor
  38. 38. The following information MUST MATCH!
  39. 39. Ensure you supply rich, high-quality data, mapped to search filters for maximum visibility • Not visible if no “color” attribute is populated • Fill in the gaps
  40. 40. • Supply rich, consistent data in any format you submit and ensure it is validated, verified and fresh • Send consistent signals • Provide global identifiers whenever possible
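One way to act on the "fill in the gaps" advice is to check product records for empty attributes before submitting them. This is a minimal sketch: the `REQUIRED_ATTRIBUTES` list is an assumption chosen for the example (the deck only names "color" explicitly), and each search engine or feed specification defines its own required fields.

```python
# Hypothetical list of attributes a product record should populate.
REQUIRED_ATTRIBUTES = ("name", "gtin", "color", "brand")


def missing_attributes(product, required=REQUIRED_ATTRIBUTES):
    """Return the required attributes that are absent or blank in a product record."""
    return [attr for attr in required if not str(product.get(attr) or "").strip()]


# Example record with no color populated.
scarf = {"name": "Striped scarf", "gtin": "4006381333931", "brand": "Example"}
print(missing_attributes(scarf))
```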
  41. 41. Rich Product information with GTIN
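Because a GTIN is a global identifier, it is worth validating before it goes into markup or a feed. The sketch below applies the standard GS1 check-digit rule (weights 3 and 1 alternating from the rightmost data digit, modulo 10); the function name `gtin_is_valid` and the sample numbers are illustrative, not taken from the deck.

```python
def gtin_is_valid(gtin):
    """Check the length and the GS1 mod-10 check digit of a GTIN-8/12/13/14 string."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in gtin]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, 1, ... starting from the rightmost data digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


print(gtin_is_valid("4006381333931"))  # a commonly cited EAN-13 example
print(gtin_is_valid("4006381333932"))  # same number with a wrong check digit
```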
  42. 42. • Implicit (content and Bill) also tools I have
  43. 43. • “Query logs record the actual usage of search systems and their analysis has proven critical to improving search engine functionality. Yet, despite the deluge of information, query log analysis often suffers from the sparsity of the query space. We propose a new model for query log data called the entity-aware click graph. In this representation, we decompose queries into entities and modifiers, and measure their association with clicked pages. We demonstrate the benefits of this approach on the crucial task of understanding which websites fulfill similar user needs, showing that using this representation we can achieve a higher precision than other query log-based approaches” Measuring website similarity using an entity-aware click graph 2012 publication: Peter Mika, Hugo Zaragoza, Pablo N Mendes, Roi Blanco http://dl.acm.org/citation.cfm?id=2398500
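The decomposition described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the entity list, the query log, the site names, and the cosine comparison over modifier counts are all assumptions chosen to show the idea that sites answering the same modifiers serve similar needs, whatever the entity.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy entity list (assumption); a real system would use an entity recognizer.
ENTITIES = {"inception", "titanic"}


def split_query(query):
    """Decompose a query into (entity, modifier); entity is None if none is found."""
    q = " ".join(query.lower().split())
    for entity in ENTITIES:
        if entity in q:
            return entity, " ".join(q.replace(entity, " ").split())
    return None, q


def build_click_graph(log):
    """Count, per clicked site, how often each query modifier led to a click."""
    graph = defaultdict(Counter)
    for query, site in log:
        _, modifier = split_query(query)
        graph[site][modifier] += 1
    return graph


def cosine(a, b):
    """Cosine similarity between two modifier-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical query log of (query, clicked site) pairs.
log = [
    ("inception cast", "movies-a.example"),
    ("titanic cast", "movies-b.example"),
    ("titanic review", "movies-b.example"),
    ("inception tickets", "tickets.example"),
]
graph = build_click_graph(log)
print(cosine(graph["movies-a.example"], graph["movies-b.example"]))
print(cosine(graph["movies-a.example"], graph["tickets.example"]))
```

The two movie sites share the modifier "cast" even though the queries mention different entities, so they score as more similar than the movie site and the ticket site.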
  44. 44. Need to understand the question in order to answer it • Entity Mention Queries: Common structure to entity mention queries: query = <entity> + <intent> • Queries that return facts as an answer • What form does the question take? (Question forms) Where was X born? When was X born? Who invented X? Where was X invented? What is the X of Y? Flights from ?x to ?y Visit old problems/solutions with scale (Parameterized Queries, Form Based Queries, Query Template, Template Based Query) Takeaway: Create Content that will provide great answers to these kinds of questions (for entities relevant to your audience)
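The question forms listed on this slide can be treated as query templates. The sketch below matches a query against a few regular-expression templates and extracts the slots; the intent labels and the `parse_query` helper are hypothetical, and production systems use far richer query understanding than pattern matching.

```python
import re

# One pattern per question form from the slide; intent names are made up for the example.
TEMPLATES = [
    ("birthplace", re.compile(r"^where was (?P<x>.+) born\??$")),
    ("birthdate", re.compile(r"^when was (?P<x>.+) born\??$")),
    ("inventor", re.compile(r"^who invented (?P<x>.+?)\??$")),
    ("invention_place", re.compile(r"^where was (?P<x>.+) invented\??$")),
    ("attribute", re.compile(r"^what is the (?P<x>.+?) of (?P<y>.+?)\??$")),
    ("flights", re.compile(r"^flights from (?P<x>.+?) to (?P<y>.+)$")),
]


def parse_query(query):
    """Return (intent, slots) for the first matching template, or None."""
    q = " ".join(query.lower().split())
    for intent, pattern in TEMPLATES:
        match = pattern.match(q)
        if match:
            return intent, match.groupdict()
    return None


print(parse_query("Where was Ada Lovelace born?"))
print(parse_query("What is the capital of France?"))
print(parse_query("Flights from Boston to Denver"))
```

Content that answers these slot-filled questions for entities your audience cares about is what the takeaway on the slide is asking for.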
  45. 45. • Social Graphs • Interest Graphs • Mobile Social graphs • Attraction graphs • Engagement graphs • Attention Graphs • Intent graph • User Query Graph • ……..
  46. 46. Takeaway: Write engaging content around your audience’s interests (Find ways – “Big Data” – to determine their interests)
  47. 47. Anatomy of a Google Search Results Page (Revisited) Search Over Data Search Over Documents
  48. 48. • Slide 3: https://www.flickr.com/photos/67262490@N04/6151466225/ • Slide 5: https://www.flickr.com/photos/outsourcetechndu/8241430872/ • Slide 9: https://www.flickr.com/photos/drs2biz/197524395/ • Slide 3: https://www.flickr.com/photos/106426559@N03/10448641806/ • Slide 3: https://www.flickr.com/photos/amynkassam/2866419139/ • Slide 5: https://www.flickr.com/photos/legocy/8291983493/in/photolist • Slide 4: https://www.flickr.com/photos/mekz/2389113709/in/photolist
