• Barbara Starr
– Basics of what semantic search is, and which tools and techniques are used
• Bill Slawski
– Strategy for SEO 
– Case based examples and analysis
• Pursued a doctorate in Artificial Intelligence in South Africa in the 1980s.
• Recruited to build intelligent/predictive trading 
systems on Wall Street 
• Migrated to government-based contracts, several 
of which turned into real world products like 
– SIRI (PAL from DARPA) 
– WATSON (AQUAINT; IBM Watson Labs was a team member)
• From the vantage of a semantic technologist, I 
keenly watched the evolution of the Semantic Web. 
• “Shocked into the real world” when working as a 
consultant @ Overstock. 
– RDFa on 900,000 item pages, 2 days before Google adopted it
– UPC and identifier “miner” 
• Today – Consultant for companies such as GS1 
US, Columnist, Strategist, …
• Primitive UI – Hunt and Peck
Primarily Stochastic in nature
• Based on concept of “citations” and very easily gamed 
• Probabilistic or Statistical (Not Symbolic) 
• Keyword Based Search Engine (Not Concept Based or 
Ontology Based) 
• “link juice” ? 
• Other odd vernacular that 
became standard jargon in the 
“SEO” community
SIRI 
“Amazing fact: same amount of computing to answer one Google Search query as all the computing done – in flight and on the ground -- for the entire Apollo program!”
“Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years”
Source: Wikipedia
“A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities”
• Tim Berners-Lee
• James Hendler 
• Ora Lassila 
http://www.cs.umd.edu/~golbeck/LBSC690/SemanticWeb.html
What they want 
When they want it (Now) 
Accurate (Reliable & Informative) 
Available 
Search engines must satisfy consumer needs, else:
“Def. Semantic Search is any retrieval method where 
– User intent and resources are represented in a semantic model 
• A set of concepts or topics that generalize over tokens/phrases 
• Additional structure such as a hierarchy among concepts, relationships among 
concepts etc. 
– Semantic representations of the query and the user intent are exploited 
in some part of the retrieval process” 
Peter Mika, Sr. Research Scientist, Yahoo Labs ⎪ June 19, 2014
Inevitable passage of 
Semantic Web adoption 
(or some version thereof) 
– culminating in 
schema.org 
http://semanticweb.com/semtech-2011-coverage-the-rdfaseo-wave-how-to-catch-it-and-why_b20458
“Things” not “strings” - May 16, 2012
Understanding “things” helps Google understand what things are in the world and what users are searching for
June 2012 – Twitter announces Twitter Cards
Pinterest Rich Pins
• Directly extracting on page metadata to create enhanced displays 
• Searching directly on consumed metadata 
• Provide direct answers to queries by searching on consumed, verified and validated 
information 
RICH SNIPPETS 2009 
Searchmonkey 2008 
• Aggregate answers or deduce them (like a timeline of events) 
• Expose more relevant answers in the long tail of search 
• Assist in interpreting a user query 
• Detect relevancy signals, i.e., what content to show to what audience
• Use it in conjunction with machine learning techniques, e.g., to train other components
• … 
tiles 
Long tail: 
Peanut Butter 
and Jelly in 
stripes ?
Search is changing 
• Semantic, Predictive, Personalised, Conversational 
– Search over documents 
– Search over Data 
• Rise of Answer Engines (Direct answers proliferating) 
• Data Quality is imperative 
Becoming less like a search engine and more like a personal assistant
SIRI 
Google Now 
Cortana 
AiAgents 
(create your own) 
Runs cross platform
“Answer box”
Organic Search Results
Search Over Data
Knowledge Panel
Search Over Documents
Synonymous with the migration to “Answer Engines” & “Search Over Data”
Crawling & Indexing
Query Interpretation
Indexing and Ranking
Results Presentation
Indexed information
Means of preprocessing documents to speed 
up search (serving results in real time)
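The preprocessing idea above can be sketched as a tiny inverted index. This is only an illustration under simple assumptions (whitespace tokenization, AND semantics, a made-up three-document corpus); the function names are hypothetical and real search engines do far more.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical corpus, used only for illustration.
docs = {
    1: "peanut butter and jelly",
    2: "peanut butter cookies",
    3: "strawberry jelly",
}
index = build_inverted_index(docs)
print(search(index, "peanut butter"))
```

The work of scanning documents is done once, at indexing time, so answering a query only requires looking up and intersecting a few precomputed sets.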
• Microsoft has given a fairly concise definition of the entity 
recognition and disambiguation process: 
– The objective of an Entity Recognition and Disambiguation 
system is to recognize mentions of entities in a given text, 
disambiguate them, and map them to the entities in a given 
entity collection or knowledge base. 
• In Google’s case, that means recognizing entities on web 
pages or web documents and mapping them back to 
specific entities in their Knowledge Graph
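A rough sketch of the recognize, disambiguate and map steps described above, assuming a toy knowledge base. The entity ids, names and context words below are invented for illustration; this is not how Microsoft or Google implement it.

```python
import re

# Hypothetical mini knowledge base: entity id -> surface names and context words.
KNOWLEDGE_BASE = {
    "kb:jaguar_animal": {"names": {"jaguar"}, "context": {"cat", "jungle", "prey"}},
    "kb:jaguar_car": {"names": {"jaguar"}, "context": {"car", "engine", "dealer"}},
    "kb:amazon_river": {"names": {"amazon"}, "context": {"river", "jungle", "brazil"}},
}


def link_entities(text):
    """Return {mention: entity_id} for single-word mentions found in text."""
    token_set = set(re.findall(r"[a-z0-9]+", text.lower()))
    links = {}
    for token in token_set:
        # Recognize: which knowledge-base entities use this token as a name?
        candidates = sorted(
            eid for eid, entry in KNOWLEDGE_BASE.items() if token in entry["names"]
        )
        if not candidates:
            continue
        # Disambiguate: prefer the candidate whose context words overlap the text most.
        # Ties fall back to the first candidate in sorted order.
        links[token] = max(
            candidates,
            key=lambda eid: len(KNOWLEDGE_BASE[eid]["context"] & token_set),
        )
    return links


print(link_entities("The jaguar stalked its prey in the jungle near the Amazon"))
```

The same mention ("jaguar") maps to different entities depending on the surrounding words, which is the point of the disambiguation step.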
• Implicit entity graph derived/inferred from the text on a web page (Technology – NLP or IR or …)
• Explicit entities obtained from structured markup on a web page (Technology – Semantic Web)
• May need to map to external ontologies like schema.org or some other ontology
Make it search engine/machine friendly & tell them (explicitly) what “things” are on your web page
• Make the information on your website available to Google (and the other major search and social engines), and make it easy for computers to read and discover.
• Use schema.org (and/or the preferred vocabulary/ontology of the search or social engine you are optimizing for, e.g., RDFa & Open Graph for Facebook). Google, Yahoo, Bing, Yandex => schema.org
• Pick a markup format (syntax) and stick with it
– Microdata
– Microformats
– RDFa
– RDFa Lite
– JSON-LD
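As one illustration of the JSON-LD option, the sketch below builds a schema.org Product description and wraps it in a script tag for embedding in a page. The product values and the helper name to_jsonld_script are made up for the example; only the schema.org terms (Product, name, color, brand) come from the vocabulary itself.

```python
import json


def to_jsonld_script(data):
    """Serialize a dict as JSON-LD inside an HTML script element."""
    body = json.dumps(data, indent=2, sort_keys=True)
    # Escape "</" so a value cannot close the script element early ("\/" is valid JSON).
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical product, used only for illustration.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Striped Peanut Butter and Jelly",
    "color": "purple",
    "brand": {"@type": "Brand", "name": "Example Foods"},
}

snippet = to_jsonld_script(product)
print(snippet)
```

Whichever syntax you pick, the same vocabulary terms apply; JSON-LD simply keeps the structured data separate from the visible HTML.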
• Recall some of Google’s Mission/Objective Statements or goals 
– “Organizing the world’s information to make it universally accessible and useful”
– “To help with that we have built the knowledge graph” 
– Give an identity to every “thing” in the world 
• The knowledge graph 
– Contains information and entities and their relationships 
– Helps in Resolving ambiguities when processing queries 
You can explicitly disambiguate your content by providing a Freebase MID (machine identifier) in your markup
Ref: Google I/O 2013
Google Plus in enhanced displays and the Knowledge Graph
• Authorship 
• Local businesses 
• Knowledge Carousel 
• ………
With Schema.org (and JSON-LD in this case) 
• Note the sameAs statement 
• mid makes it easier to match or reconcile the “thing” 
https://www.youtube.com/watch?v=W9pRpSW_KqA&src_vid=0oOwrBEeQss&feature=iv&annotation_id=annotation_1139520055 Ref: Google I/O 2014
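A minimal sketch of what a sameAs statement can look like when the JSON-LD is generated programmatically. The organization, its URLs and the MID-style identifier are placeholders, not real identifiers, and the helper with_same_as is hypothetical.

```python
import json


def with_same_as(entity, *identifier_urls):
    """Return a copy of the entity with a schema.org sameAs list attached."""
    marked_up = dict(entity)
    marked_up["sameAs"] = list(identifier_urls)
    return marked_up


# Hypothetical organization, used only for illustration.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "https://www.example.com",
}

marked_up = with_same_as(
    organization,
    "http://www.freebase.com/m/0xxxxxx",  # placeholder MID, not a real identifier
    "https://en.wikipedia.org/wiki/Example",  # placeholder reference page
)
print(json.dumps(marked_up, indent=2))
```

Each sameAs URL points at another description of the same “thing”, which gives a consumer an unambiguous key to reconcile against.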
The Knowledge Graph Powers: 
• Rich snippets in Events 
• Event listings in Google Maps 
• Notifications in Google Now 
https://www.youtube.com/watch?v=XXw8g-FbemI Ref: Google I/O 2014
http://youtu.be/pkrxhefQIBs
Rich snippets make your data more visible in Search Engine Results Pages 
Which would you rather click on? 
No Rich Snippets With Rich Snippets 
Lower Bounce Rate
More visibility in verticals, recipes & images via markup
In Search Engine Results Pages & Search Verticals
Your product is not visible if no “color” attribute is populated
You want peanut 
butter and jelly in 
stripes ? 
Allows unique and interesting content to surface
“Google 
Plus” 
Key Point - Corollary: If you don’t exist as an entity, you do not exist in the Knowledge Graph or in “Search Over Data”
The cost of that: Anonymity and Irrelevance!
http://www.socialmediaexaminer.com/rich-pins-on-pinterest/ 
Twitter Cards & Deep Linking
Pinterest Pins
Facebook Open Graph
• Drive Brand awareness 
• Diversify Revenue Sources 
(Reduce Dependence on 
Google) 
• Increase Lift & Conversions
Google’s Structured Data Markup Helper
• Generates JSON-LD or microdata 
• E-mail and web page markup 
Data Highlighter 
https://support.google.com/webmasters/answer/99170?hl=en&ref_topic=1088472 
“Google can present your data more attractively 
-- and in new ways -- in search results and in other 
products such as the Google Knowledge Graph.” 
List provided on schema.rdfs.org 
WordPress plugin and HTML code: http://schema.rdfs.org/tools.html
Make sure to enable Microdata
• Microdata reveal
• JSON-LD sniffer
• Semantic inspector
• META SEO inspector
• Green Turtle RDFa
List maintained by Aaron Bradley: 
http://www.seoskeptic.com/structured-data-markup-validation-testing-tools/ 
Written Explanation of Walkthrough 
http://searchengineland.com/see-entities-web-page-tools-help-194710 
GRUFF
• AlchemyAPI (with Freebase mappings of entities since July 2013)
• OpenCalais
• Semantic Verses
• Aylien, launched in Feb 2014, which provides mappings to Freebase and schema.org
• Smartlogic
• Lexalytics
• Text-Processing
• Stanford NER
• TextRazor
The following information 
MUST MATCH!
Make sure you supply rich, high-quality data, mapped to search filters for maximum visibility
Not visible if no “color” attribute populated
Fill in the gaps
• Supply rich, consistent data in any format you submit, and ensure it is validated, verified and fresh
• Send Consistent signals 
• Provide global identifiers whenever possible
Rich product information with GTIN
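Since a mistyped identifier undermines the "validated and verified" advice above, a quick sanity check before publishing a GTIN can help. The sketch below applies the standard GS1 mod-10 check digit rule; the function names are hypothetical, and passing the check only means the number is well formed, not that it is assigned to your product.

```python
def gtin_check_digit(body):
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code):
    """Check length (GTIN-8/12/13/14) and the trailing check digit."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


print(is_valid_gtin("4006381333931"))  # a commonly cited EAN-13 example
print(is_valid_gtin("4006381333932"))  # same number with a wrong check digit
```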
• Implicit (content and Bill) also tools I have
• “Query logs record the actual usage of search systems and their analysis has proven critical to improving search engine functionality. Yet, despite the deluge of information, query log analysis often suffers from the sparsity of the query space. [...] we propose a new model for query log data called the entity-aware click graph. In this representation, we decompose queries into entities and modifiers, and measure their association with clicked pages. We demonstrate the benefits of this approach on the crucial task of understanding which websites fulfill similar user needs, showing that using this representation we can achieve a higher precision than other query log-based approaches”
Measuring website similarity using an entity-aware click graph
2012 publication: Peter Mika, Hugo Zaragoza, Pablo N. Mendes, Roi Blanco
http://dl.acm.org/citation.cfm?id=2398500
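To make the entities-and-modifiers idea concrete, here is a loose sketch. It is not the authors' model: the entity list, the log rows and the Jaccard overlap used as a similarity measure are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical entity list and query log rows: (query, clicked site, clicks).
KNOWN_ENTITIES = ["brad pitt", "tom hanks"]
LOG = [
    ("brad pitt movies", "films.example", 10),
    ("tom hanks movies", "films.example", 7),
    ("tom hanks movies", "cinema.example", 4),
    ("brad pitt age", "bios.example", 3),
]


def decompose(query):
    """Split a query into (entity, modifier) using naive substring matching."""
    normalized = " ".join(query.lower().split())
    for entity in KNOWN_ENTITIES:
        if entity in normalized:
            modifier = " ".join(normalized.replace(entity, " ", 1).split())
            return entity, modifier
    return None, normalized


def build_click_graph(log):
    """Aggregate clicks per site by modifier, ignoring which entity was asked about."""
    graph = defaultdict(Counter)
    for query, site, clicks in log:
        entity, modifier = decompose(query)
        if entity is not None:
            graph[site][modifier] += clicks
    return graph


def modifier_overlap(graph, site_a, site_b):
    """Jaccard overlap of the modifiers that led to clicks on each site."""
    a, b = set(graph.get(site_a, {})), set(graph.get(site_b, {}))
    return len(a & b) / len(a | b) if (a | b) else 0.0


graph = build_click_graph(LOG)
print(modifier_overlap(graph, "films.example", "cinema.example"))
```

Pooling clicks by modifier rather than by full query string is what eases the sparsity problem: two sites can look similar even if they were never clicked for exactly the same query.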
Need to understand the question in order to answer it 
• Entity Mention Queries: Common structure to entity mention queries: 
query = <entity> + <intent> 
• Queries that return facts as an answer 
• What form does the question take? (Question forms) 
Where was X born? 
When was X born? 
Who invented X? 
Where was X invented? 
What is the X of Y? 
Flights from ?x to ?y 
Revisit old problems/solutions at scale (Parameterized Queries, Form-Based Queries, Query Templates, Template-Based Queries)
Takeaway: Create Content that will provide great answers to these kinds of questions 
(for entities relevant to your audience)
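The question forms above can be read as query templates. The sketch below matches a query against a few hand-written patterns and returns an intent label plus the captured slots; the intent names and patterns are illustrative assumptions, not a description of any search engine's parser.

```python
import re

# Hypothetical templates: each pairs a pattern with an intent label.
TEMPLATES = [
    (re.compile(r"^where was (?P<entity>.+) born\??$"), "birthplace"),
    (re.compile(r"^when was (?P<entity>.+) born\??$"), "birthdate"),
    (re.compile(r"^who invented (?P<entity>.+?)\??$"), "inventor"),
    (re.compile(r"^where was (?P<entity>.+) invented\??$"), "place_of_invention"),
    (re.compile(r"^what is the (?P<attribute>.+?) of (?P<entity>.+?)\??$"), "attribute_lookup"),
    (re.compile(r"^flights from (?P<origin>.+) to (?P<destination>.+?)\??$"), "flight_search"),
]


def interpret(query):
    """Return (intent, slots) for the first matching template, else (None, {})."""
    normalized = " ".join(query.lower().split())
    for pattern, intent in TEMPLATES:
        match = pattern.match(normalized)
        if match:
            return intent, match.groupdict()
    return None, {}


print(interpret("Where was Ada Lovelace born?"))
print(interpret("Flights from Boston to New York"))
```

Content that clearly states the fact each template asks for (a birthplace, an inventor, a route) is easier to surface as a direct answer.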
• Social Graphs 
• Interest Graphs 
• Mobile Social graphs 
• Attraction graphs 
• Engagement graphs 
• Attention Graphs 
• Intent graph 
• User Query Graph 
• ……..
Takeaway: Write engaging content around your audience's interests
(Find ways – “Big Data” - to determine their interests)
Anatomy of a Google Search Results Page (Revisited)
Search Over Data
Search Over Documents
• Slide 3: https://www.flickr.com/photos/67262490@N04/6151466225/
• Slide 5: https://www.flickr.com/photos/outsourcetechndu/8241430872/
• Slide 9: https://www.flickr.com/photos/drs2biz/197524395/
• Slide 3: https://www.flickr.com/photos/106426559@N03/10448641806/
• Slide 3: https://www.flickr.com/photos/amynkassam/2866419139/
• Slide 5: https://www.flickr.com/photos/legocy/8291983493/in/photolist
• Slide 4: https://www.flickr.com/photos/mekz/2389113709/in/photolist

Semtech bizsemanticsearchtutorial
