Semantic Search Engine: Semantic Search and Query Parsing with Phrases and Entities

@KorayGubur
Semantic Search Engine &
Query Parsing
In the Light of Semantic Search Principles

About Me
Koray Tuğberk GÜBÜR
Owner and Founder of Holistic SEO & Digital
• Educates his team
• Publishes SEO Case Studies, Researches & Guides
• Twitter: @KorayGubur
• Email: ktgubur@holisticseo.digital
• Official Site: https://www.holisticseo.digital

SEO Case Studies of KTG

SEO Guides of KTG

Webinars and Interviews of KTG

What is Query Parsing?
• Query Parsing is the process of
understanding the different sections of a
query.
• Types: Entity-seeking Query, Substitute
Term, or Synonym Term.
• Canonical and Represented Versions: A
Canonical Query can represent close
variations.
• Query Character: Affects the SERP Design,
and the Dominant and Minor Search Intent
Assignments.
• Query Processing: Another name for
Query Parsing.

Multi-Stage Query Processing
• The first patent that talks about the «Context
of Words».
• It tries to remove the stop words.
• It stems the concrete (content-bearing) words.
• It expands words with synonyms and
co-occurring terms.
• Some criteria: Absent Queries, Boolean
Logic, Query Term Weights, Document
Popularity, Word Proximity (Distance),
Word Adjacency.
• It uses «VIPS» (Vision-based Page
Segmentation) and Web Page Layout.
Inventors: Jeffrey Adgate Dean, Paul G.
Haahr, Olcan Sercinoglu, and Amitabh
K. Singhal
US Patent Application 20060036593
Filed: August 13, 2004
Published February 16, 2006
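
The stop-word removal, stemming, and synonym expansion stages listed above can be sketched in Python as follows. This is only a minimal illustration, not the method claimed in the patent: the stop-word list, the crude suffix-stripping stem function, and the SYNONYMS table are all hypothetical stand-ins.

```python
# Toy resources: all three are illustrative assumptions, not patent data.
STOP_WORDS = {"the", "a", "an", "of", "in", "for", "to", "is"}
SYNONYMS = {"car": ["auto", "automobile"], "repair": ["fix"]}


def stem(word: str) -> str:
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process_query(query: str) -> dict:
    tokens = query.lower().split()
    # Stage 1: drop stop words.
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Stage 2: stem the remaining words.
    stems = [stem(t) for t in kept]
    # Stage 3: expand each stem with its known synonyms.
    expanded = {s: [s] + SYNONYMS.get(s, []) for s in stems}
    return {"stems": stems, "expanded": expanded}


result = process_query("repairing the cars in Berlin")
print(result["stems"])
print(result["expanded"])
```

A production system would likely use a proper stemmer or lemmatizer and synonyms mined from query logs or co-occurrence data; the three-stage order is the point of the sketch.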

Query Breadth
• This is for «adjacent words» and
«unknown entities».
• It uses the related document count to
measure the ‘query breadth’.
• Query Breadth can be decreased with
the ‘adjacent word’ count.
• Query Breadth can be used for ‘Named
Entity Recognition’, or Triple Creation
(a subject, a predicate, and an object).
Invented by Karl Pfleger and Brian Larson
Assigned to Google
US Patent 7,925,657
Granted April 12, 2011
Filed: March 17, 2004
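
To make the idea concrete, here is a minimal Python sketch that treats query breadth as the number of matching documents and shows how requiring the words to be adjacent narrows it. The four toy documents and the breadth function are assumptions for illustration only; the patent's actual scoring is not reproduced here.

```python
# Hypothetical mini-corpus used only for this illustration.
DOCS = [
    "new york city hotels",
    "new car prices in york",
    "york minster history",
    "cheap hotels in new york",
]


def breadth(query: str, docs: list, adjacent: bool = False) -> int:
    """Count documents matching the query terms, optionally as an adjacent phrase."""
    terms = query.lower().split()
    n = len(terms)
    count = 0
    for doc in docs:
        words = doc.lower().split()
        if adjacent:
            # The terms must appear next to each other, in order.
            hit = any(words[i:i + n] == terms for i in range(len(words) - n + 1))
        else:
            # Every term must appear somewhere in the document.
            hit = all(t in words for t in terms)
        if hit:
            count += 1
    return count


loose = breadth("new york", DOCS)
tight = breadth("new york", DOCS, adjacent=True)
print(loose, tight)
```

The drop from the loose count to the adjacent count is the signal that "new york" behaves like a single unit (a candidate entity) rather than two independent words.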

Query Analysis
• Selection Over Time: A document can be chosen
more frequently in some timespans than in others.
• Documents with Hot Topics: Rising Queries
can boost documents that include these
queries.
• Documents with Related Hot Topics: Queries
related to rising queries can boost the
documents that include them.
• Constant Queries with Consistently Changing
Results: A Constant Query is an always-popular
query whose topic information keeps changing.
• Freshness of Documents: The date of the
information on the web page, not the date of
the document’s last version.
Invented by Karl Pfleger and Brian Larson
Assigned to Google
US Patent 7,925,657
Granted April 12, 2011

Query Analysis
• Staleness of Documents: The amount of
Historical Data can be a positive ranking
signal for a page for a query.
• Overly Broad Pages: They include discordant
queries, which is a signal for spam.
• A Continuation Patent was filed in 2011 for
the «document locator», and some terms
changed.
Inventors: Jeffrey Dean (Palo Alto, CA); Paul Haahr (San Francisco, CA);
Monika Henzinger (Corseaux, CH); Steve Lawrence (Mountain View, CA);
Karl Pfleger (Mountain View, CA); Olcan Sercinoglu (Mountain View, CA);
Simon Tong (Mountain View, CA)
Assignee: Google Inc., Mountain View, CA
Family ID: 34381362
Appl. No.: 13/244853
Filed: September 26, 2011

Query Analysis
• Trends Related to Topics and Search Terms: Grouping of
Topics and Subtopics for Trending Queries.
• Access Times to Determine Freshness and Staleness:
Compares the First Access and Last Access times for
certain documents.
• Frequency of Selection: Compares the selection count
of an earlier period with that of a later period.
• When Staleness Might be Preferred: Even if there are
fresh news or documents, the user can choose the stale
document. These documents are not negatively affected
by their staleness.
• Spam Determination Based Upon Breadth of Rankings,
and Authority: If the document is popular, or
authoritative (link-based), or the source is relevant
enough, it will be an exception.

Query Analysis
• Continuation of the Historical Data
Patent.
• Speaks about Topics, and Query
Categorization based on Topics.
• It is important because, in 2012,
Google launched its Knowledge Graph
with more than 500 million entities and
3.5 billion facts.

Midpage Query Refinements
• In 2006, Google published
«Midpage Query Refinements», a.k.a.
today's Search Suggestions.
• The GUI test ran between 2004 and 2006.
• The patent was filed in 2003.
• Includes Semantic Query Clusters for
Different Contexts.
• Consists of a Matcher, a Clusterer, a Scorer,
and a Presenter.
Inventors: Paul Haahr (San Francisco, CA); Steven Baker (Mountain View, CA)
Correspondence Address: Patrick J. S. Inouye, P.S., 810 3rd Avenue, Suite 258,
Seattle, WA 98104, US
Family ID: 34228721
Appl. No.: 10/668721

• Precomputation Engine has four parts.
• Associator: Query and Document
Association.
• Selector: Document and Query Section
Selector.
• Regenerator: Checks the query logs to
refresh the selections.
• Inverter: Checks the Cached Data for
presenting.

• Query Ambiguity: If the query is ambiguous,
the Search Engine can use the query clusters.
• Homonyms, General Terms, Improper
Context, and Narrow Terms can create a
stateless SERP Instance.
• To prevent this, Semantic Grouping,
Centroids and Centroid distance are used.
• A Query Cluster and a Document Cluster can
be paired. If the Document Cluster is larger, or
more relevant, the query cluster will be
used as a query suggestion.

• Matcher: Stored query variations are put into a
cluster, and document phrase variations are
matched.
• Clusterer: The matched query variations and
documents are clustered together. These differ
from query clusters.
• Scorer: Determines the centroid of the cluster.
If the term vectors are distant from the centroid,
the Clusterer will choose another cluster
for the Scorer.
• Presenter: The created Clusters and Centroids are
presented to the user. According to the
preferred choices, the Presenter will use
sub-centroids.

• In 2017, the patent was refreshed.
• The Scorer Method was changed.
• Representative Queries are chosen based
on centroids.
• For every cluster, a representative query is
chosen.
• The clusters are ordered according to
cluster size and relevance scores.
• And sub-queries are used as the
refinement queries.
Inventors: Paul Haahr and Steven D. Baker
Assignee: Google Inc.
The United States Patent 9,552,388
Granted: January 24, 2017
Filed: January 31, 2014
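
Choosing a representative query from a centroid can be sketched as follows. This is a minimal illustration under stated assumptions: queries are turned into simple bag-of-words term vectors, the centroid is their average, and the representative is the query with the highest cosine similarity to that centroid. The function names and the sample cluster are hypothetical, and the patent's real scoring may differ.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term vector (an assumption; any term weighting could be used)."""
    return Counter(text.lower().split())


def cosine(a, b) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors) -> dict:
    """Average the term vectors of a cluster."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def representative(queries) -> str:
    """Return the query whose vector is closest to the cluster centroid."""
    center = centroid([vectorize(q) for q in queries])
    return max(queries, key=lambda q: cosine(vectorize(q), center))


cluster = ["jaguar car", "jaguar car price", "jaguar car dealer"]
print(representative(cluster))
```

In this toy cluster the shortest query wins because it contains only the terms shared by every member, which is one plausible reading of why a representative query can stand in for its variations.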

• The inventors of the Midpage Query Refinement
Methodology are Paul Haahr and Steven D.
Baker.
• Steven Baker wrote the Google
Synonyms Blog Post for Google’s Synonym
Update before the RankBrain Announcement.
• Helping Computers Understand Language:
https://googleblog.blogspot.com/2010/01/helping-computers-understand-language.html
• Paul Haahr gave the How Google
Works Presentation at SMX West. It includes
lots of useful insights.
Inventors: Paul Haahr and Steven D. Baker
Assignee: Google Inc.
The United States Patent 9,552,388
Granted: January 24, 2017

Context-Vectors
• Midpage Query Refinements and Query-
Document Logical Pairs with Centroids and
Clusters are the beginning of RankBrain.
• Context-Vectors were the second step for
completing the journey.
• Word Vectors and Context Vectors are
different from each other.
• Word Vectors are combinations of
words.
• Context Vectors are lists of word
combinations for a Contextual Domain.
• A Term Vector is a word combination from a
Contextual Domain.
Inventors: David C. Taylor
Application Date: 09/04/2012
Grant Number: 09449105
Grant Date: 09/20/2016

Context-Vectors
• Context-Vectors are close to the ‘Lexicon’
of Google's first research paper,
The Anatomy of a Large-Scale Hypertextual
Web Search Engine.
• Context-Vectors are a version of the Lexicon
with different Contextual Domains.
• Context-Vectors are located in Domain List
Terms.
• A Domain List Terms can include 800,000
words and word combinations.
• A Domain List Terms can include a macro-
context, and a sub-context with sub-
portions.

Context-Vectors
• Context-vectors use ‘Topical Entries’.
• A Topical Entry can be used for a macro-
context.
• These topical entries can be used for
question generation.
• Generated questions can be used for
differentiating the sub-contexts
from each other.
• A Macro-context can have a Dominant
Knowledge Domain. A Context-Vector can
be used for intersectional areas.

Categorical Quality
• This is a ‘Re-ranking’ Algorithm Patent.
• There is a strong difference between
Re-ranking and Initial Ranking.
• Re-ranking Algorithms modify
the Query Results.
• The inventor is Trystan Upstill, author of the
Evidence-based Ranking Research.
• Categorical Quality doesn’t focus on
relevance or authoritativeness; it focuses
on Understanding the Category of the
Query.
Inventors: Trystan G. Upstill, Abhishek Das, Jeongwoo
Ko, Neesha Subramaniam, and Vishnu P. Natchu
US Patent Application: 20190155948
Published on: May 23, 2019

Categorical Quality
• This patent mentions ‘social media shares’
and community size.
• If the query satisfies the ‘categorical query’
conditions, the search results will be evaluated
for related and close queries too.
• If a resource also satisfies the related categorical
queries, a categorical quality score will be
assigned to the source.
• The Categorical Quality Methodology collects
Navigational Queries for different sources.
• If the source has more navigational queries, it
means that it is popular for the category.
• Categorical Quality mentions a «Topicality Score».

Categorical Quality
• If a source includes all query terms for a
topic, it will have a higher Categorical Quality
and Topicality Score.
• This method also mentions ‘Click
Selection.’
• To understand the Model’s Success, they
do not take every click or CTR into
account.
• They take CTR and Clicks into account if they
meet certain criteria such as time,
frequency, or personal interest.

Substitute Query
• A Substitute Query is a query that can replace
another query. These queries are used for
bolding some sections of the content.
• Substitute Queries make ‘context’ more
important, because synonyms can change
the context. For example, car and auto can be
the same thing for ‘repair’, but they are not
the same for ‘railroad’.
• There is a railroad car, but not a railroad auto.
• Thus, Substitute Queries are not synonyms.
They are words that can be replaced without
changing the context.
Invented by Daisuke Ikeda and Ke Yang
Assigned to Google
US Patent 8,504,562
Granted August 6, 2013
Filed: April 3, 2012

Substitute Query
• A Co-occurrence Matrix and Phrase-
based Indexing are used to support
the Substitute Queries.
• The method uses Space Vectors
to compare the word vectors to each
other.
• If the queries are similar to each
other with enough co-occurring
words, it means that they can
substitute for each other.
Invented by Daisuke Ikeda and Ke Yang
Assigned to Google
US Patent 8,504,562
Filed: April 3, 2012
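
The car / auto example can be sketched with a tiny co-occurrence matrix. This is a minimal illustration, not the patented method: the five-line corpus and the can_substitute rule (both terms must co-occur with the context word) are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mini-corpus of short phrases.
CORPUS = [
    "car repair shop",
    "auto repair shop",
    "railroad car museum",
    "auto repair manual",
    "car repair manual",
]


def cooccurrence(corpus):
    """Count how often each pair of words appears in the same phrase."""
    matrix = defaultdict(lambda: defaultdict(int))
    for line in corpus:
        for a, b in combinations(set(line.split()), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def can_substitute(term, candidate, context, matrix) -> bool:
    """A candidate may replace a term only if both co-occur with the context word."""
    return matrix[term][context] > 0 and matrix[candidate][context] > 0


matrix = cooccurrence(CORPUS)
print(can_substitute("car", "auto", "repair", matrix))
print(can_substitute("car", "auto", "railroad", matrix))
```

The same pair of words is substitutable in one context and not in the other, which is the distinction between a substitute term and a plain synonym.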

Synthetic Query
• A Synthetic Query is the search engine's
rewritten version of the user's query.
• A search engine can rewrite a query by
augmenting it to diversify the SERP
Features, which raises the possibility of a
satisfying search activity.
• Some score types that Synthetic Queries
include are ‘Edit Distance Score’, ‘Similarity
Score’, and ‘Transformation Cost Score’.
• Synthetic Queries can be collected from web
documents, Structured Data, and Similarity
Between Documents.
Inventors: Anand Shukla, Mark Pearson, Krishna
Bharat and Stefan Buettcher
Assignee: Google LLC
US Patent: 9,916,366
Granted: March 13, 2018
Filed: July 28, 2015
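
As an illustration of one of the score types named above, the following Python sketch computes a character-level Levenshtein edit distance between a user query and a candidate synthetic query. How the patent defines or weights its ‘Edit Distance Score’ is not stated in these slides, so treating it as plain Levenshtein distance is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


user_query = "cheap flights to paris"
candidate = "cheap flight to paris"
print(edit_distance(user_query, candidate))
```

A small distance suggests the candidate is a close variation of the user's query; a real system would presumably combine it with the similarity and transformation cost scores.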

Synthetic Query and
Query Templates
• Query Templates are intermediary forms between the
Seed Queries and Synthetic Queries.
• Synthetic Queries help a Search Engine to
create pre-defined and pre-ordered SERP Instances.
• Synthetic Queries can be generated from HTML Tags,
IDF Scores, and Close Phrases.
• If a Document has «Dorothy Parker Biography» as H1
and «Sylvia Plath» as H2, the Search Engine can use
«Sylvia Plath Biography» as a synthetic query.
• If the results are good enough for relevance and
quality, the Synthetic Query will become a Seed
Query.
Invented by Steven D. Baker, Michael Flaster,
Nitin Gupta, Paul Haahr, Srinivasan Venkatachary,
and Yonghui Wu
Assigned to Google
US Patent 8,346,792
Granted January 1, 2013
Filed: November 9, 2010
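
The Dorothy Parker / Sylvia Plath example can be sketched as a seed query turned into a template and then filled with another entity. The make_template and synthesize helpers and the {entity} placeholder are hypothetical names chosen for this illustration; the patent's actual template representation is not shown in the slides.

```python
def make_template(seed_query: str, entity: str) -> str:
    """Replace the entity in a seed query with a placeholder slot."""
    if entity not in seed_query:
        raise ValueError("entity does not occur in the seed query")
    return seed_query.replace(entity, "{entity}")


def synthesize(template: str, entities: list) -> list:
    """Fill the template slot with other entities found in the document."""
    return [template.format(entity=e) for e in entities]


# H1 of the document acts as the seed query; the H2 supplies a new entity.
template = make_template("Dorothy Parker Biography", "Dorothy Parker")
synthetic = synthesize(template, ["Sylvia Plath"])
print(template)
print(synthetic)
```

Whether a synthetic query such as this one is kept would then depend on the relevance and quality of the results it returns, as the last bullet above describes.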

Synthetic Query and
Query Templates
• Synthetic Queries can be generated from
the same author, journal, source, or
time period.
• Synthetic Queries and Open Information
Extraction are closely related to each
other.
• Before entering the world of entities,
understanding the world of phrases is
important.
• Open Information Extraction, Unknown
Phrases, and Entities are connected
to each other.
Invented by Steven D. Baker, Michael Flaster,
Nitin Gupta, Paul Haahr, Srinivasan Venkatachary,
and Yonghui Wu
Assigned to Google
US Patent 8,346,792
Filed: November 9, 2010

Open Information Extraction
• Google bought Wavii for $30,000,000 in
2013.
• Open Information Extraction is about ‘fact
extraction’ around nouns.
• It connects different nouns to each
other based on relations.
• A classifier assigns a confidence score to
a relation between two nouns.
• This is a text-to-data example.
• Wavii was originally a news aggregator
based on topics, not phrases.
Invented by Michael J. Cafarella, Michele Banko,
and Oren Etzioni
Assigned to: University of Washington through its
Center for Commercialization
United States Patent 7,877,343

Open Information Extraction
• The relational tuples include at least two
nouns connected to each other by at least
one relation phrase (a verb with a preposition
or adverb), such as ‘created by’,
‘author of’, ‘is from’, ‘located there’.
• ‘... Moreover, the number and complexity
of entity types on the Web means that
existing NER systems are inapplicable...’
• Open IE is for Unknown Entities, and for
recognizing Minor Entities that are not
registered in the Knowledge Base.
Invented by Michael J. Cafarella, Michele Banko,
and Oren Etzioni
Assigned to: University of Washington through its
Center for Commercialization
United States Patent 7,877,343
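
A relational tuple with a confidence score can be sketched as below. This is a toy illustration only: the relation phrases are matched by plain string search and the confidence values are hard-coded assumptions, whereas the patent describes a trained classifier that assigns the confidence.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    subject: str
    relation: str
    obj: str
    confidence: float


# Hypothetical relation phrases with made-up confidence values.
RELATION_PHRASES = {
    "was created by": 0.9,
    "is the author of": 0.8,
    "is from": 0.6,
}


def extract(sentence: str) -> list:
    """Split a sentence around a known relation phrase into a noun-relation-noun tuple."""
    text = sentence.rstrip(".")
    found = []
    for phrase, confidence in RELATION_PHRASES.items():
        left, separator, right = text.partition(f" {phrase} ")
        if separator:
            found.append(RelationTuple(left.strip(), phrase, right.strip(), confidence))
    return found


tuples = extract("The cartoon was created by Gary Larson.")
print(tuples)
```

Because nothing here depends on a predefined list of entity types, the same approach extends to unknown or minor entities, which is the motivation quoted on the slide.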

Answer-seeking Query
• Answer-seeking Queries have specific
elements within the questions and
answers.
• Google’s purpose is to extract
question and answer formats for answer-
seeking queries.
• Answer-seeking queries require concise
answers that leave no room for doubt.
• The Answer-seeking Query is an important
bridge to the Natural Language
Queries with an Intent.
Inventors: Yi Liu, Preyas Popat, Nitin Gupta, and Afroz
Mohiuddin
US Patent: 10,592,540
Filed: June 28, 2016

Answer-seeking Query
• Question Elements are Entity Instance,
Entity Class, Part of Speech Class, Root
Word, N-Gram, and Question Triggering
Words.
• Answer Elements are Measurement,
N-Gram, Verb, Preposition, Entity Instance,
N-Gram near Entity, Verb near Entity,
Preposition near Entity, Verb Class, and
Skip Grams.
• Answer-seeking Queries trigger the Answer
Scoring Engine.
Inventors: Yi Liu, Preyas Popat, Nitin Gupta, and Afroz
Mohiuddin
US Patent: 10,592,540
Filed: June 28, 2016

Natural Language Queries
• Natural Language Queries are queries
written in everyday language.
• They do not follow proper grammar rules
or form complete sentences.
• They do not explicitly state their intent.
• That’s why these queries are also called Intent
Queries, or Queries with a specific minor
intent.
• For such a query, a Search Engine should
return an answer without lots of details
or structure.
International Application No WO/2014/197227
Published:11.12.2014
International Filing Date: 23.05.2014
Applicant: Google
Inventors: Tomer Shmiel, Dvir Keysar, and Yonatan Erez

• Natural Language Queries are not Factual Queries; this is
the main difference from Answer-seeking Queries.
• Natural Language Queries are related to Intent
Template Generation.
• A Natural Language Query can have multiple intents with
non-factual information, such as ‘How do I make
hummus?’.
• There might be different methods to make hummus, and
there are different types of hummus. Also, the query
includes ‘I’, so no one can know how you make hummus.
• The answer-seeking version of this query is ‘How to make
hummus’.
• One of the important methodology points here is that
Google creates ‘heading-text’ pairs to understand the
topics of the sub-sections of the article.

• Variable and Non-Variable Portions are
important concepts for the intent templates.
• The non-variable portion of the intent for the
previous query is ‘hummus’.
• The variable portion can be a
‘place, method, tool, or style’. And ‘I’ can
stand for a child, a woman, a man, an adult,
or a blind person.
• For Natural Language Queries, the Intent
Templates can be applied to different
Query Patterns such as X Causes or X Reasons.
• If someone searches for only X, the intent
templates will be used to assign the natural
language results to the query.
Applicant: Google
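
Query patterns such as ‘X causes’ and ‘X reasons’ can be sketched as templates with one slot. This is a minimal illustration: the template names, the regular expressions, and the expand helper are assumptions made for the example and are not taken from the patent.

```python
import re

# Hypothetical intent templates; X is captured as the named group "x".
TEMPLATES = {
    "how_to_make": re.compile(r"^how (?:do i|to) make (?P<x>.+?)\??$"),
    "causes": re.compile(r"^(?P<x>.+) causes$"),
    "reasons": re.compile(r"^(?P<x>.+) reasons$"),
}


def match_intent(query: str):
    """Return (template name, X) for the first template the query fits, else None."""
    normalized = query.lower().strip()
    for name, pattern in TEMPLATES.items():
        found = pattern.match(normalized)
        if found:
            return name, found.group("x")
    return None


def expand(x: str) -> list:
    """For a bare X, list the templated queries whose results could be borrowed."""
    return [f"{x} causes", f"{x} reasons"]


print(match_intent("How do I make hummus?"))
print(match_intent("headache causes"))
print(expand("headache"))
```

The expand step mirrors the last bullet above: when only X is searched, results gathered for the templated variations can be assigned to the bare query.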

Query Rewriting for Same
Intent Across Languages
• Google has previously tried to unite different
search intents, the data for these intents, and
the phrases that represent them, to improve
the search results.
• This is called Query Expansion. Query
Expansion can compare the results for a query
in one language with the results for the same
query in a different language.
• If the click satisfaction possibility is higher
for another language, for the same intent, the
search engine can re-rank the results for the
first language.
Invented by Stefan Riezler, Alexander L. Vasserman
Assigned to Google
US Patent Application 20080319962
Published December 25, 2008

Seed-Queries
• Seed Queries can be synthetic queries or
user-generated queries. The main
requirement for a seed query is that it
should be satisfied by a set of
documents.
• If a query is logical, popular and satisfying
for the user, it will be marked as a seed
query whether it is synthetic or searcher-
generated.
• Seed Queries are used to determine the
representative queries for query
variations, and for query and intent templates.
Inventors Manaal Faruqui and Dipanjan Das
Applicants Google LLC
Publication Number 20200167379
Publication Date May 28, 2020

End of Phrase-based Indexing and Query
Processing Chaos
• Query Parsing
• Seed Query
• Substitute Query
• Natural Language Query
• Answer-seeking Query
• Factual Query
• Non-factual Query
• Non-variable Portion in Query
• Variable Portion in Query
• Discordant Query
• Query Re-writing
• Open Information Extraction
• Synthetic Query
• Categorical Query
• Contextual Vectors
• Term Vectors
• Intent Templates
• Question and Answer Elements
• Co-occurrence Matrix
• Query Expansion
• Query Term Weight
• Multi-stage Query Processing
• Query Breadth
• Query Template
• Relation Types and Noun Tuples
• Macro-context
• Topical Entry
• Mid-page Query Refinement
• Query Ambiguity
• Query Cluster – Document Cluster for Logical Pair
• Associator, Matcher, Scorer for Query, Document
Association
• ‘Edit Distance Score’, ‘Similarity Score’, ‘Transformation
Cost Score’
• Phrase-based Indexing
• Contextual Domains
• Contextual Domain Word List
• Query Analysis
• Representative Query
• Canonical Query
• Minor Intent
• Space Vectors
• Navigational Query as a
Popularity Signal
• Evidence Based Ranking
• Word Proximity
• Word Adjacency

First Semantic Web Announcement
• The Semantic Web Roadmap was published
in September 1998 by Tim Berners-Lee.
• Semantic HTML, the Semantic Web, and
Semantic User Patterns were the principles
of Semantic Search.
• The main purpose of the Semantic Web is
making the web understandable to machines
so that machines can help human beings
surf the web better.
• Tim Berners-Lee talked about Agents,
Ontology, Structured Data, RDFa, Semantic
HTML Tags, and Digital Signatures.
• ‘Such an agent coming to the clinic's Web
page will know not just that the page has
keywords such as "treatment, medicine,
physical, therapy" (as might be encoded
today) but also that Dr. Hartman works at
this clinic on Mondays, Wednesdays and
Fridays and that the script takes a date
range in yyyy-mm-dd format and returns
appointment times. And it will "know" all
this without needing artificial intelligence’
‘The Semantic Web is an extension of the current web in
which information is given well-defined meaning, better
enabling computers and people to work in cooperation.’
-Tim Berners-Lee

First Semantic Search Patent
• Google’s first Semantic Search Engine patent
is from 1999, one year after Tim
Berners-Lee’s announcement.
• The inventor is Sergey Brin himself.
• The document doesn’t use legal language, like
other early Google patents.
• The document states that all things of a similar
type have the same features.
• Things on the web can be collected for a
certain type of information and stored with
this information.
Invented by Sergey Brin
Assigned to Google
US Patent 6,678,681

First Semantic Search Patent
• Sergey Brin encountered some problems
such as Named Entity Recognition, Main
Entity detection, and Entity Relation Detection.
• These problems were not named in terms of
Entities, but the books were entities with
string representations.
• Even a single letter difference resulted in big
problems for Sergey Brin.
• And some books didn’t have a price or a proper
title, and some of them were not even real
books.
• In the first attempt, the cost was high, the
process was slow, and the results were
incomplete, but Google kept going.
Invented by Sergey Brin
Assigned to Google
US Patent 6,678,681

Knowledge Graph Launch
• ‘Things, not strings.’ is the motto of the
Knowledge Graph. Everything on the web is
divided into different entities, entity types,
and entity connections.
• Named Entity Recognition and Natural
Language Processing increased their value and
prominence within the algorithmic hierarchy
of Google.
• The Knowledge Graph supported the Knowledge
Panels.
• Fact Extraction, Question Answering,
Accuracy Audit, and Entity Relations are the
pillars of the Entity-oriented Search Engine.
• ‘Wouldn’t it be great understanding every
word of user, instead of matching words?’, by
Jack Menzel.
Inventors: John R. Provine
US Patent: 10,922,326
Granted: February 16, 2021

Browsable Fact Repository
• The Browsable Fact Repository is the early,
primitive version of the Google Knowledge
Graph.
• There are three important problems for the
Browsable Fact Repository.
1. Updating the Knowledge Graph.
2. Extracting the New Entities.
3. Auditing the Fact Accuracy.
Invented by Andrew W. Hogue and Jonathan T. Betz
Assigned to Google Inc.
US Patent 7,774,328
Filed: February 17, 2006

Entity-seeking Query
• Today’s last Query type.
• Entity-seeking Queries are one of the
basic pillars of Entity-oriented search.
• It identifies whether the Query seeks a single
entity or multiple things of the same type.
• If it is singular, the entity-seeking query will
match the term and the entity based on
an attribute.
• Entity-seeking Queries include a Semantic
Dependency Tree and a Relevance Threshold.
Inventors: Mugurel Ionut Andreica, Tatsiana Sakhar,
Behshad Behzadi, Marcin M. Nowak-Przygodzki, and
Adrian-Marius Dumitran
Published: December 5, 2019
Filed: May 29, 2018

Entity-seeking Query

Structured Search Engine
• Sergey Brin said ‘Structured Form’ in 1999.
• In 2011, Andrew Hogue said ‘Structured
Search Engine’.
• Andrew Hogue introduced Open-Domain
Fact Extraction methodologies for
extracting and clustering entities from the web.
• Andrew Hogue showed future Google
Engineers some concrete examples of
the direction they wanted to head in.
Cartoon created by Gary Larson.

Semantic Search Engine
• Google can extract all attributes of an entity
to understand its general features.
• According to the Source Attribute, these
features can be changed, detected or
altered.
• Based on the entity types and candidate
entities, Google can generate more entity
types and connections between them.
• Another name for the Structured Search
Engine is the Semantic Search Engine.
• Semi-structured Text Understanding,
Question Generation from Keywords, and
Question-Answer Pairing are the main
objectives of the Semantic Search Engine.

This is a Query Parsing Example from a Google Engineer
for Entity-oriented Search.
Source: The Structured Search Engine by Andrew Hogue

The first step is the Named Entity Recognition process for the
query.
• Entity-seeking Queries are the backbone
of entity-oriented search.
• Recognizing an entity from a Query is not
easy or cheap.
• Neural Matching, RankBrain, the Sub-topic
Update, BERT, MUM, and LaMDA are all
used for recognizing the entity
and its related attributes.

The second step is Entity Resolution.
• Entity Resolution and Attribute
Extraction are for understanding the
related attribute of the entity.
• Entity-seeking Queries usually try to find
an Entity’s Attribute such as look, height,
taste, inception or history.
• After the entity and its attribute are taken
from the query, the Question Format is
determined at the next step.

The third step is Synonym Extraction.
• Synonym Extraction strengthens the
confidence score.
• Another function of Synonym Extraction
is that it helps to use alternate
documents for the same question.
• According to the Synonyms, the question
format can change.

The question format is necessary to understand
the query by increasing the confidence
score and matching similar successful
documents.
• The question format is important for
determining the answer format.
• Question term order and answer term
order can increase the success rate.
• The last important thing here is the
‘answer data type’, which is a date.

The fourth step is Entity Reconciliation and the data accuracy audit.
At the next step, Google can check the related search
activity and possible search activity, and choose the best
answer.
• The answer formats and answer phrases will be used
for entity reconciliation.
• Entity reconciliation includes the standardization of the
entity with the correct information.
• 5 Rand Fishkin Entity Records exist in the Knowledge
Graph for the same Rand Fishkin.

Entity Reconciliation
Inventors: Oksana Yakhnenko and Norases
Vesdapunt
Assignee: GOOGLE LLC
US Patent: 10,331,706
Granted: June 25, 2019
Filed: October 4, 2017
Entity Reconciliation is another patent from Google.
• It includes checking multiple sources to complete the missing
information on the Knowledge Graph.
• It also uses a similarity threshold between different sources and the
knowledge graph.
• If the source is authoritative, it will be easier to modify the
Knowledge Graph.
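
The threshold idea can be sketched as follows. This is a minimal illustration under stated assumptions: records are plain dictionaries, similarity is the share of common fields whose values agree, and the 0.6 threshold is arbitrary. None of these details come from the patent itself.

```python
def similarity(a: dict, b: dict) -> float:
    """Share of fields present in both records whose values agree."""
    shared = [key for key in a if key in b]
    if not shared:
        return 0.0
    agreeing = sum(1 for key in shared if a[key] == b[key])
    return agreeing / len(shared)


def reconcile(kg_record: dict, source_record: dict, threshold: float = 0.6) -> dict:
    """Fill missing Knowledge Graph fields from a source that is similar enough."""
    merged = dict(kg_record)
    if similarity(kg_record, source_record) >= threshold:
        for key, value in source_record.items():
            merged.setdefault(key, value)  # never overwrite existing facts
    return merged


kg = {"name": "Rand Fishkin", "occupation": "entrepreneur"}
same_person = {"name": "Rand Fishkin", "occupation": "entrepreneur", "company": "SparkToro"}
other_person = {"name": "Rand Fishkin", "occupation": "chef", "city": "Springfield"}

print(reconcile(kg, same_person))
print(reconcile(kg, other_person))
```

An authority signal could be modeled by lowering the threshold for trusted sources, which would match the last bullet above; that extension is left out of the sketch.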

“For other people it can be a little more complicated. Like me, for
example, John Mueller. If you search for me you’ll find Wikipedia pages,
barbecue restaurants, bands, all kinds of people who are called John
Mueller.
And if, on my site, I don’t specify who I actually am, then it could
happen that our systems look at my page and go: “oh this is that guy
that runs that barbecue restaurant.” And suddenly I’m associated with
a barbecue restaurant, which might be a move up, I don’t know.
But these subtle things make it easier for us to recognize who is
actually behind something. We call that reconciliation when it comes to
structured data, kind of recognizing which of these entities belong
together.”
John Mueller


Semantic Role Labeling
Named Entity Resolution
Named Entity Extraction
Relation Detection
Lexical Semantics
Taxonomy
Ontology
Onomastics
Important Terms and Concepts for NER and Semantic Search Engine

Entity Extraction
• Entity extraction is a complementary step for
Named Entity Recognition.
• A Recognized Entity can be extracted from the
text to be stored in a Knowledge Base.
• Entity Extraction uses attributes to connect
the entity with its meaning, prominence and
attributes.
• Take the sentence ‘The 46th President of the United
States (US) had decided to go to Paris on
Monday, 2nd June, 2002.’
• ‘46th President of the United States’ is the
named entity.
• The decision of the president is the attribute,
together with the date, which is included
in entity extraction.

Entity Resolution
• Entity Resolution has two phases.
• The first phase is finding the mentioned entity’s
correct identity.
• The second phase is finding the correct profile of
the mentioned entity.
• For instance, Bill Clinton was a U.S. President,
but also an Actor in Hollywood. An American
Football Player can also be a cook or a
journalist.
• To find the right entity from the entity
reference, the Search Engine can use related
entities and their types.
• Entity Resolution helps to feed the text-
to-data systems of Search Engines.
• If you write ‘Barry Schwartz entered the
classroom and asked the students
questions’, Entity Resolution will decide
that it is the Professor Barry, not our Barry.
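
Picking the right profile from related entities can be sketched as a context-overlap count. This is a toy illustration: the two candidate profiles and their context word sets are hypothetical, and a real system would use related entities and their types rather than bare keywords.

```python
import re

# Hypothetical candidate profiles with context words that usually surround them.
CANDIDATES = {
    "Barry Schwartz (professor)": {"classroom", "students", "psychology", "lecture"},
    "Barry Schwartz (search journalist)": {"seo", "google", "search", "news"},
}


def resolve(sentence: str, candidates: dict) -> str:
    """Choose the candidate whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(candidates, key=lambda name: len(candidates[name] & words))


sentence = "Barry Schwartz entered the classroom and asked the students questions"
print(resolve(sentence, CANDIDATES))
```

With no overlapping context at all the function simply returns the first candidate, so a real implementation would also need a minimum-confidence rule.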

Relation Detection
• Relation Detection is the process of
understanding the relation type and labels
between different entities within a text.
• There are different types of relations, such as
‘isSimilarOf’, ‘locatedIn’, ‘superiorOf’,
‘closeTo’, ‘sameAs’.
• Some of these relation types are familiar
from the Structured Data.
• Some of the relation types are unique for
specific entities and specific topics.
• Relation Detection draws on
Lexical Semantics.
• Relation detection can be used for Visual-to-
text algorithms too.
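
The output of relation detection is commonly stored as subject-predicate-object triples. The sketch below uses two of the relation types named above; the sample triples and the located_in_chain helper are assumptions made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Sample triples using relation types from the list above.
TRIPLES = [
    Triple("Eiffel Tower", "locatedIn", "Paris"),
    Triple("Paris", "locatedIn", "France"),
    Triple("NYC", "sameAs", "New York City"),
]


def located_in_chain(entity: str, triples: list) -> list:
    """Follow locatedIn relations upward, e.g. landmark -> city -> country."""
    chain = []
    seen = {entity}
    current = entity
    while True:
        parent = next(
            (t.obj for t in triples
             if t.subject == current and t.predicate == "locatedIn"),
            None,
        )
        if parent is None or parent in seen:  # stop at the top or on a cycle
            break
        chain.append(parent)
        seen.add(parent)
        current = parent
    return chain


print(located_in_chain("Eiffel Tower", TRIPLES))
```

Chaining a relation like this is one way detected relations can answer questions that no single sentence states directly.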

@KorayGubur
@KorayGubur
Lexical Semantics
• Every human being needs Lexical Semantics to
think and speak in a healthy way.
• Lexical Semantics covers the meaning
connections between different words.
• Lexical Semantics is used to understand the
relational connections between named
entities.
• For instance, ‘boy’ carries the meanings ‘single’,
‘teenage’, ‘male’, and ‘young’ by default. But
some of these meanings are highly probable and
some are less so.
• For instance, someone young, male, and teenage
can also be married.
• Lexical Semantics is used to understand a
named entity’s resolution and its connections
with other things.
Lexeme: a unit of meaning that cannot be analyzed further by itself.
Lexicon: the list of lexemes.
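The ‘boy’ example can be sketched as a tiny lexicon in which every default meaning carries a likelihood. The weights below are invented for illustration and are not measured values; `implied_features` is a hypothetical helper name.

```python
# Hypothetical default-meaning weights for the lexeme "boy".
# The numbers are illustrative assumptions, not measured probabilities.
LEXICON = {
    "boy": {"male": 0.99, "young": 0.95, "single": 0.9, "teenage": 0.5},
}

def implied_features(lexeme, threshold=0.8):
    """Return the meanings a lexeme is assumed to carry by default."""
    features = LEXICON.get(lexeme, {})
    return sorted(name for name, weight in features.items() if weight >= threshold)

print(implied_features("boy"))
print(implied_features("boy", threshold=0.4))
```

Lowering the threshold admits the less likely meanings, which mirrors the point that some default meanings are strong and others are weak.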

@KorayGubur
Semantic Role Labeling
• Semantic Role Labeling is the process of
understanding the parts of a sentence by
assigning related labels.
• Semantic Role Labeling draws on
Lexical Semantics and Part-of-Speech Tagging.
• Semantic Role Labeling helps Relation
Detection.
• There are more than 32 Semantic Roles.
• For Semantic Role Labeling, the most
important part is finding the theme,
predicate, agent, and effect.
• Semantic Role Labeling is useful for auditing
the content’s accuracy and for extracting facts
from the propositions.
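A toy sketch of role assignment for simple active-voice sentences, assuming a tiny hypothetical predicate list. A real labeler would rely on a parser and part-of-speech tags; here everything before the predicate is treated as the agent and everything after it as the theme.

```python
# Tiny hypothetical predicate list (an assumption for this example).
PREDICATES = {"opened", "visited", "wrote"}

def label_roles(sentence):
    """Label agent, predicate, and theme in a simple active-voice sentence."""
    tokens = sentence.rstrip(".").split()
    for i, token in enumerate(tokens):
        if token.lower() in PREDICATES:
            return {
                "agent": " ".join(tokens[:i]),
                "predicate": token,
                "theme": " ".join(tokens[i + 1:]),
            }
    return None

print(label_roles("The president visited Paris."))
```

The word-order rule breaks on passive sentences, which is exactly where lexical semantics and part-of-speech information become necessary.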

@KorayGubur
Taxonomy
• Taxonomy, from the Greek taxis (arrangement) and nomos (law), means the
arrangement of things.
• It was first used for animal classification, in Ancient
Greece.
• In the modern era, it is used to classify all living things
in biology, and it has since been used to classify
chemicals and other kinds of existing things.
• In the field of Search Engine Optimization, Semantic
Entity Types and the Semantic Dependency Tree are
important.
• Creating a hierarchy between entities based on their
type and size, prominence, or superiority and
inferiority is important for increasing the contextual
relevance and specifying the relevance of the article.
• Every entity type has a different attribute group, and
the hierarchy can be refreshed.
• If the context is the size of cities, ‘berlin’, ‘paris’, ‘istanbul’
can have a different taxonomy, in terms of big, small,
and medium cities.
• If the context is the countries of these cities, the taxonomy
can be aligned with country, region, and
continent names.
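The last two bullets can be sketched as one grouping function that takes the context as a parameter. The size labels are assumptions made for this example, and `build_taxonomy` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative attributes; the size labels are assumptions for this example.
CITIES = {
    "berlin": {"country": "Germany", "size": "medium"},
    "paris": {"country": "France", "size": "medium"},
    "istanbul": {"country": "Turkey", "size": "big"},
}

def build_taxonomy(entities, context):
    """Group entity names under the value of the attribute chosen as context."""
    groups = defaultdict(list)
    for name, attributes in entities.items():
        groups[attributes[context]].append(name)
    return {label: sorted(names) for label, names in groups.items()}

print(build_taxonomy(CITIES, "size"))
print(build_taxonomy(CITIES, "country"))
```

The same three entities end up in different hierarchies depending on which attribute is chosen as context.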

@KorayGubur
Ontology
• Ontology completes the taxonomy.
• From ontos and logos: the essence of things.
• It is a branch of philosophy.
• Ontology is a reflex for all human beings.
• An ontology can be created based on the mutual
points of different entities.
• According to the mutual attribute between
entities, the taxonomy can change, and the
ontology can follow it.
• If three named entities are from the same region,
the region name is the mutual attribute, and they
can have other types of connections based
on this.

@KorayGubur
Onomastics
• Onomastics is the science of naming and of
analyzing name patterns in different
languages.
• Every entity type has a different naming pattern.
• Name patterns are used to recognize entities,
entity types, and attributes of entities.
• The word comes from the Greek onoma (name) and
refers to the names of things.
• Different science names, city names, event
names, situation names, or institution names can
have naming patterns.
• Some examples of onomastics sub-types:
1. helonyms: proper names of swamps, marshes and bogs.
2. limnonyms: proper names of lakes and ponds.
3. oceanonyms: proper names of oceans.
4. pelagonyms: proper names of seas and maritime bays.
5. potamonyms: proper names of rivers and streams.
• Onomastics can be used for taxonomy and
ontology creation too. Even a body of water can have
multiple naming patterns based on its sub-type.
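A short sketch of how a naming pattern can reveal an entity sub-type, assuming English names that contain a generic word such as "Lake" or "River". The keyword patterns and the `hydronym_type` helper are assumptions for illustration; the five labels come from the list above.

```python
import re

# Hypothetical English naming patterns for the sub-types listed on the slide.
HYDRONYM_PATTERNS = [
    ("helonym", re.compile(r"\b(swamp|marsh|bog)\b", re.I)),
    ("limnonym", re.compile(r"\b(lake|pond)\b", re.I)),
    ("oceanonym", re.compile(r"\bocean\b", re.I)),
    ("pelagonym", re.compile(r"\b(sea|bay)\b", re.I)),
    ("potamonym", re.compile(r"\b(river|stream)\b", re.I)),
]

def hydronym_type(name):
    """Return the onomastic sub-type suggested by the name, or None."""
    for label, pattern in HYDRONYM_PATTERNS:
        if pattern.search(name):
            return label
    return None

for name in ["Lake Van", "Black Sea", "Danube River", "Pacific Ocean"]:
    print(name, "->", hydronym_type(name))
```

Other languages place the generic word differently or fold it into the name, so each language would need its own pattern set.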

@KorayGubur
BERT - SMITH
MuM
LaMDA
Conversational Search
Important Announcements for Structured Search Engine

@KorayGubur
BERT - SMITH
Uses a Masked Language Model.
It masks 15% of the tokens and trains the model to predict them.
Uses Bidirectional Language Understanding.
It reads the whole sentence at once, from both directions.
It is also trained to predict the next sentence.
SMITH handles inputs longer than BERT's 512-token limit.
Uses a fine-tuning-based representation model.
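A simplified sketch of the 15% masking step, assuming whitespace tokens and a fixed random seed. In BERT's actual pre-training only part of the selected tokens is replaced with the mask symbol (the rest are swapped for random tokens or left unchanged); this example masks every selected token to keep the idea visible. `mask_tokens` is a hypothetical helper name.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace about mask_rate of the tokens with [MASK] and keep the answers."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = ["[MASK]" if i in positions else tok for i, tok in enumerate(tokens)]
    # The labels are what the model is trained to predict at each masked position.
    labels = {i: tokens[i] for i in sorted(positions)}
    return masked, labels

tokens = "semantic search engines parse queries into phrases and entities".split()
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```

Because the model sees the unmasked words on both sides of every gap, it learns from left and right context at the same time.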

@KorayGubur
MuM
The related research papers date from March 2021.
In May 2021, they announced MuM.
In June 2021, they announced that they had started to use MuM.
The whole system is about understanding ‘Related Search Activity’ in order to predict future queries.

@KorayGubur
MuM
If you search for trekking to a mountain, there are three possible different contexts:
Trekking
Mountain
And, Specific Mountain Trekking

@KorayGubur
LaMDA
LaMDA is for connecting one question to another in a way that is sensible to humans.
Specificity
Factuality
Interestingness
Sensibleness
LaMDA is a part of Conversational AI.

@KorayGubur
Conversational Search
Conversational Search is close to Conversational AI.
It connects different entities, concepts, and intents to each
other.
It creates new Contextual Domains and Co-occurrence
Matrices.
The Conversational Search announcement covers only
past queries.
MuM and LaMDA cover future queries.
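A minimal sketch of a co-occurrence matrix built from past queries, assuming whitespace tokenization and a hypothetical three-query session. The matrix is stored sparsely as counts per term pair.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(queries):
    """Count how often two terms appear together in the same query."""
    counts = Counter()
    for query in queries:
        terms = sorted(set(query.lower().split()))
        # Sorting keeps each pair in one canonical order, e.g. (boots, trekking).
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical search session used only for illustration.
session = ["mountain trekking", "trekking boots", "mountain trekking boots"]
counts = cooccurrence(session)
print(counts[("mountain", "trekking")])
```

Pairs with high counts hint at terms that belong to the same contextual domain.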

@KorayGubur
Important Language Models for the Near Future in the Context of the Semantic Search Engine
ReALM
KELM

@KorayGubur
ReALM
Retrieval-Augmented Language Model
Based on the Entity Dependency Tree, missing attributes and facts can be extracted.
Source: https://ai.googleblog.com/2020/08/realm-integrating-retrieval-into.html

@KorayGubur
ReALM
Inventors: Kenton Chiu Tsun Lee,
Kelvin Gu, Zora Tung, Panupong
Pasupat, and Ming-Wei Chang
US Patent: 11,003,865
Granted: May 11, 2021
Filed: May 20, 2020
First comes a Research Paper,
then a Patent,
and lastly an Update with an Official
or Unofficial Statement.

@KorayGubur
KELM
Knowledge Graph Integrated Language Model for Fact and
Accuracy Checking.
Source: https://ai.googleblog.com/2021/05/kelm-integrating-knowledge-graphs-with.html
Data-to-text triple example
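A template-based sketch of turning a knowledge graph triple into a sentence. This is a stand-in for illustration, not the generation approach described in the source; the templates, relation labels, and the `triple_to_text` helper are all assumptions.

```python
# Hypothetical sentence templates keyed by relation label.
TEMPLATES = {
    "locatedIn": "{subject} is located in {object}.",
    "capitalOf": "{subject} is the capital of {object}.",
}

def triple_to_text(subject, relation, obj):
    """Verbalize a (subject, relation, object) triple as a plain sentence."""
    # Unknown relations fall back to a literal rendering of the triple.
    template = TEMPLATES.get(relation, "{subject} {relation} {object}.")
    return template.format(subject=subject, relation=relation, object=obj)

print(triple_to_text("Paris", "capitalOf", "France"))
```

Sentences produced this way can be compared with statements in a document, which is the basis for the fact and accuracy checking mentioned above.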

@KorayGubur
Encazip.com.
Holistic SEO Case Study based on Semantic SEO.
Used Entity-oriented Search.
From 150 daily clicks to 6,000 daily clicks.

@KorayGubur
An Education Brand
11,000 queries and 30,000 monthly clicks within 25 days

@KorayGubur
An unpublished case study.
422,000 queries, 220,000 clicks in 66 days.
It is also a Technical SEO Case Study.
Indexed 73,000 pages in 66 days.

@KorayGubur
15,000 new queries.
35,000 monthly traffic.
In 3 months.
Used Semantic SEO.

@KorayGubur
‘Without understanding Query Processing through the eyes of
the Search Engine, you can’t create a relevant and satisfying
document based on minor and dominant search activity
types.’
Thank You
