
Semantic Search,
Machine Learning & AI in
the latest Google
algorithms
More than 80 international certificates
Over 40 lectures in Bulgaria
More than 10 lectures abroad
Over 15 years of professional experience
More than 20 interviews for Bulgarian and international media
The only Bulgarian SEO agency with a presentation at a Google event
The only Bulgarian SEO agency officially accredited by Stone Temple
Nominees at the Europe Search Awards 2019
ABOUT SERPACT
WHAT IS SEMANTIC SEARCH?
WHAT IS MACHINE LEARNING
AND WHY DO SEARCH ENGINES
USE IT?
FINDING PATTERNS IN URLS AND
PAGE CONTENT
Multiple outbound links to
unrelated pages
Excessive use of the same keywords
Excessive use of synonyms
Overoptimization of anchor texts
Other similar variables.
ANALYSIS AND CLASSIFICATION OF
SEARCH PHRASES
One of the best applications of machine learning
algorithms is the classification of search phrases
and, accordingly, of indexed documents, based on
the user's search intent.
As we know, queries generally fall into three types:
informational, navigational, and transactional.
By analyzing click patterns and the type of
content that users engage with (e.g., CTR by
content type), a search engine can use machine
learning to determine user intent.
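The deck does not show how such a classifier is built, so here is a minimal sketch (not Google's actual system) of grouping queries by intent with scikit-learn; the tiny labeled query set and the three intent labels are invented for illustration.

```python
# Minimal intent-classification sketch (illustrative only, not Google's system).
# The handful of labeled queries below is invented toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = [
    "what is semantic search",      # informational
    "how does rankbrain work",      # informational
    "facebook login",               # navigational
    "youtube homepage",             # navigational
    "buy running shoes online",     # transactional
    "cheap flights to sofia",       # transactional
]
intents = ["informational", "informational",
           "navigational", "navigational",
           "transactional", "transactional"]

# Word/bigram TF-IDF features feeding a simple linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(queries, intents)

# Likely "transactional" given the toy data ("buy", "cheap" overlap with training queries).
print(model.predict(["buy cheap shoes"]))
```

In a real search engine, click and engagement signals (such as the CTR-by-content-type patterns mentioned above) would stand in for the hand-written labels used here.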
Identification of synonyms
When you see search results
that don’t include the keyword
in a snippet, it’s likely that
Google uses machine learning to
identify synonyms.
Better identification of word connections
Google's goal with synonyms is to understand the context and meaning of a page.
Writing content with a clear and consistent meaning is much more important than
stuffing a page with keywords and synonyms.
Identifying the similarities between the words in a
search query
CUSTOM SIGNALS BASED ON
SPECIFIC QUERIES
Machine learning algorithms can put more weight on some variables in certain queries than
in others. The search engine "learns" the preferences of a particular user and can draw on
past queries to present the most relevant information.
Overall, according to consumer research, results personalized through machine
learning have increased the clickthrough rate (CTR) by about 10%.
IDENTIFYING NEW SIGNALS
According to a 2016 podcast by Gary Illyes of Google, RankBrain not only helps
identify patterns in queries but also helps the search engine identify possible new
ranking signals. These signals are sought so that Google can continue to
improve the quality of search results.
WHAT IS NATURAL LANGUAGE
PROCESSING -
OR HOW DO SEARCH ENGINES UNDERSTAND
OUR CONTENT?
Tasks of NLP
Tokenization, Lemmatization, Stemming
Sentence boundary detection
Part-of-speech tagging
Syntax parsing - Dependency parsing & Constituency parsing
Semantic role labeling
Semantic dependency parsing
Word sense disambiguation/induction
Named-entity recognition/classification
Entity linking
Temporal expression recognition/normalization
Co-reference resolution
Information extraction
Terminology extraction
Topic modeling
Attributional similarity (word similarity)
Relational similarity
Phrase similarity
Sentence similarity
Paraphrase identification
Textual entailment
Natural language generation
Speech recognition
Speech synthesis
Ontology population
Question answering
Machine translation
Text coherence
Fake news detection
Serpact Ltd. | AffiliateCon Sofia 2019
How Do Search Engines Like
Google Process Content
Today?
Text Pre-Processing
Noise Removal
Lexicon Normalization
Stemming: Stemming is a rudimentary rule-based
process of stripping suffixes ("ing", "ly", "es", "s",
etc.) from a word.
Lemmatization: Lemmatization, on the other hand, is
an organized, step-by-step procedure for obtaining
the root form of a word; it makes use of
vocabulary and morphological analysis (word
structure and grammar relations).
Object Standardization - acronyms, hashtags with
attached words, and colloquial slang
Normalization and Lemmatization: POS tags are the
basis of the lemmatization process for converting a word
to its base form (lemma).
Efficient stopword removal: POS tags are also useful
in efficient removal of stopwords, as shown in the sketch below.
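As a rough illustration of the steps above, here is a small sketch using NLTK (one possible toolkit; the example sentence is invented):

```python
# Pre-processing sketch with NLTK: stemming, lemmatization, stopword removal.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

nltk.download("wordnet", quiet=True)     # lemmatizer vocabulary
nltk.download("stopwords", quiet=True)   # common function words

text = "the runners were running quickly through the crowded cities"
tokens = text.lower().split()

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

# Stemming: crude suffix stripping ("running" -> "run", "cities" -> "citi").
print([stemmer.stem(t) for t in tokens])

# Lemmatization: vocabulary-based root forms ("cities" -> "city").
print([lemmatizer.lemmatize(t, pos="n") for t in tokens])

# Stopword removal: drop high-frequency words like "the" and "were".
print([t for t in tokens if t not in stop_words])
```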
Syntactic Parsing
Dependency Trees – Sentences are composed of words sewn together. The relationships among the
words in a sentence are determined by dependency grammar. Dependency grammar is a class of
syntactic text analysis that deals with (labeled) asymmetrical binary relations between two lexical items
(words).
Part-of-speech tagging – Apart from grammar relations, every word in a sentence is also associated with
a part-of-speech (POS) tag (noun, verb, adjective, adverb, etc.). POS tags define the usage and function
of a word in the sentence.
Word sense disambiguation: Some words have multiple meanings depending on their usage. For
example, in the two sentences below:
I. "Please book my flight for Delhi"
II. "I am going to read this book in the flight"
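A minimal sketch of POS tagging and dependency parsing with spaCy (assuming the en_core_web_sm model has been downloaded); it also shows how the POS tag separates the two senses of "book" from the example above:

```python
# Dependency parsing and POS tagging sketch with spaCy
# (assumes: pip install spacy && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("Please book my flight for Delhi")
for token in doc:
    # token.pos_ = coarse part of speech, token.dep_ = dependency label,
    # token.head = the word this token attaches to in the dependency tree.
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} head={token.head.text}")

# The same surface word gets a different tag in a different context:
doc2 = nlp("I am going to read this book in the flight")
print([(t.text, t.pos_) for t in doc2 if t.text == "book"])  # 'book' as a NOUN here
```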
Entity Extraction
Named Entity Recognition (NER)
Noun phrase identification
Phrase classification
Entity disambiguation: Sometimes entities are misclassified, so creating a validation layer on
top of the results is useful. Knowledge graphs can be exploited for this purpose. Popular knowledge
graphs include the Google Knowledge Graph, IBM Watson, and Wikipedia.
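For example, a few lines of spaCy are enough to pull named entities out of a sentence; the entity labels come from the pretrained en_core_web_sm model (an assumption of this sketch), not from any search engine:

```python
# Named-entity recognition sketch with spaCy's pretrained small English model.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Serpact presented at AffiliateCon in Sofia, Bulgaria in 2019.")

for ent in doc.ents:
    # e.g. "Sofia" -> GPE (geo-political entity), "2019" -> DATE
    print(ent.text, ent.label_)
```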
Topic Modelling - Latent Dirichlet
Allocation (LDA)
Topic modeling is the process of automatically
identifying the topics present in a text corpus; it
derives the hidden patterns among the words in the
corpus in an unsupervised manner.
Topics are defined as "a repeating pattern of co-
occurring terms in a corpus". A good topic model
yields "health", "doctor", "patient", "hospital" for a
topic like Healthcare, and "farm", "crops", "wheat" for
a topic like Farming.
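A toy LDA run with scikit-learn shows the mechanics; the six one-line "documents" are invented and far too small for a real topic model:

```python
# Toy LDA sketch (scikit-learn >= 1.0 for get_feature_names_out).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "doctor treats patient in hospital",
    "hospital hires new doctor for patient care",
    "patient visits doctor about health",
    "farm grows wheat and other crops",
    "farmer harvests crops on the farm",
    "wheat prices rise for the farm",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"Topic {idx}: {top}")   # roughly a 'healthcare' topic and a 'farming' topic
```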
Bag of Words
A commonly used model that counts all the words in a piece of text.
Basically, it creates an occurrence matrix for the sentence or document,
disregarding grammar and word order. These word frequencies or occurrences
are then used as features for training a classifier.
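A minimal sketch of that occurrence matrix with scikit-learn's CountVectorizer (the two example sentences are invented):

```python
# Bag-of-words occurrence matrix for two short documents.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "semantic search understands meaning",
    "keyword search matches the exact keyword",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # the vocabulary; word order is ignored
print(X.toarray())                         # word counts per document
```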
N-Grams as Features
A combination of N consecutive words is called an N-gram. N-grams (N > 1) are
generally more informative as features than single words (unigrams). Bigrams
(N = 2) are often considered the most important features of all.
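The same CountVectorizer can emit unigram and bigram features by setting ngram_range; the example text is invented:

```python
# Unigram + bigram features via the ngram_range parameter.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["machine learning helps search engines understand queries"]

bigram_vectorizer = CountVectorizer(ngram_range=(1, 2))
bigram_vectorizer.fit(corpus)

print(bigram_vectorizer.get_feature_names_out())
# ['engines', 'engines understand', 'helps', 'helps search', ...]
```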
Statistical Features
Term Frequency – Inverse Document Frequency (TF – IDF)
Term Frequency (TF) – TF for a term "t" is defined as the count of the term "t" in a document "D".
Inverse Document Frequency (IDF) – IDF for a term is defined as the logarithm of the ratio of the total
number of documents in the corpus to the number of documents containing the term "t".
Count / Density / Readability Features - Count- or density-based features can also be used in
models and analysis. These features might seem trivial but can have a great impact on learning models.
Some of them are: word count, sentence count, punctuation counts, and industry-specific
word counts.
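A short TF-IDF sketch with scikit-learn; note that TfidfVectorizer uses a smoothed variant of the log-ratio formula described above, and the example texts are invented:

```python
# TF-IDF weighting: rare terms get a higher IDF than terms spread across the corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "striped bass is a saltwater game fish",
    "anglers fish for striped bass on the coast",
    "the cow grazed in the field",
]

vectorizer = TfidfVectorizer()
vectorizer.fit(corpus)

# Terms in only one document ("cow", "field") receive a higher IDF weight
# than terms appearing in several documents ("the", "fish").
for term, idf in zip(vectorizer.get_feature_names_out(), vectorizer.idf_):
    print(f"{term:10} idf={idf:.2f}")
```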
Word Embedding (text vectors)
Word embedding is the modern way of representing words as vectors. The aim of word
embedding is to map high-dimensional word features into low-dimensional feature
vectors while preserving contextual similarity across the corpus.
Word embeddings are widely used in deep learning models such as Convolutional Neural
Networks and Recurrent Neural Networks.
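A minimal sketch with gensim's Word2Vec; a real model would be trained on millions of sentences or loaded from pretrained vectors, so the tiny corpus and parameters here are purely illustrative:

```python
# Training toy word vectors with gensim (gensim 4.x API).
from gensim.models import Word2Vec

sentences = [
    ["semantic", "search", "understands", "user", "intent"],
    ["machine", "learning", "improves", "search", "results"],
    ["search", "engines", "use", "machine", "learning"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50, seed=1)

print(model.wv["search"][:5])                   # first 5 dimensions of the vector
print(model.wv.most_similar("search", topn=3))  # nearest words in this toy vector space
```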
Final Result
Text Classification - email spam identification, topic classification of news, sentiment classification,
and organization of web pages by search engines.
Text Matching / Similarity
Phonetic Matching – A phonetic matching algorithm takes a keyword as input (a person's name,
a location name, etc.) and produces a character string that identifies a set of words that are (roughly)
phonetically similar.
Flexible String Matching – A complete text matching system includes different algorithms pipelined
together to handle a variety of text variations: exact string matching, lemmatized matching, and
compact matching (which takes care of spaces, punctuation, slang, etc.).
Cosine Similarity – When text is represented in vector notation, a general cosine similarity can
also be applied to measure vectorized similarity. The sketch after this list converts texts to vectors
(using term frequency) and applies cosine similarity to measure the closeness of two texts.
Coreference Resolution – The process of finding relational links among the words (or phrases) within
sentences, e.g. "Donald went to John's office to see the new table. He looked at it for an hour."
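The original slide refers to code that is not reproduced in this export; a minimal reconstruction with scikit-learn (the two example sentences reuse the coreference example above and are otherwise an assumption):

```python
# Term-frequency vectorization + cosine similarity between two short texts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

text_a = "Donald went to John's office to see the new table"
text_b = "He looked at the new table in the office for an hour"

vectors = CountVectorizer().fit_transform([text_a, text_b])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]

# 1.0 = identical term profile, 0.0 = no shared terms.
print(f"cosine similarity: {score:.2f}")
```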
What is GOOGLE BERT?
According to Google:
An Example?
The phrase was "how to catch a cow fishing?"
In New England, the word "cow" in the context of
fishing means a large striped bass. A striped bass is a
popular saltwater game fish that millions of anglers fish
for on the Atlantic coast.
So earlier this month, during the course of research for a
PubCon Vegas presentation, I typed the phrase, “how to
catch a cow fishing” and Google provided results related
to livestock, to cows.
How to Write Better
Optimized Texts
Understand
What Your
Audience
Wants…
Use Keyword / Phrase Research Tools
AnswerThePublic
Google Trends
Ahrefs
Semrush
Keywordtool.io
Ubersuggest.io
Moz Keyword Tool
Serpstat
SpyFu
Google Search Console
Group keywords around topics
Group keywords around intent
Group keywords around common classifiers - colors, w-words, sizes,
locations, brands etc.
Answer a question you want to target and provide the best answer
Answer Follow Up Questions
Be Careful With Your Website Structure - Be Topical
& Map Keywords
Connect questions & answers in
your content
Connect your current question and the one that follows
Combine questions into a piece of content
Split the questions into sub-topics
Optimizing content for NLP should begin with
simple sentence structure and focus on
providing concise information for your audience.
You should always try to include the exact
questions that people ask, along with relevant
answers, since people will be typing those
questions into search engines.
Identify Units, Classifications, and
Adjectives
Within NLP SEO, words have meaning and therefore may have expected units,
classifications, or adjectives associated with them. NLP parsing will be on the lookout for
these elements when determining if the content contains the precise answer to a
question. Let's look at an example.
Example Query 1: “Safe Temperature for Chicken”
For this query, temperature has a unit of degrees in either Fahrenheit or Celsius
expressed as a numerical value. If a sentence does not include these elements, it does
not satisfy the question. A well-structured sentence should contain a number and the
word degree or the degree symbol. If our sentence clarifies Fahrenheit or Celsius, the
answer is more accurate and specific, while also improving our localized targeting.
Be Clear With Your Answers
Reduce Dependency Hops
Reading a sentence and determining if a question is answered depends on
Google’s NLP parsers not getting hung up as they “crawl” through a
sentence. If a sentence's structure is overly complicated, Google may fail to
create clear links between words or may need too many hops to
build that relationship.
Don’t Beat Around the Bush
A common NLP problem is “beating around the bush” when it comes to
answering a question. It's not that these answers are "wrong," but they
don't give Google a precise determination of the answer.
Follow the Query Through
If a user searches for "Emergency Fund," they may have the following goals on their journey:
What is an Emergency Fund
How Much to Save
Types of Emergency Funds
How to Build an Emergency Fund
Ask yourself: "Does this article answer all the subjects and questions a searcher
might have when they search?" Google can identify these follow-through topics and questions by
looking at follow-up searches and query refinement within search sessions.
Google can improve searcher satisfaction if it's able to satisfy searchers sooner by giving them
content that eliminates the need for two to three additional searches.
Disambiguate Entities
There are two simple rules here:
Isolate entities that are not used in a sentence. When an entity appears outside of a
sentence, try to isolate it in the text and within the HTML tag where it appears, such as
headings, list items, or table cells.
Avoid grouping it with a price, year, category, parentheses, or any other data/text.
Simplify your content around the entities you want Google to extract successfully.
Disambiguate Entities
When an entity can be confused, such as cities in multiple states, movies with the same
name, or films vs. books, you can disambiguate entities by using indicator words in the
same sentence.
For the sentence “Portland is a great place to live,” the extracted entity is Portland, OR. For
the sentence, “the Old Port neighborhood in Portland is a great place to live,” the extracted
entity is Portland, ME.
There is an entity relationship between “Portland (ME)” and “Old Port,” which allows Google
to disambiguate the entity “Portland.” Brainstorm these indicator words when your entities
could have multiple identities.
Use indicator words to disambiguate entities.
Text Formatting
Inverted Pyramid: Articles have a lede, body, and a tail. Content has
different meanings based on how far down the page it appears.
Headings: A heading defines the content between it and the next heading.
Subtopics: Think of headings as sub-articles within the parent article.
Proximity: Proximity determines relationships.
Words/phrases in the same sentence are closely related.
Words/phrases in the same paragraph are related.
Words/phrases in different sections are distantly related.
Relationships: Subheadings have a parent -> child relationship. (A page with a
list of categories as H2s and products as H3s is a list of categories. A page with a
list of products as H2s and categories as H3s is a list of products.)
HTML Tags: Text doesn't have to be in a heading tag to be a heading. (However,
heading tags are preferred.) Text also doesn't need a heading tag to have a parent ->
child relationship. Its display and font formatting can visually dictate this without the
use of heading tags.
Other Formats:
Lists: HTML ordered and unordered lists function as lists, which have a meaning.
Headings can also perform as lists. Headings with a number first can work as an
ordered list. Ordered lists imply rankings, order, or process. Short bolded "labels" or
"summary" phrases at the start of a paragraph can function as a list.
Tables inherently imply row/column relationships. Some formatting suggests
classification, like addresses and date formats.
Structure by Content Type: Some content types have expected data that
define them. Events have names, locations, and dates. Products have names,
brands, and prices.
Tools for content optimization:
Google NLP Tool
Webtexttool - Text Metrics
Semrush SEO Writing Assistant
Contacts
EMAIL ADDRESS
info@serpact.com
PHONE NUMBER
+359 (032) 260 096
WEBSITE
serpact.com
SERP.AC/FB
SERP.AC/TWITTER
SERP.AC/YOUTUBE