Semantic Search,
Machine Learning & AI in
the latest Google
algorithms
ABOUT SERPACT
More than 80 international certificates
Over 40 lectures in Bulgaria
More than 10 lectures abroad
Over 15 years of professional experience
More than 20 interviews for Bulgarian and international media
The only Bulgarian SEO agency with a presentation at a Google event
The only Bulgarian SEO agency officially accredited by Stone Temple
Nominees at the Europe Search Awards 2019
WHAT IS SEMANTIC SEARCH?
WHAT IS MACHINE LEARNING
AND WHY SEARCH ENGINES
USE IT?
FIND PATTERNS FOR URLS AND
PAGE CONTENT
Multiple outbound links to
unrelated pages
Excessive use of the same keywords
Excessive use of synonyms
Overoptimization of anchor texts
Other similar variables.
ANALYSIS OF SEARCH AND
CLASSIFICATION PHRASES
One of the best applications of machine learning
algorithms is the classification of search phrases
and, accordingly, the indexing of documents based on the
user’s search intent.
As we know, phrases can generally be
informational, navigational, or transactional.
By analyzing click patterns and the type of
content that users engage with (e.g., CTR by
content type), a search engine can use machine
learning to determine user intent.
Identification of synonyms
When you see search results
that don’t include the keyword
in a snippet, it’s likely that
Google uses machine learning to
identify synonyms.
Better identification of word connections
Google’s purpose in identifying synonyms is to understand the context and meaning of a page.
Writing content that is clear and consistent in meaning is much more important than
spamming a page with keywords and synonyms.
Identifying the similarities between the words in a
search query
CUSTOM SIGNALS BASED ON
SPECIFIC QUERIES
Machine learning algorithms can put more weight on variables in some queries than
others. The search engine “learns” about the preferences of a particular user and can
base its information on past queries to present the most interesting information.
Overall, according to consumer research, personalized phrases through machine
learning have increased the clickthrough rate (CTR) of results by about 10%.
IDENTIFYING NEW SIGNALS
According to a 2016 podcast by Gary Illyes of Google, RankBrain not only helps
identify patterns in queries but also helps the search engine identify possible new
ranking signals. These signals are being sought so that Google can continue to
improve the quality of search results.
WHAT IS NATURAL LANGUAGE
PROCESSING -
OR HOW SEARCH ENGINES UNDERSTAND
OUR CONTENT?
Tasks of NLP
Tokenization, Lemmatization, Stemming
Sentence boundary detection
Part-of-speech tagging
Syntax parsing - Dependency parsing & Constituency parsing
Semantic role labeling
Semantic dependency parsing
Word sense disambiguation/induction
Named-entity recognition/classification
Entity linking
Temporal expression recognition/normalization
Co-reference resolution
Information extraction
Terminology extraction
Topic modeling
Attributional similarity (word similarity)
Relational similarity
Phrase similarity
Sentence similarity
Paraphrase identification
Textual entailment
Natural language generation
Speech recognition
Speech synthesis
Ontology population
Question answering
Machine translation
Text coherence
Fake news detection
Serpact Ltd. | AffiliateCon Sofia 2019
How Search Engines Like
Google Process The Content
Today?
Text Pre-Processing
Noise Removal
Lexicon Normalization
Stemming: Stemming is a rudimentary rule-based
process of stripping the suffixes (“ing”, “ly”, “es”, “s”
etc) from a word.
Lemmatization: Lemmatization, on the other hand, is
an organized, step-by-step procedure for obtaining
the root form of a word. It makes use of
vocabulary and morphological analysis (word
structure and grammar relations).
Object Standardization - acronyms, hashtags with
attached words, and colloquial slang
Normalization and Lemmatization: POS tags are the
basis of lemmatization process for converting a word
to its base form (lemma).
Efficient stopword removal: POS tags are also useful
in efficient removal of stopwords.
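The difference between stemming and lemmatization described above can be sketched in a few lines. This is only an illustration: `naive_stem` strips the suffixes named on the slide, and `LEMMAS` is a tiny hypothetical lookup table standing in for the vocabulary and morphological analysis a real lemmatizer would use.

```python
# Suffixes named in the text; a real stemmer (e.g. Porter) has many more rules.
SUFFIXES = ("ing", "ly", "es", "s")

def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower

# Hypothetical mini-vocabulary; real lemmatizers rely on full dictionaries
# and part-of-speech information.
LEMMAS = {"studies": "study", "better": "good", "ran": "run"}

def naive_lemma(word: str) -> str:
    """Look the word up in the vocabulary; fall back to the lowercased word."""
    lower = word.lower()
    return LEMMAS.get(lower, lower)

for w in ["playing", "quickly", "boxes", "studies"]:
    print(w, "-> stem:", naive_stem(w), "| lemma:", naive_lemma(w))
```

Note how the rule-based stem of "studies" is not a real word, while the vocabulary lookup returns "study".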
Syntactic Parsing
Dependency Trees – Sentences are composed of words stitched together. The relationships among the
words in a sentence are determined by the basic dependency grammar. Dependency grammar is a class of
syntactic text analysis that deals with (labeled) asymmetrical binary relations between two lexical items
(words).
Part of speech tagging – Apart from the grammar relations, every word in a sentence is also associated with
a part of speech (POS) tag (noun, verb, adjective, adverb, etc.). The POS tags define the usage and function
of a word in the sentence.
Word sense disambiguation: Some words have multiple meanings depending on their usage. For
example, in the two sentences below:
I. “Please book my flight for Delhi”
II. “I am going to read this book in the flight”
“Book” is a verb in sentence I and a noun in sentence II, so the POS tag helps tell the two senses apart.
Entity Extraction
Named Entity Recognition (NER)
Noun phrase identification
Phrase classification
Entity disambiguation: Entities are sometimes misclassified, so a validation layer on
top of the results is useful. Knowledge graphs can be used for this purpose. Popular knowledge
graphs include Google Knowledge Graph, IBM Watson and Wikipedia.
Topic Modelling - Latent Dirichlet
Allocation (LDA)
Topic modeling is a process of automatically
identifying the topics present in a text corpus. It
derives the hidden patterns among the words in the
corpus in an unsupervised manner.
Topics are defined as “a repeating pattern of co-
occurring terms in a corpus”. A good topic model
results in – “health”, “doctor”, “patient”, “hospital” for a
topic – Healthcare, and “farm”, “crops”, “wheat” for a
topic – “Farming”.
Bag of Words
Bag of words is a commonly used model that allows you to count all words in a piece of text.
Basically it creates an occurrence matrix for the sentence or document,
disregarding grammar and word order. These word frequencies or occurrences
are then used as features for training a classifier.
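A minimal sketch of the occurrence matrix described above, using only the Python standard library. The `tokenize` and `bag_of_words` helpers and the two sample sentences are illustrative assumptions, not part of the original talk.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

docs = ["The cat sat on the mat", "The dog sat"]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
for row in matrix:
    print(row)
```

Grammar and word order are discarded: each row only records how often each vocabulary term occurs in that document.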
N-Grams as Features
A combination of N words together is called an N-gram. N-grams (N > 1) are
generally more informative as features than single words (unigrams).
Bigrams (N = 2) are often considered the most important of these features.
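As a quick illustration of N-grams, the sketch below slides a window of N tokens over a sentence. The `ngrams` helper is a hypothetical name, and whitespace splitting is a simplifying assumption.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "how to catch a cow fishing".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```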
Statistical Features
Term Frequency – Inverse Document Frequency (TF – IDF)
Term Frequency (TF) – TF for a term “t” is defined as the count of the term “t” in a document “D”.
Inverse Document Frequency (IDF) – IDF for a term “t” is defined as the logarithm of the ratio of the total number of
documents in the corpus to the number of documents containing the term “t”.
Count / Density / Readability Features - Count- or density-based features can also be used in
models and analysis. These features might seem trivial but can have a great impact on learning models.
Some of the features are: Word Count, Sentence Count, Punctuation Counts and
industry-specific word counts.
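The TF and IDF definitions above can be sketched directly: TF is the raw count of a term in a document, and IDF is the logarithm of total documents over documents containing the term. The `tf_idf` helper, the whitespace tokenization, and the sample corpus are assumptions for illustration. Production libraries usually add smoothing and normalization that this sketch omits.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document, with score = TF * IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each term in this document
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

corpus = ["the farm grows wheat", "the doctor sees the patient", "the hospital"]
scores = tf_idf(corpus)
print(scores[0])
```

A term that appears in every document ("the" here) gets an IDF of log(1) = 0, so it contributes nothing, while a term unique to one document gets the highest weight.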
Word Embedding (text vectors)
Word embedding is the modern way of representing words as vectors. The aim of word
embedding is to map high-dimensional word features to low-dimensional feature
vectors while preserving the contextual similarity in the corpus.
They are widely used in deep learning models such as Convolutional Neural Networks and
Recurrent Neural Networks.
Final Result
Text Classification - Email Spam Identification, topic classification of news, sentiment classification
and organization of web pages by search engines.
Text Matching / Similarity
Phonetic Matching – A Phonetic matching algorithm takes a keyword as input (person’s name,
location name etc) and produces a character string that identifies a set of words that are (roughly)
phonetically similar.
Flexible String Matching – A complete text matching system includes different algorithms pipelined
together to handle a variety of text variations: exact string matching, lemmatized matching, and
compact matching (which takes care of spaces, punctuation, slang, etc.).
Cosine Similarity – When texts are represented as vectors, cosine similarity can
be applied to measure how similar they are. A common approach converts each text to a
term-frequency vector and uses the cosine of the angle between the two vectors as a measure of closeness.
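A minimal sketch of the cosine similarity step: each text becomes a term-frequency vector, and the score is the dot product divided by the product of the vector lengths. The function names and sample sentences are illustrative assumptions, not code from the original talk.

```python
import math
from collections import Counter

def text_to_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

a = text_to_vector("striped bass fishing tips")
b = text_to_vector("striped bass fishing gear")
c = text_to_vector("dairy cow farming")
print(cosine_similarity(a, b))  # texts sharing most terms score close to 1
print(cosine_similarity(a, c))  # texts sharing no terms score 0
```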
Coreference Resolution – The process of finding relational links among the words (or phrases) within
sentences. Example: “Donald went to John’s office to see the new table. He looked at it for an hour.”
Here “He” refers to Donald and “it” refers to the table.
What is GOOGLE BERT?
According to Google:
The phrase was “how to catch a cow fishing?”
In New England, the word “cow” in the context of
fishing means a large striped bass. A striped bass is a
popular saltwater game fish that millions of anglers fish
for on the Atlantic coast.
So earlier this month, during the course of research for a
PubCon Vegas presentation, I typed the phrase, “how to
catch a cow fishing” and Google provided results related
to livestock, to cows.
An Example?
How to Write Better
Optimized Texts
Understand
What Your
Audience
Wants…
Use Keyword / Phrases Research Tools
AnswerThePublic
Google Trends
Ahrefs
Semrush
Keywordtool.io
Ubersuggest.io
Moz Keyword Tool
Serpstat
SpyFu
Google Search Console
Group keywords around topics
Group keywords around intent
Group keywords around common classifiers - colors, w-words, sizes,
locations, brands etc.
Answer a question you want to target and provide the best answer
Answer Follow Up Questions
Be Careful With Your Website Structure - Be Topical
& Map Keywords
Connect questions & answers in
your content
Connect your current and following question
Combine questions into a piece of content
Split the questions into sub-topics
Optimizing content for NLP should begin with
simple sentence structure and focus on
providing concise information for your audience.
You should always try to include the exact
questions that people ask along with relevant
answers, since people will be typing those
questions into search engines.
Identify Units, Classifications, and
Adjectives
Within NLP SEO, words have meaning and therefore may have expected units,
classifications, or adjectives associated with them. NLP parsing will be on the lookout for
these elements when determining if the content contains the precise answer to a
question. Let’s look at two examples.
Example Query 1: “Safe Temperature for Chicken”
For this query, temperature has a unit of degrees in either Fahrenheit or Celsius
expressed as a numerical value. If a sentence does not include these elements, it does
not satisfy the question. A well-structured sentence should contain a number and the
word degree or the degree symbol. If our sentence clarifies Fahrenheit or Celsius, the
answer is more accurate and specific, while also improving our localized targeting.
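As a writer-side check (not a description of how Google parses text), the rule above can be sketched with a regular expression that looks for a number followed by “degrees” or the degree symbol, and optionally a scale. The pattern and the `temperature_answer` helper are assumptions made for illustration.

```python
import re

# A number, then "°" or "degree(s)", then an optional Fahrenheit/Celsius marker.
TEMPERATURE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?:°|degrees?\b)\s*"
    r"(?P<scale>fahrenheit|celsius|[fc])?(?![a-z])",
    re.IGNORECASE,
)

def temperature_answer(sentence):
    """Return the value and scale found in the sentence, or None if absent."""
    match = TEMPERATURE.search(sentence)
    if match is None:
        return None
    scale = match.group("scale")
    return {
        "value": float(match.group("value")),
        "scale": scale[0].upper() if scale else None,  # "F", "C" or unspecified
    }

print(temperature_answer("Cook chicken to an internal temperature of 165°F."))
print(temperature_answer("Cook chicken until it is hot all the way through."))
```

The first sentence contains a number, a unit and a scale. The second returns `None` because it has no number or unit, so by the rule above it does not satisfy the query.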
Be Clear With Your Answers
Reduce Dependency Hops
Reading a sentence and determining if a question is answered depends on
Google’s NLP parsers not getting hung up as they “crawl” through a
sentence. If a sentence’s structure is overly complicated, Google may fail to
create clear links between words or may need too many hops to
build that relationship.
Don’t Beat Around the Bush
A common NLP problem is “beating around the bush” when it comes to
answering a question. It’s not that these answers are “wrong,” but they
don’t give Google a precise determination of the answer.
Follow the Query Through
If a user searches for “Emergency Fund,” they may have the following goals on their journey:
What is an Emergency Fund
How Much to Save
Types of Emergency Funds
How to Build an Emergency Fund
Ask yourself: “Does this article answer all the subjects and questions a searcher
might have when they search?” Google can identify these follow-through topics and questions by
looking at follow-up searches and query refinement within search sessions.
Google can improve searcher satisfaction if it is able to satisfy searchers sooner by giving them
content that eliminates the need for two to three additional searches.
Disambiguate Entities
There are two simple rules here:
Isolate the entities when not used in a sentence. When an entity is used outside of a
sentence, try to isolate it in the text, and within an HTML tag where it appears, such as
headings, list items, or table cells.
Avoid grouping it with a price, year, category, parentheses, or any other data/text.
Simplify your content around entities you want Google to extract successfully.
Use indicator words to disambiguate entities.
When an entity can be confused, such as cities in multiple states, movies with the same
name, or films vs. books, you can disambiguate entities by using indicator words in the
same sentence.
For the sentence “Portland is a great place to live,” the extracted entity is Portland, OR. For
the sentence, “the Old Port neighborhood in Portland is a great place to live,” the extracted
entity is Portland, ME.
There is an entity relationship between “Portland (ME)” and “Old Port,” which allows Google
to disambiguate the entity “Portland.” Brainstorm these indicator words when your entities
could have multiple identities.
Text Formatting
Inverted Pyramid: Articles have a lede, body, and a tail. Content has
different meanings based on how far down the page it appears.
Headings: A heading defines the content between it and the next heading.
Subtopics: Think of headings as sub-articles within the parent article.
Proximity: Proximity determines relationships.
Words/phrases in the same sentence are closely related.
Words/phrases in the same paragraph are related.
Words/phrases in different sections are distantly related.
Relationships: Subheadings have a parent -> child relationship. (A page with a
list of categories as H2s and products as H3s is a list of categories. A page with a
list of products as H2s and categories as H3s is a list of products.)
HTML Tags: Text doesn’t have to be in a heading tag to be a heading. (However,
heading tags are preferred.) Text also doesn’t need a heading tag to have a parent ->
child relationship. Its display and font formatting can visually dictate this without the
use of heading tags.
Lists: HTML ordered and unordered lists function as lists, which have a
meaning. Headings can also perform as lists. Headings with a number first can work
as an ordered list.
Ordered lists imply rankings, order, or process. Short bolded “labels” or “summary”
phrases at the start of a paragraph can function as a list.
Structure by Content Type: Some content types have expected data that
define them. Events have names, locations, and dates. Products have names,
brands, and prices.
Other Formats:
Tables inherently imply row/column relationships. Some formatting suggests
classification, like addresses and date formats.
Tools for content optimization:
Google NLP Tool
Webtexttool - Text Metrics
Semrush SEO Writing Assistant
Contacts
EMAIL ADDRESS
info@serpact.com
PHONE NUMBER
+359 (032) 260 096
WEBSITE
serpact.com
SERP.AC/FB
SERP.AC/TWITTER
SERP.AC/YOUTUBE

Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Semantic Search, Machine Learning & AI in Google's Latest Algorithms

  • 1. Semantic Search, Machine Learning & AI in the latest Google algorithms
  • 2. ABOUT SERPACT: More than 80 international certificates. Over 40 lectures in Bulgaria. More than 10 lectures abroad. Over 15 years of professional experience. More than 20 interviews for Bulgarian and international media. The only Bulgarian SEO agency with a presentation at a Google event. The only Bulgarian SEO agency officially accredited by Stone Temple. Nominees at the Europe Search Awards 2019.
  • 3. WHAT IS SEMANTIC SEARCH?
  • 4. WHAT IS MACHINE LEARNING AND WHY DO SEARCH ENGINES USE IT?
  • 5. FIND PATTERNS FOR URLS AND PAGE CONTENT: Multiple outbound links to unrelated pages; excessive use of the same keywords; excessive use of synonyms; overoptimization of anchor texts; other similar variables. ANALYSIS AND CLASSIFICATION OF SEARCH PHRASES: One of the best applications of machine learning algorithms is the classification of search phrases and, accordingly, the indexing of documents based on the user’s search intent. As we know, phrases are generally informational, navigational, or transactional. By analyzing click patterns and the type of content that users engage with (e.g., CTR by content type), a search engine can use machine learning to determine user intent.
  • 6. Identification of synonyms: When you see search results that don’t include the keyword in the snippet, it’s likely that Google is using machine learning to identify synonyms.
  • 7. Better identification of word connections: Google’s purpose for synonyms is to understand the context and meaning of a page. Writing clear, consistent content is much more important than stuffing a page with keywords and synonyms.
  • 8. Identifying the similarities between the words in a search query
  • 9. Identifying the similarities between the words in a search query
  • 10. Identifying the similarities between the words in a search query
  • 11. CUSTOM SIGNALS BASED ON SPECIFIC QUERIES: Machine learning algorithms can put more weight on some variables for certain queries than for others. The search engine “learns” the preferences of a particular user and can draw on past queries to present the most interesting information. Overall, according to consumer research, personalization through machine learning has increased the clickthrough rate (CTR) of results by about 10%.
  • 12. IDENTIFYING NEW SIGNALS: According to a 2016 podcast with Gary Illyes of Google, RankBrain not only helps identify patterns in queries but also helps the search engine identify possible new ranking signals. Google looks for these signals so that it can continue to improve the quality of search results.
  • 13. WHAT IS NATURAL LANGUAGE PROCESSING, OR HOW DO SEARCH ENGINES UNDERSTAND OUR CONTENT?
  • 14. Tasks of NLP: Tokenization, lemmatization, stemming; sentence boundary detection; part-of-speech tagging; syntax parsing (dependency parsing and constituency parsing); semantic role labeling; semantic dependency parsing; word sense disambiguation/induction; named-entity recognition/classification; entity linking; temporal expression recognition/normalization; co-reference resolution; information extraction; terminology extraction; topic modeling.
  • 15. Tasks of NLP (continued): Attributional similarity (word similarity); relational similarity; phrase similarity; sentence similarity; paraphrase identification; textual entailment; natural language generation; speech recognition; speech synthesis; ontology population; question answering; machine translation; text coherence; fake news detection.
  • 16. Serpact Ltd. | AffiliateCon Sofia 2019. How Do Search Engines Like Google Process Content Today?
  • 17.
  • 18. Text Pre-Processing. Noise removal. Lexicon normalization: Stemming is a rudimentary rule-based process of stripping suffixes (“ing”, “ly”, “es”, “s”, etc.) from a word. Lemmatization, on the other hand, is an organized, step-by-step procedure for obtaining the root form of a word; it makes use of vocabulary and morphological analysis (word structure and grammar relations). Object standardization: acronyms, hashtags with attached words, and colloquial slang. Normalization and lemmatization: POS tags are the basis of the lemmatization process for converting a word to its base form (lemma). Efficient stopword removal: POS tags are also useful for efficient removal of stopwords.
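
The rule-based suffix stripping described on this slide can be sketched in a few lines of Python. This is only an illustration: the suffix list comes from the slide, while the function name `naive_stem` and the minimum-stem-length guard are assumptions made for this example. Real stemmers such as the Porter algorithm apply many more rules.

```python
# Suffixes named on the slide, checked in this order.
SUFFIXES = ("ing", "ly", "es", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters.

    min_stem_len is a hypothetical guard so that short words such as "is"
    are left untouched.
    """
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["playing", "quickly", "boxes", "cats", "is"]])
```

Because the rules are blind to vocabulary, "running" becomes "runn" rather than "run", which is the kind of error lemmatization avoids by consulting a dictionary and the word's part of speech.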
  • 19. Syntactic Parsing
  • 20. Syntactic Parsing. Dependency trees: Sentences are composed of words sewn together. The relationships among the words in a sentence are determined by the basic dependency grammar, a class of syntactic text analysis that deals with (labeled) asymmetrical binary relations between two lexical items (words). Part-of-speech tagging: Apart from grammar relations, every word in a sentence is also associated with a part-of-speech (POS) tag (noun, verb, adjective, adverb, etc.). POS tags define the usage and function of a word in the sentence. Word sense disambiguation: Some words have multiple meanings depending on their usage. For example, in the two sentences below, “book” is a verb in the first and a noun in the second: I. “Please book my flight for Delhi.” II. “I am going to read this book in the flight.”
  • 21. Entity Extraction. Named Entity Recognition (NER): noun phrase identification; phrase classification; entity disambiguation. Entities are sometimes misclassified, so a validation layer on top of the results is useful. Knowledge graphs can be used for this purpose. Popular knowledge graphs include Google Knowledge Graph, IBM Watson, and Wikipedia.
  • 22. Topic Modelling - Latent Dirichlet Allocation (LDA): Topic modeling is the process of automatically identifying the topics present in a text corpus. It derives the hidden patterns among the words in the corpus in an unsupervised manner. Topics are defined as “a repeating pattern of co-occurring terms in a corpus”. A good topic model yields “health”, “doctor”, “patient”, “hospital” for the topic “Healthcare”, and “farm”, “crops”, “wheat” for the topic “Farming”.
  • 23. Bag of Words: A commonly used model that counts all the words in a piece of text. It creates an occurrence matrix for the sentence or document, disregarding grammar and word order. These word frequencies or occurrences are then used as features for training a classifier.
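
To make the occurrence matrix concrete, here is a minimal Python sketch using only the standard library. The helper names `tokenize` and `bag_of_words` and the two sample sentences are assumptions made for this example; they are not from the slides.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

docs = ["the cat sat on the mat", "the dog sat"]
vocab, matrix = bag_of_words(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row is one document and each column one vocabulary term; word order is lost, which is exactly the simplification the model makes.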
  • 24. N-Grams as Features: A combination of N words together is called an N-gram. N-grams (N > 1) are generally more informative as features than single words (unigrams). Bigrams (N = 2) are often considered the most important of these features.
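
As a small illustration of N-gram extraction, the sketch below slides a window of N tokens over a sentence. The function name `ngrams` and the sample query are assumptions made for this example.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "how to catch a striped bass".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
```

The bigram ("striped", "bass") keeps the two words together, which a unigram model would treat as unrelated features.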
  • 25. Statistical Features. Term Frequency - Inverse Document Frequency (TF-IDF): Term Frequency (TF) for a term “t” is defined as the count of the term “t” in a document “D”. Inverse Document Frequency (IDF) for a term “t” is defined as the logarithm of the ratio of the total number of documents in the corpus to the number of documents containing “t”. Count / Density / Readability Features: Count- or density-based features can also be used in models and analysis. These features might seem trivial but can have a great impact on learning models. Some of these features are word count, sentence count, punctuation counts, and industry-specific word counts.
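
The TF and IDF definitions on this slide can be combined as in the sketch below, which multiplies the raw count of a term in a document by log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf`, the whitespace tokenization, and the sample corpus are assumptions made for this example. Libraries typically add smoothing and normalization that are omitted here.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document, score = count * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return scores

docs = ["seo content seo", "content marketing", "seo tools"]
for row in tf_idf(docs):
    print(row)
```

With this plain definition, a term that appears in every document gets a score of zero, since log(N / N) = 0.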
  • 26. Word Embedding (text vectors): Word embedding is the modern way of representing words as vectors. The aim of word embedding is to map high-dimensional word features to low-dimensional feature vectors while preserving contextual similarity in the corpus. Word embeddings are widely used in deep learning models such as Convolutional Neural Networks and Recurrent Neural Networks.
  • 27. Final Result. Text Classification: email spam identification, topic classification of news, sentiment classification, and organization of web pages by search engines. Text Matching / Similarity. Phonetic Matching: A phonetic matching algorithm takes a keyword as input (a person’s name, a location name, etc.) and produces a character string that identifies a set of words that are (roughly) phonetically similar. Flexible String Matching: A complete text matching system pipelines different algorithms together to handle a variety of text variations: exact string matching, lemmatized matching, and compact matching (which takes care of spaces, punctuation, slang, etc.). Cosine Similarity: When texts are represented as vectors (for example, using term frequency), cosine similarity can be applied to measure how close two texts are. Coreference Resolution: The process of finding relational links among the words (or phrases) within sentences. Example: “Donald went to John’s office to see the new table. He looked at it for an hour.” Here “He” refers to Donald and “it” refers to the table.
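
The cosine similarity step mentioned on this slide can be sketched as follows: each text is turned into a term-frequency vector and the cosine of the angle between the two vectors is returned. The function names `text_to_vector` and `cosine_similarity` and the sample sentences are assumptions made for this example, not code from the original deck.

```python
import math
from collections import Counter

def text_to_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

a = text_to_vector("this is an article on seo")
b = text_to_vector("this article is about seo tools")
print(round(cosine_similarity(a, b), 3))
```

A score near 1 means the two texts use almost the same words in the same proportions; a score of 0 means they share no terms.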
  • 28. What is GOOGLE BERT?
  • 29. According to Google:
  • 30. An Example? The phrase was “how to catch a cow fishing?” In New England, the word “cow” in the context of fishing means a large striped bass. A striped bass is a popular saltwater game fish that millions of anglers fish for on the Atlantic coast. So earlier this month, during the course of research for a PubCon Vegas presentation, I typed the phrase “how to catch a cow fishing”, and Google provided results related to livestock, to cows.
  • 31. How to Write Better Optimized Texts
  • 32. Understand What Your Audience Wants…
  • 33. Use Keyword / Phrase Research Tools: AnswerThePublic, Google Trends, Ahrefs, Semrush, Keywordtool.io, Ubersuggest.io, Moz Keyword Tool, Serpstat, SpyFu, Google Search Console.
  • 34.
  • 35. Group keywords around topics. Group keywords around intent. Group keywords around common classifiers: colors, w-words, sizes, locations, brands, etc.
  • 36. Answer the question you want to target and provide the best answer. Answer follow-up questions.
  • 37. Be Careful With Your Website Structure - Be Topical & Map Keywords
  • 38. Connect questions & answers in your content. Connect your current and following question. Combine questions into a piece of content. Split the questions into sub-topics. Optimizing content for NLP should begin with simple sentence structure and focus on providing concise information for your audience. You should always try to include the exact questions that people ask, along with relevant answers, since people will be typing those questions into search engines.
  • 39. Identify Units, Classifications, and Adjectives: Within NLP SEO, words have meaning and therefore may have expected units, classifications, or adjectives associated with them. NLP parsing will be on the lookout for these elements when determining whether the content contains the precise answer to a question. Let’s look at two examples. Example Query 1: “Safe Temperature for Chicken”. For this query, temperature has a unit of degrees in either Fahrenheit or Celsius, expressed as a numerical value. If a sentence does not include these elements, it does not satisfy the question. A well-structured sentence should contain a number and the word “degree” or the degree symbol. If the sentence specifies Fahrenheit or Celsius, the answer is more accurate and specific, and it also improves localized targeting.
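
As a rough self-check for the rule above, a writer could test whether a candidate sentence contains a number, a degree marker, and optionally a scale. The sketch below is only an editorial aid under stated assumptions: the pattern `TEMPERATURE`, the function `find_temperature`, and the sample sentences are hypothetical, and nothing here reflects how Google's parsers actually work.

```python
import re

# Hypothetical pattern: a number, then a degree symbol or the word "degree(s)",
# optionally followed by a scale (F, C, Fahrenheit, Celsius).
TEMPERATURE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?:°|degrees?\b)\s*"
    r"(?P<scale>fahrenheit|celsius|f\b|c\b)?",
    re.IGNORECASE,
)

def find_temperature(sentence):
    """Return (value, scale) if the sentence states a temperature, else None.

    scale is "F", "C", or None when the sentence does not name one.
    """
    match = TEMPERATURE.search(sentence)
    if not match:
        return None
    scale = match.group("scale")
    return (float(match.group("value")), scale[0].upper() if scale else None)

for s in [
    "Cook chicken to an internal temperature of 165 degrees Fahrenheit.",
    "Chicken is safe at 74°C.",
    "Cook chicken until it is done.",
]:
    print(find_temperature(s))
```

A sentence that returns `None` lacks the number or the unit; one that returns a value without a scale is answerable but less specific.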
  • 40. Be Clear With Your Answers. Reduce Dependency Hops: Reading a sentence and determining whether a question is answered depends on Google’s NLP parsers not getting hung up as they “crawl” through the sentence. If a sentence’s structure is overly complicated, Google may fail to create clear links between words, or may need too many hops to build that relationship. Don’t Beat Around the Bush: A common NLP problem is “beating around the bush” when answering a question. It’s not that these answers are “wrong,” but they don’t give Google a precise determination of the answer.
  • 41. Be Clear With Your Answers
  • 42. Follow the Query Through: If a user searches for “Emergency Fund,” they may have the following goals on their journey: What is an Emergency Fund; How Much to Save; Types of Emergency Funds; How to Build an Emergency Fund. Ask yourself: “Does this article answer all the subjects and questions a searcher might have when they search?” Google can identify these follow-through topics and questions by looking at follow-up searches and query refinement within search sessions. Google can improve searcher satisfaction if it’s able to satisfy searchers sooner by giving them content that eliminates the need for two to three additional searches.
  • 43. Disambiguate Entities: Simplify your content around entities you want Google to extract successfully. There are two simple rules here. Isolate the entities when not used in a sentence: When an entity is used outside of a sentence, try to isolate it in the text and within the HTML tag where it appears, such as a heading, list item, or table cell. Avoid grouping it with a price, year, category, parentheses, or any other data/text.
  • 44. Disambiguate Entities. Use indicator words to disambiguate entities: When an entity can be confused, such as cities in multiple states, movies with the same name, or films vs. books, you can disambiguate it by using indicator words in the same sentence. For the sentence “Portland is a great place to live,” the extracted entity is Portland, OR. For the sentence “the Old Port neighborhood in Portland is a great place to live,” the extracted entity is Portland, ME. There is an entity relationship between “Portland (ME)” and “Old Port,” which allows Google to disambiguate the entity “Portland.” Brainstorm these indicator words when your entities could have multiple identities.
  • 45. Text Formatting. Inverted Pyramid: Articles have a lede, a body, and a tail. Content has different meanings based on how far down the page it appears. Headings: A heading defines the content between it and the next heading. Subtopics: Think of headings as sub-articles within the parent article. Proximity: Proximity determines relationships. Words/phrases in the same sentence are closely related. Words/phrases in the same paragraph are related. Words/phrases in different sections are distantly related. Relationships: Subheadings have a parent -> child relationship. (A page with a list of categories as H2s and products as H3s is a list of categories. A page with a list of products as H2s and categories as H3s is a list of products.)
  • 46.
  • 47. Other Formats. HTML Tags: Text doesn’t have to be in a heading tag to be a heading. (However, heading tags are preferred.) Text also doesn’t need a heading tag to have a parent -> child relationship. Its display and font formatting can visually dictate this without the use of heading tags. Lists: HTML ordered and unordered lists function as lists, which have a meaning. Headings can also perform as lists. Headings with a number first can work as an ordered list. Ordered lists imply rankings, order, or process. Short bolded “labels” or “summary” phrases at the start of a paragraph can function as a list. Tables inherently imply row/column relationships. Some formatting suggests classification, like addresses and date formats. Structure by Content Type: Some content types have expected data that define them. Events have names, locations, and dates. Products have names, brands, and prices.
  • 48. Tools for content optimization: Google NLP Tool; Webtexttool - Text Metrics; Semrush SEO Writing Assistant.
  • 49.
  • 50.
  • 51.
  • 52. Contacts. Email address: info@serpact.com. Phone number: +359 (032) 260 096. Website: serpact.com. SERP.AC/FB, SERP.AC/TWITTER, SERP.AC/YOUTUBE