
Semantic Search,
Machine Learning & AI in
the latest Google
algorithms
More than 80 international certificates
Over 40 lectures in Bulgaria
More than 10 lectures abroad
Over 15 years of professional experience
More than 20 interviews for Bulgarian and international media
The only Bulgarian SEO agency with a presentation at a Google event
The only Bulgarian SEO agency officially accredited by Stone Temple
Nominees at the Europe Search Awards 2019
ABOUT SERPACT
WHAT IS SEMANTIC SEARCH?
WHAT IS MACHINE LEARNING
AND WHY DO SEARCH ENGINES
USE IT?
FINDING PATTERNS IN URLS AND
PAGE CONTENT
Multiple outbound links to
unrelated pages
Excessive use of the same keywords
Excessive use of synonyms
Overoptimization of anchor texts
Other similar variables.
ANALYSIS AND CLASSIFICATION OF
SEARCH PHRASES
One of the best applications of machine learning
algorithms is the classification of search phrases
and, accordingly, of indexed documents, based on
the user's search intent.
As we know, queries generally fall into three types:
informational, navigational, and transactional.
By analyzing click patterns and the type of
content that users engage with (e.g., CTR by
content type), a search engine can use machine
learning to determine user intent.
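The deck does not show how such a classifier is built, so here is a minimal sketch (not Google's actual system) of grouping queries by intent with scikit-learn; the tiny labeled query set and the three intent labels are invented for illustration.

```python
# Minimal intent-classification sketch (illustrative only, not Google's system).
# The handful of labeled queries below is invented toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = [
    "what is semantic search",      # informational
    "how does rankbrain work",      # informational
    "facebook login",               # navigational
    "youtube homepage",             # navigational
    "buy running shoes online",     # transactional
    "cheap flights to sofia",       # transactional
]
intents = ["informational", "informational",
           "navigational", "navigational",
           "transactional", "transactional"]

# Word/bigram TF-IDF features feeding a simple linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(queries, intents)

# Likely "transactional" given the toy data ("buy", "cheap" overlap with training queries).
print(model.predict(["buy cheap shoes"]))
```

In a real search engine, click and engagement signals (such as the CTR-by-content-type patterns mentioned above) would stand in for the hand-written labels used here.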
Identification of synonyms
When you see search results
that don’t include the keyword
in a snippet, it’s likely that
Google uses machine learning to
identify synonyms.
Better identification of word connections
Google's goal with synonyms is to understand the context and meaning of a page.
Writing content with a clear and consistent meaning is much more important than
stuffing a page with keywords and synonyms.
Identifying the similarities between the words in a
search query
CUSTOM SIGNALS BASED ON
SPECIFIC QUERIES
Machine learning algorithms can put more weight on some variables in certain queries than
in others. The search engine "learns" the preferences of a particular user and can draw on
past queries to present the most relevant information.
Overall, according to consumer research, results personalized through machine
learning have increased the clickthrough rate (CTR) by about 10%.
IDENTIFYING NEW SIGNALS
According to a 2016 podcast by Gary Illyes of Google, RankBrain not only helps
identify patterns in queries but also helps the search engine identify possible new
ranking signals. These signals are sought so that Google can continue to
improve the quality of search results.
WHAT IS NATURAL LANGUAGE
PROCESSING -
OR HOW DO SEARCH ENGINES UNDERSTAND
OUR CONTENT?
Tasks of NLP
Tokenization, Lemmatization, Stemming
Sentence boundary detection
Part-of-speech tagging
Syntax parsing - Dependency parsing & Constituency parsing
Semantic role labeling
Semantic dependency parsing
Word sense disambiguation/induction
Named-entity recognition/classification
Entity linking
Temporal expression recognition/normalization
Co-reference resolution
Information extraction
Terminology extraction
Topic modeling
Attributional similarity (word similarity)
Relational similarity
Phrase similarity
Sentence similarity
Paraphrase identification
Textual entailment
Natural language generation
Speech recognition
Speech synthesis
Ontology population
Question answering
Machine translation
Text coherence
Fake news detection
Serpact Ltd. | AffiliateCon Sofia 2019
How Do Search Engines Like
Google Process Content
Today?
Text Pre-Processing
Noise Removal
Lexicon Normalization
Stemming: Stemming is a rudimentary rule-based
process of stripping suffixes ("ing", "ly", "es", "s",
etc.) from a word.
Lemmatization: Lemmatization, on the other hand, is
an organized, step-by-step procedure for obtaining
the root form of a word; it makes use of
vocabulary and morphological analysis (word
structure and grammar relations).
Object Standardization - acronyms, hashtags with
attached words, and colloquial slang
Normalization and Lemmatization: POS tags are the
basis of the lemmatization process for converting a word
to its base form (lemma).
Efficient stopword removal: POS tags are also useful
in efficient removal of stopwords, as shown in the sketch below.
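As a rough illustration of the steps above, here is a small sketch using NLTK (one possible toolkit; the example sentence is invented):

```python
# Pre-processing sketch with NLTK: stemming, lemmatization, stopword removal.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

nltk.download("wordnet", quiet=True)     # lemmatizer vocabulary
nltk.download("stopwords", quiet=True)   # common function words

text = "the runners were running quickly through the crowded cities"
tokens = text.lower().split()

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

# Stemming: crude suffix stripping ("running" -> "run", "cities" -> "citi").
print([stemmer.stem(t) for t in tokens])

# Lemmatization: vocabulary-based root forms ("cities" -> "city").
print([lemmatizer.lemmatize(t, pos="n") for t in tokens])

# Stopword removal: drop high-frequency words like "the" and "were".
print([t for t in tokens if t not in stop_words])
```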
Syntactic Parsing
Dependency Trees – Sentences are composed of words sewn together. The relationships among the
words in a sentence are determined by dependency grammar. Dependency grammar is a class of
syntactic text analysis that deals with (labeled) asymmetrical binary relations between two lexical items
(words).
Part-of-speech tagging – Apart from grammar relations, every word in a sentence is also associated with
a part-of-speech (POS) tag (noun, verb, adjective, adverb, etc.). POS tags define the usage and function
of a word in the sentence.
Word sense disambiguation: Some words have multiple meanings depending on their usage. For
example, in the two sentences below:
I. "Please book my flight for Delhi"
II. "I am going to read this book in the flight"
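A minimal sketch of POS tagging and dependency parsing with spaCy (assuming the en_core_web_sm model has been downloaded); it also shows how the POS tag separates the two senses of "book" from the example above:

```python
# Dependency parsing and POS tagging sketch with spaCy
# (assumes: pip install spacy && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("Please book my flight for Delhi")
for token in doc:
    # token.pos_ = coarse part of speech, token.dep_ = dependency label,
    # token.head = the word this token attaches to in the dependency tree.
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} head={token.head.text}")

# The same surface word gets a different tag in a different context:
doc2 = nlp("I am going to read this book in the flight")
print([(t.text, t.pos_) for t in doc2 if t.text == "book"])  # 'book' as a NOUN here
```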
Entity Extraction
Named Entity Recognition (NER)
Noun phrase identification
Phrase classification
Entity disambiguation: Sometimes entities are misclassified, so creating a validation layer on
top of the results is useful. Knowledge graphs can be exploited for this purpose. Popular knowledge
graphs include the Google Knowledge Graph, IBM Watson, and Wikipedia.
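For example, a few lines of spaCy are enough to pull named entities out of a sentence; the entity labels come from the pretrained en_core_web_sm model (an assumption of this sketch), not from any search engine:

```python
# Named-entity recognition sketch with spaCy's pretrained small English model.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Serpact presented at AffiliateCon in Sofia, Bulgaria in 2019.")

for ent in doc.ents:
    # e.g. "Sofia" -> GPE (geo-political entity), "2019" -> DATE
    print(ent.text, ent.label_)
```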
Topic Modelling - Latent Dirichlet
Allocation (LDA)
Topic modeling is the process of automatically
identifying the topics present in a text corpus; it
derives the hidden patterns among the words in the
corpus in an unsupervised manner.
Topics are defined as "a repeating pattern of co-
occurring terms in a corpus". A good topic model
yields "health", "doctor", "patient", "hospital" for a
topic like Healthcare, and "farm", "crops", "wheat" for
a topic like Farming.
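A toy LDA run with scikit-learn shows the mechanics; the six one-line "documents" are invented and far too small for a real topic model:

```python
# Toy LDA sketch (scikit-learn >= 1.0 for get_feature_names_out).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "doctor treats patient in hospital",
    "hospital hires new doctor for patient care",
    "patient visits doctor about health",
    "farm grows wheat and other crops",
    "farmer harvests crops on the farm",
    "wheat prices rise for the farm",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"Topic {idx}: {top}")   # roughly a 'healthcare' topic and a 'farming' topic
```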
Bag of Words
A commonly used model that counts all the words in a piece of text.
Basically, it creates an occurrence matrix for the sentence or document,
disregarding grammar and word order. These word frequencies or occurrences
are then used as features for training a classifier.
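A minimal sketch of that occurrence matrix with scikit-learn's CountVectorizer (the two example sentences are invented):

```python
# Bag-of-words occurrence matrix for two short documents.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "semantic search understands meaning",
    "keyword search matches the exact keyword",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # the vocabulary; word order is ignored
print(X.toarray())                         # word counts per document
```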
N-Grams as Features
A combination of N consecutive words is called an N-gram. N-grams (N > 1) are
generally more informative as features than single words (unigrams). Bigrams
(N = 2) are often considered the most important features of all.
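The same CountVectorizer can emit unigram and bigram features by setting ngram_range; the example text is invented:

```python
# Unigram + bigram features via the ngram_range parameter.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["machine learning helps search engines understand queries"]

bigram_vectorizer = CountVectorizer(ngram_range=(1, 2))
bigram_vectorizer.fit(corpus)

print(bigram_vectorizer.get_feature_names_out())
# ['engines', 'engines understand', 'helps', 'helps search', ...]
```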
Statistical Features
Term Frequency – Inverse Document Frequency (TF – IDF)
Term Frequency (TF) – TF for a term "t" is defined as the count of the term "t" in a document "D".
Inverse Document Frequency (IDF) – IDF for a term is defined as the logarithm of the ratio of the total
number of documents in the corpus to the number of documents containing the term "t".
Count / Density / Readability Features - Count- or density-based features can also be used in
models and analysis. These features might seem trivial but can have a great impact on learning models.
Some of them are: word count, sentence count, punctuation counts, and industry-specific
word counts.
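A short TF-IDF sketch with scikit-learn; note that TfidfVectorizer uses a smoothed variant of the log-ratio formula described above, and the example texts are invented:

```python
# TF-IDF weighting: rare terms get a higher IDF than terms spread across the corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "striped bass is a saltwater game fish",
    "anglers fish for striped bass on the coast",
    "the cow grazed in the field",
]

vectorizer = TfidfVectorizer()
vectorizer.fit(corpus)

# Terms in only one document ("cow", "field") receive a higher IDF weight
# than terms appearing in several documents ("the", "fish").
for term, idf in zip(vectorizer.get_feature_names_out(), vectorizer.idf_):
    print(f"{term:10} idf={idf:.2f}")
```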
Word Embedding (text vectors)
Word embedding is the modern way of representing words as vectors. The aim of word
embedding is to map high-dimensional word features into low-dimensional feature
vectors while preserving contextual similarity across the corpus.
Word embeddings are widely used in deep learning models such as Convolutional Neural
Networks and Recurrent Neural Networks.
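A minimal sketch with gensim's Word2Vec; a real model would be trained on millions of sentences or loaded from pretrained vectors, so the tiny corpus and parameters here are purely illustrative:

```python
# Training toy word vectors with gensim (gensim 4.x API).
from gensim.models import Word2Vec

sentences = [
    ["semantic", "search", "understands", "user", "intent"],
    ["machine", "learning", "improves", "search", "results"],
    ["search", "engines", "use", "machine", "learning"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50, seed=1)

print(model.wv["search"][:5])                   # first 5 dimensions of the vector
print(model.wv.most_similar("search", topn=3))  # nearest words in this toy vector space
```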
Final Result
Text Classification - email spam identification, topic classification of news, sentiment classification,
and organization of web pages by search engines.
Text Matching / Similarity
Phonetic Matching – A phonetic matching algorithm takes a keyword as input (a person's name,
a location name, etc.) and produces a character string that identifies a set of words that are (roughly)
phonetically similar.
Flexible String Matching – A complete text matching system includes different algorithms pipelined
together to handle a variety of text variations: exact string matching, lemmatized matching, and
compact matching (which takes care of spaces, punctuation, slang, etc.).
Cosine Similarity – When text is represented in vector notation, a general cosine similarity can
also be applied to measure vectorized similarity. The sketch after this list converts texts to vectors
(using term frequency) and applies cosine similarity to measure the closeness of two texts.
Coreference Resolution – The process of finding relational links among the words (or phrases) within
sentences, e.g. "Donald went to John's office to see the new table. He looked at it for an hour."
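The original slide refers to code that is not reproduced in this export; a minimal reconstruction with scikit-learn (the two example sentences reuse the coreference example above and are otherwise an assumption):

```python
# Term-frequency vectorization + cosine similarity between two short texts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

text_a = "Donald went to John's office to see the new table"
text_b = "He looked at the new table in the office for an hour"

vectors = CountVectorizer().fit_transform([text_a, text_b])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]

# 1.0 = identical term profile, 0.0 = no shared terms.
print(f"cosine similarity: {score:.2f}")
```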
What is GOOGLE BERT?
According to Google:
An Example?
The phrase was "how to catch a cow fishing?"
In New England, the word "cow" in the context of
fishing means a large striped bass. A striped bass is a
popular saltwater game fish that millions of anglers fish
for on the Atlantic coast.
So earlier this month, during the course of research for a
PubCon Vegas presentation, I typed the phrase, “how to
catch a cow fishing” and Google provided results related
to livestock, to cows.
How to Write Better
Optimized Texts
Understand
What Your
Audience
Wants…
Use Keyword / Phrase Research Tools
AnswerThePublic
Google Trends
Ahrefs
Semrush
Keywordtool.io
Ubersuggest.io
Moz Keyword Tool
Serpstat
SpyFu
Google Search Console
Group keywords around topics
Group keywords around intent
Group keywords around common classifiers - colors, w-words, sizes,
locations, brands etc.
Answer a question you want to target and provide the best answer
Answer Follow Up Questions
Be Careful With Your Website Structure - Be Topical
& Map Keywords
Connect questions & answers in
your content
Connect your current question and the one that follows
Combine questions into a piece of content
Split the questions into sub-topics
Optimizing content for NLP should begin with
simple sentence structure and focus on
providing concise information for your audience.
You should always try to include the exact
questions that people ask, along with relevant
answers, since people will be typing those
questions into search engines.
Identify Units, Classifications, and
Adjectives
Within NLP SEO, words have meaning and therefore may have expected units,
classifications, or adjectives associated with them. NLP parsing will be on the lookout for
these elements when determining if the content contains the precise answer to a
question. Let's look at an example.
Example Query 1: “Safe Temperature for Chicken”
For this query, temperature has a unit of degrees in either Fahrenheit or Celsius
expressed as a numerical value. If a sentence does not include these elements, it does
not satisfy the question. A well-structured sentence should contain a number and the
word degree or the degree symbol. If our sentence clarifies Fahrenheit or Celsius, the
answer is more accurate and specific, while also improving our localized targeting.
Be Clear With Your Answers
Reduce Dependency Hops
Reading a sentence and determining if a question is answered depends on
Google’s NLP parsers not getting hung up as they “crawl” through a
sentence. If a sentence's structure is overly complicated, Google may fail to
create clear links between words or may need too many hops to
build that relationship.
Don’t Beat Around the Bush
A common NLP problem is “beating around the bush” when it comes to
answering a question. It's not that these answers are "wrong," but they
don't give Google a precise determination of the answer.
Follow the Query Through
If a user searches for "Emergency Fund," they may have the following goals on their journey:
What is an Emergency Fund
How Much to Save
Types of Emergency Funds
How to Build an Emergency Fund
Ask yourself: "Does this article answer all the subjects and questions a searcher
might have when they search?" Google can identify these follow-through topics and questions by
looking at follow-up searches and query refinement within search sessions.
Google can improve searcher satisfaction if it's able to satisfy searchers sooner by giving them
content that eliminates the need for two to three additional searches.
Disambiguate Entities
There are two simple rules here:
Isolate entities that are not used in a sentence. When an entity appears outside of a
sentence, try to isolate it in the text and within the HTML tag where it appears, such as
headings, list items, or table cells.
Avoid grouping it with a price, year, category, parentheses, or any other data/text.
Simplify your content around the entities you want Google to extract successfully.
Disambiguate Entities
When an entity can be confused, such as cities in multiple states, movies with the same
name, or films vs. books, you can disambiguate entities by using indicator words in the
same sentence.
For the sentence “Portland is a great place to live,” the extracted entity is Portland, OR. For
the sentence, “the Old Port neighborhood in Portland is a great place to live,” the extracted
entity is Portland, ME.
There is an entity relationship between “Portland (ME)” and “Old Port,” which allows Google
to disambiguate the entity “Portland.” Brainstorm these indicator words when your entities
could have multiple identities.
Use indicator words to disambiguate entities.
Text Formatting
Inverted Pyramid: Articles have a lede, body, and a tail. Content has
different meanings based on how far down the page it appears.
Headings: A heading defines the content between it and the next heading.
Subtopics: Think of headings as sub-articles within the parent article.
Proximity: Proximity determines relationships.
Words/phrases in the same sentence are closely related.
Words/phrases in the same paragraph are related.
Words/phrases in different sections are distantly related.
Relationships: Subheadings have a parent -> child relationship. (A page with a
list of categories as H2s and products as H3s is a list of categories. A page with a
list of products as H2s and categories as H3s is a list of products.)
HTML Tags: Text doesn't have to be in a heading tag to be a heading. (However,
heading tags are preferred.) Text also doesn't need a heading tag to have a parent ->
child relationship. Its display and font formatting can visually dictate this without the
use of heading tags.
Other Formats:
Lists: HTML ordered and unordered lists function as lists, which have a meaning.
Headings can also perform as lists. Headings with a number first can work as an
ordered list. Ordered lists imply rankings, order, or process. Short bolded "labels" or
"summary" phrases at the start of a paragraph can function as a list.
Tables inherently imply row/column relationships. Some formatting suggests
classification, like addresses and date formats.
Structure by Content Type: Some content types have expected data that
define them. Events have names, locations, and dates. Products have names,
brands, and prices.
Tools for content optimization:
Google NLP Tool
Webtexttool - Text Metrics
Semrush SEO Writing Assistant
Contacts
EMAIL ADDRESS
info@serpact.com
PHONE NUMBER
+359 (032) 260 096
WEBSITE
serpact.com
SERP.AC/FB
SERP.AC/TWITTER
SERP.AC/YOUTUBE