Sound, Search, and Semantics: How Form Follows Function - Upasna Gautam
It’s not breaking news that voice search is the emerging technology of greatest interest, but what hasn’t been demystified is how it works. This session will uncover how the algorithm functions at a structural level by dissecting Google’s Automatic Speech Recognition, Google's Quality Metrics for voice search, and deciphering the nuances of the spoken word as they apply to semantic search.
At the MnSearch Snippet #13 event held at Spyder Trap in Minneapolis, MN on April 30, 2014, Joe Wilebski presented his slide deck "Panda, Penguin, Hummingbird."
For e-commerce applications, matching users with the items they want is the name of the game. If they can't find what they want, how can they buy anything?! Typically this functionality is provided through search and browse experiences. Search allows users to type in text and match against the text of the items in the inventory. Browse allows users to select filters and slice-and-dice the inventory down to the subset they are interested in. But with the shift toward mobile devices, no one wants to type anymore - thus browse is becoming dominant in the e-commerce experience.
But there's a problem! What if your inventory is not categorized? Perhaps your inventory is user generated or generated by external providers who don't tag and categorize the inventory. No categories and no tags means no browse experience and missed sales. You could hire an army of taxonomists and curators to tag items - but training and curation will be expensive. You can demand that your providers tag their items and adhere to your taxonomy - but providers will buck this new requirement unless they see obvious and immediate benefit. Worse, providers might use tags to game the system - artificially placing themselves in the wrong category to drive more sales. Worst of all, creating the right taxonomy is hard. You have to structure a taxonomy to realistically represent how your customers think about the inventory.
Eventbrite is investigating a tantalizing alternative: using a combination of customer interactions and machine learning to automatically tag and categorize our inventory. As customers interact with our platform - as they search for events and click on and purchase events that interest them - we implicitly gather information about how our users think about our inventory. Search text effectively acts like a tag, and a click on an event card is a vote that the clicked event is representative of that tag. We can use this stream of information as training data for a machine learning classification model; and as we receive new inventory, we can automatically tag it with the text that customers will likely use when searching for it. This makes it possible to better understand our inventory and our supply and demand, and most importantly it allows us to build the browse experience that customers demand.
In this talk I will explain in depth the problem space and Eventbrite's approach to solving the problem. I will describe how we gathered training data from our search and click logs, and how we built and refined the model. I will present the output of the model and discuss both the positive results of our work and the work left to be done. Those attending this talk will leave with some new ideas to take back to their own business.
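The click-as-vote idea can be sketched in a few lines of plain Python. This is a toy illustration, not Eventbrite's actual pipeline; the log entries, event IDs, and helper names are all invented:

```python
from collections import Counter, defaultdict

# Hypothetical click-log entries: (search text, clicked event id).
click_log = [
    ("salsa dancing", "evt_1"),
    ("salsa class", "evt_1"),
    ("live jazz", "evt_2"),
    ("jazz night", "evt_2"),
    ("salsa dancing", "evt_3"),
]

# Each click is a vote: the query's tokens act as tags for the event.
tag_votes = defaultdict(Counter)
for query, event_id in click_log:
    for token in query.lower().split():
        tag_votes[event_id][token] += 1

def suggest_tags(event_id, top_n=2):
    """Return the most-voted tags for an event."""
    return [tag for tag, _ in tag_votes[event_id].most_common(top_n)]

print(suggest_tags("evt_1"))  # 'salsa' ranks first for evt_1
```

In a real system these per-event tag profiles would become labels for a supervised classifier, so that brand-new inventory can be tagged from its text alone.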
Mobile-first goes beyond simply indexing in a search engine. It has several meanings, spanning user behaviour, web design, and adoption across territories, user segments, and verticals. We need to be aware of these fundamental changes in search behaviour and adapt quickly.
Google Hummingbird - What does it mean for SEO? - Chris Schweppe
What is the Google Hummingbird algorithm's impact on SEO? Understanding the intent behind Hummingbird is key to predicting what it will mean to SEO today and tomorrow. In this POV presentation from Ogilvy subsidiary Global Strategies, Ken Shults outlines the implications of Google's focus shift from keywords to entities.
State of Search 2017 - Semantics and Science - Upasna Gautam
What latent semantic indexing is, how Google uses it, and how understanding this core functionality of the Google algorithm will help you create better content.
Conductor C3 2019 - A Sound Advantage: How Voice Search Works & Works For You - Conductor
Upasna Gautam, Manager, Search, Ziff Davis
Become fluent in voice search form, function, and success. Learn how Google processes sound and conducts speech modeling; the four voice search quality metrics Google applies; and how to enhance your own strategy with tactics for targeting content by searcher need states.
The Actionable Guide to Doing Better Semantic Keyword Research #BrightonSEO (... - Paul Shapiro
For a detailed recap: http://pshapi.ro/SemanticKWR
My BrightonSEO presentation...
1st Half: What is semantic search and why does it matter to SEOs?
2nd Half: Using KNIME to do semantic keyword research using SERP and Twitter data.
The New Content SEO - Sydney SEO Conference 2023 - Amanda King
Amanda King of FLOQ's deck for the Sydney SEO conference run by Prosperity Media in April of 2023 on content, entity SEO and Google's history (or lack thereof) with keywords. We also go through natural language processing, what it is and how quickly Google goes from queries to entities based on their patent application history. And of course, no good conference session would go without actionable suggestions, which you can find at the end of the deck.
For another angle on content and strategy and how to approach them, read more at https://floq.co/seo-strategy/tactics-strategy/
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction-based systems. Typically, such systems architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However, this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limit during query, as well as the presence of some inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model that is trained through Spark, Weka, or R and loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but will also walk through practical examples, from loading a dataset into Elasticsearch, to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
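As a rough illustration of the two-phase setup that ML-Scoring collapses into one, here is a self-contained sketch in plain Python. The corpus, features, and weights are all invented; in the tutorial's stack, Phase I would be an Elasticsearch query and Phase II a model trained in Spark, Weka, or R:

```python
# Toy inventory; both phases are simulated in plain Python here.
docs = {
    "d1": "cheap flights to austin",
    "d2": "austin music festival tickets",
    "d3": "hotel deals in dallas",
}

def phase1_retrieve(query, k=2):
    """Phase I: crude term-overlap scoring, keep only the top-k."""
    q = set(query.split())
    scored = [(len(q & set(text.split())), d) for d, text in docs.items()]
    scored.sort(reverse=True)
    return [d for score, d in scored[:k] if score > 0]

# Phase II: a "trained" model -- here a fixed linear model over two
# features (query overlap, document brevity) standing in for a real one.
weights = {"overlap": 1.5, "brevity": 0.3}

def phase2_rerank(query, candidates):
    q = set(query.split())
    def model_score(d):
        text = docs[d].split()
        overlap = len(q & set(text)) / len(q)
        brevity = 1.0 / len(text)
        return weights["overlap"] * overlap + weights["brevity"] * brevity
    return sorted(candidates, key=model_score, reverse=True)

hits = phase2_rerank("austin tickets", phase1_retrieve("austin tickets"))
print(hits)  # ['d2', 'd1']
```

The sub-optimality the abstract mentions is visible in `phase1_retrieve`: any document the crude first phase cuts from the top-k can never be rescued by the better model in phase two.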
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... - S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types, such as numerical or categorical values. Therefore, on domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn't suffice, so relevance ranking is performed as a two-phase approach: 1) regular search; 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users' response to items served. The predicted selection rates that arise in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve in relation to a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (Solr/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and loaded as a plugin used at query time to compute custom scores.
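Response prediction of the kind described above is typically a supervised model over serving features. The following sketch trains a tiny logistic-regression click model with plain gradient descent; the features and data are made up for illustration and stand in for a real Weka/Spark-trained model:

```python
import math

# Invented training data: ([position_bias, text_match], clicked?)
train = [
    ([1.0, 0.9], 1), ([0.8, 0.7], 1), ([0.3, 0.2], 0),
    ([0.2, 0.8], 1), ([0.1, 0.1], 0), ([0.9, 0.1], 0),
]

w = [0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x):
    """Predicted click probability (the 'response prediction')."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Plain stochastic gradient descent on the log loss.
for _ in range(500):
    for x, y in train:
        g = predict(x) - y
        for i in range(len(w)):
            w[i] -= lr * g * x[i]
        b -= lr * g

# A well-placed, well-matched item should get a higher predicted CTR
# than a poorly matched one.
print(predict([0.9, 0.9]), predict([0.1, 0.1]))
```

At serving time such a model's score, not the raw IR score, decides which items to show, which is exactly the ranking function an ML-Scoring-style plugin swaps in.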
Mike King examines the state of the SEO industry and talks through how knowing information retrieval helps improve our understanding of Google. This talk debuted at MozCon.
Staff study talk on search engine & internet in 2008 - Sujit Chandak
It was the year 2008, and many of our staff did not know in detail how an internet search works, although everybody knew and used the internet. I gave a very basic session, which was very interesting for the senior faculty...they wanted to know the most...
Vectors in Search - Towards More Semantic Matching - Simon Hughes
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, such as learning sparse representations of vectors, clustering, and learning binary vectors. Finally, I will discuss some of the pitfalls of vector-based search, and how to get the best of both worlds by combining vector-based scoring with traditional relevancy metrics such as BM25.
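To make the "best of both worlds" point concrete, here is a minimal hybrid scorer in plain Python that blends a hand-rolled BM25 with cosine similarity over toy 2-dimensional "embeddings". Everything here (corpus, vectors, the alpha blend) is invented for illustration; a real system would take BM25 from the engine and vectors from word2vec/doc2vec:

```python
import math
from collections import Counter

# Toy corpus: each document has text plus a made-up 2-d embedding.
docs = {
    "d1": ("dog training tips", [0.9, 0.1]),
    "d2": ("puppy obedience class", [0.8, 0.2]),
    "d3": ("stock market news", [0.1, 0.9]),
}
avg_len = sum(len(t.split()) for t, _ in docs.values()) / len(docs)
k1, b_param = 1.2, 0.75  # standard BM25 free parameters

def bm25(query, text):
    tokens = text.split()
    tf = Counter(tokens)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        n = sum(1 for t, _ in docs.values() if term in t.split())
        idf = math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))
        norm = tf[term] + k1 * (1 - b_param + b_param * len(tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v)))

def hybrid(query, q_vec, alpha=0.5):
    """Blend lexical BM25 with semantic cosine similarity."""
    out = {d: alpha * bm25(query, text) + (1 - alpha) * cosine(q_vec, vec)
           for d, (text, vec) in docs.items()}
    return sorted(out, key=out.get, reverse=True)

print(hybrid("dog training", [0.85, 0.15]))  # ['d1', 'd2', 'd3']
```

Note how d2 shares no query terms with "dog training" (BM25 of zero) yet still outranks the off-topic d3 thanks to the vector component - the semantic recall that a purely lexical score misses.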
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, including LSH, vector quantization and k-means tree, and compare their performance in terms of speed and relevancy. Finally, I will describe how each technique can be implemented efficiently in a Lucene-based search engine such as Solr or Elasticsearch.
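Of the techniques listed, locality-sensitive hashing is perhaps the easiest to sketch: random hyperplanes turn a dense vector into a short bit-string that an inverted index can store as an ordinary token. This is a toy version (dimensions, bit count, and vectors all invented), not the talk's actual implementation:

```python
import random

random.seed(42)
DIM, BITS = 8, 6

# One random hyperplane per output bit; the sign of the projection
# onto each hyperplane's normal gives that bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_token(vec):
    """Hash a dense vector to a bit-string token suitable for an
    ordinary inverted index."""
    bits = ""
    for p in planes:
        dot = sum(a * b for a, b in zip(p, vec))
        bits += "1" if dot >= 0 else "0"
    return bits

# Nearby vectors tend to land in the same bucket; distant ones
# usually differ in several bits.
v = [0.5] * DIM
v_close = [0.5 + 0.01 * i for i in range(DIM)]
print(lsh_token(v), lsh_token(v_close))
```

At index time each document stores its token(s) in a keyword field; at query time the same hash is applied to the query vector and matching becomes a plain term lookup, with exact cosine scoring reserved for the small candidate set.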
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
More Related Content
Similar to Semantics and Search by Upasna Gautam at PubCon Austin 2018
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GridMate - End to end testing is a critical piece to ensure quality and avoid... - ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
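As a flavour of how such constraints bite, here is a small Python sketch applying two well-known calisthenics rules, "wrap all primitives" and "first-class collections", to a tactical DDD value object. The domain names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:  # "wrap all primitives": no bare ints for amounts
    amount_cents: int
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount_cents + other.amount_cents, self.currency)

class OrderLines:  # "first-class collection": not a bare list
    def __init__(self) -> None:
        self._lines: list[Money] = []

    def add(self, price: Money) -> None:
        self._lines.append(price)

    def total(self) -> Money:
        total = Money(0, "EUR")
        for line in self._lines:
            total = total.add(line)
        return total

lines = OrderLines()
lines.add(Money(1050, "EUR"))
lines.add(Money(250, "EUR"))
print(lines.total())  # Money(amount_cents=1300, currency='EUR')
```

The constraint does the design work: once the primitive is wrapped, currency-mismatch rules have exactly one home, which is the kind of "mechanical" guidance toward clear domain models the talk describes.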
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
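The idea of dropping bytes whose mutation cannot matter is close in spirit to delta-debugging-style reduction. The sketch below is not DIAR's actual analysis (which works from fuzzing feedback); it only shows the core loop of greedily removing chunks that don't change a stand-in "interesting" predicate:

```python
def interesting(seed: bytes) -> bool:
    # Stand-in for "the target still exercises the behaviour the
    # fuzzer cares about": here, the seed still contains a
    # well-formed <a>...</a> element.
    return b"<a>" in seed and b"</a>" in seed

def shrink(seed: bytes, chunk: int = 4) -> bytes:
    """Greedily remove chunks whose removal keeps the seed interesting,
    halving the chunk size until single bytes are tried."""
    while chunk >= 1:
        i = 0
        while i < len(seed):
            candidate = seed[:i] + seed[i + chunk:]
            if interesting(candidate):
                seed = candidate   # bytes were uninteresting: drop them
            else:
                i += chunk         # bytes mattered: keep, move on
        chunk //= 2
    return seed

bloated = b"PADDING<a>hi</a>MOREJUNKHERE"
lean = shrink(bloated)
print(lean)  # b'<a></a>'
```

Starting a campaign from `lean` rather than `bloated` means every mutation lands on bytes that can actually change the interesting behaviour, which is the speedup DIAR is after.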
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
4. #pubcon
SEO: Then & Now
Back then:
•Keyword-focused:
• Text retrieval systems relied on exact-match keywords
• Documents were weighted by keyword frequency
•Unable to distinguish synonyms and homographs
• Synonym: words that share the same meaning (e.g., "car" and "automobile")
• Homograph: a word with more than one meaning depending on context (e.g., "charge")
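The limitation above is easy to see in code. The snippet below is a toy sketch (the documents and scoring scheme are invented for illustration) of an exact-match, frequency-weighted retriever that is blind to the synonym "automobile":

```python
# Toy exact-match retrieval: documents are scored purely by how often
# the literal query term appears, so a query for "car" never surfaces
# the document that only says "automobile".
docs = {
    "d1": "the car dealership sells a used car",
    "d2": "the automobile showroom sells a vintage automobile",
}

def keyword_score(query: str, doc: str) -> int:
    # Score = raw frequency of the exact query term in the document.
    return doc.split().count(query)

scores = {doc_id: keyword_score("car", text) for doc_id, text in docs.items()}
# d1 scores 2, d2 scores 0: the synonym is invisible to exact matching.
```

A semantic system would instead recognize that both documents answer the same intent.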
5. #pubcon
SEO: Then & Now
Now:
•Driven by intent and context
•Provides relevant answers to complex and vague queries
11. #pubcon
What is Semantic Search
Semantics:
A branch of linguistics that studies the relationship between words and sentences and their actual meanings.
Semantic Search:
The improvement of search accuracy by understanding intent and context, using various on-site elements to crawl, index, and serve relevant results.
12. #pubcon
What is Semantic Search
•Entity Optimization
•Knowledge Graph
•Structured Data
•Information Architecture
•Co-occurrence and Clustering
13. #pubcon
What is Semantic Search:
Entity Optimization
Paul Haahr – Google Ranking Engineer – SMX 2016
14. #pubcon
What is Semantic Search:
Knowledge Graph
•Understands relationships between things
•Stores and understands the intelligence between different entities
•Not just a catalog of objects, but a data model for inter-relationships
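As a rough illustration (not Google's actual data model), a knowledge graph can be sketched as a set of (subject, relationship, object) triples, which makes the inter-relationships themselves queryable rather than a flat catalog of objects. The entities and relation names below are invented:

```python
# Minimal knowledge-graph sketch: facts stored as triples, so we can ask
# "what is connected to this entity?" in either direction.
triples = [
    ("Leonardo da Vinci", "painted", "Mona Lisa"),
    ("Mona Lisa", "housed_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
]

def related(entity):
    """Return every (relationship, other_entity) pair touching an entity."""
    out = []
    for subj, rel, obj in triples:
        if subj == entity:
            out.append((rel, obj))
        elif obj == entity:
            # Mark traversal against the arrow direction explicitly.
            out.append(("inverse:" + rel, subj))
    return out

print(related("Mona Lisa"))
```

Traversing these edges is what lets a query about one entity surface facts about its neighbors.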
15. #pubcon
What is Semantic Search:
Structured Data
•Google is a data-driven machine that needs to be fed in order to learn
•Feed it structured data: a piece of intelligence the crawler uses to build semantic relevance and authority
•This is how entities are indexed!
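To make this concrete, here is a hypothetical schema.org JSON-LD snippet assembled in Python (the product name, brand, and price are invented for illustration); on a real page it would be embedded in a `<script type="application/ld+json">` tag so the crawler can read the entity and its attributes directly:

```python
import json

# Hypothetical product markup: schema.org JSON-LD tells the crawler what
# entity the page is about and how its attributes relate to one another.
structured_data = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Trail Running Shoe",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    "offers": {"@type": "Offer", "price": "89.99", "priceCurrency": "USD"},
}

# Serialized form, ready to drop into the page's <head>.
snippet = json.dumps(structured_data, indent=2)
print(snippet)
```

The nested objects (Brand, Offer) are themselves typed entities, which is exactly the relationship data the knowledge graph consumes.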
16. #pubcon
What is Semantic Search:
Information Architecture
•Allows a crawler to clearly understand content and how it is connected
•Provides a clear and hierarchical path of information
•Lends itself to a good UX
•The RIGHT approach is the most LOGICAL approach
•Must read: Information Architecture for the World Wide Web (3rd edition, by Peter Morville and Louis Rosenfeld): https://www.amazon.com/Information-Architecture-World-Wide-Web/dp/0596527349
17. #pubcon
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence Clustering
• Generates topics from words that frequently occur together
Weighted Bigraph Clustering
• Uses URLs from Google search results to induce query similarity and generate topics
The combination of these two methods demonstrated greater usefulness and accuracy than Latent Semantic Analysis.
Read the paper here:
https://pdfs.semanticscholar.org/dcf7/05ba07ee1b73fda0c94e9d01b2474173e470.pdf
18. #pubcon
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence
• A set of anchor words serves as initial topics, which are then generalized to other words co-appearing with the same queries.
• Topics are created using hierarchical clustering on query similarity, which measures to what extent two queries agree on their intersections with the list of words in each topic.
Bigraph Clustering
• Uses organic results to create a bigraph with a set of queries and a set of URLs as nodes. Edge weights are computed from impression and click data.
• Bigraph clustering works well even when queries share no common words.
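The intuition behind word co-occurrence clustering can be sketched far more simply than the method in the linked paper. In the toy version below (queries are invented, and the clustering is reduced to connected components over words that appear in the same query), "car" and "automobile" land in one topic despite never sharing a query, because both co-occur with "cheap" and "insurance":

```python
from collections import defaultdict

# Toy word co-occurrence clustering: connect words that appear in the
# same query, then take connected components (via union-find) as topics.
# This is a drastic simplification of the hierarchical method cited above.
queries = [
    "cheap car insurance",
    "cheap automobile insurance",
    "best running shoes",
    "trail running shoes",
]

parent = {}

def find(w):
    parent.setdefault(w, w)
    while parent[w] != w:
        parent[w] = parent[parent[w]]  # path compression
        w = parent[w]
    return w

def union(a, b):
    parent[find(a)] = find(b)

for q in queries:
    words = q.split()
    for a, b in zip(words, words[1:]):
        union(a, b)

topics = defaultdict(set)
for w in parent:
    topics[find(w)].add(w)
# Two topics emerge: {cheap, car, automobile, insurance} and
# {best, running, shoes, trail}.
```

Real systems replace the binary "co-occurred at least once" edge with weighted similarity, but the grouping principle is the same.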
21. #pubcon
• Learning the mathematical underpinnings helps you understand search on a functional level
• LSI uses Singular Value Decomposition, a linear-algebraic factorization that underlies many modern algorithms
• It is not a way to "do SEO"
• LSI KEYWORDS ARE NOT A THING
22. #pubcon
Latent Semantic Indexing
Latent Semantic Indexing (LSI):
•A mathematical algorithm based on Singular Value Decomposition (SVD)
•A text indexing and retrieval method
•Models how terms and concepts are related
23. #pubcon
Latent Semantic Indexing
•LSI works by projecting a large multi-dimensional space down into a smaller number of dimensions
•Semantically similar words get bunched together
•Boundary blurring allows LSI to go beyond exact keyword matching
24. #pubcon
Latent Semantic Indexing
•LSI uses Singular Value Decomposition (SVD) to decompose the term-document matrix
•The decomposition preserves information about relative distances between document vectors
•The space is collapsed into a smaller number of dimensions
•Some information is lost, and words are superimposed on one another
25. #pubcon
Latent Semantic Indexing
•Noise reduction
•Reveals similarities that were latent
•Similar terms become more similar, while dissimilar things remain distinct
This method is widely used to unveil latent themes in text data; such models learn hidden topics from document-level word co-occurrence patterns.
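The projection can be demonstrated on a small, made-up term-document matrix (the vocabulary and counts below are invented for illustration). After truncating the SVD to two latent dimensions, "car" and "automobile" end up close together even though they never co-occur in a document, because both co-occur with "engine":

```python
import numpy as np

# Toy LSI: SVD of a tiny term-document matrix, truncated to k=2 dimensions.
terms = ["car", "automobile", "engine", "banana", "fruit"]
# Rows = terms, columns = four documents (counts are invented).
A = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 2],   # banana
    [0, 0, 0, 1],   # fruit
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # terms projected into the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "car" vs "automobile": near 1.0 despite zero direct co-occurrence.
sim_car_auto = cosine(term_vecs[0], term_vecs[1])
# "car" vs "banana": near 0.0, since the topics share no documents.
sim_car_banana = cosine(term_vecs[0], term_vecs[3])
```

This is the "boundary blurring" from the earlier slide: the collapse into fewer dimensions is exactly what lets latent similarity surface.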
26. #pubcon
Latent Semantic Indexing
Short texts, such as search queries, tweets, or instant messages, suffer from data sparsity, which causes problems for traditional topic-modeling techniques. Unlike full documents, short text snippets do not provide enough word counts for models to learn how words are related or to disambiguate multiple meanings of a single word.
*This is why the bigraph co-occurrence/clustering model works better*
28. #pubcon
Key Takeaways
•Craft and optimize content for topics and concepts, not just keywords
•Use structured data to feed the crawler the semantic intelligence it needs to understand your site better
•Align the information architecture of your website to the consumer journey
•Navigation, sitemaps, page structure, content organization
•Stop saying/using "LSI keywords"
•The best approach is the most logical approach!