#pubcon
Semantics and Search
Presented by:
Upasna Gautam
aka Pas
#pubcon
Objectives
•What is semantic search?
•What is NOT semantic search?
•How does Google make it work?
•How can you make it work?
#pubcon
SEO:
Then and Now
#pubcon
SEO: Then & Now
Back then:
•Keyword-focused:
• Text retrieval system relied on exact match keywords
• Weighted documents by keyword frequency
•Unable to distinguish synonyms and homographs
• Synonym: Words that share the same meaning (e.g. car and
automobile)
• Homograph: A word with more than one meaning depending on context
(e.g. “charge”)
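To make the limitation concrete, here is a minimal sketch of exact-match, frequency-weighted scoring. The function name `keyword_score` and the two sample documents are hypothetical and only illustrate the idea; this is not any search engine's actual ranking code.

```python
from collections import Counter

def keyword_score(query: str, document: str) -> int:
    """Score a document by how often the exact query terms appear in it."""
    doc_counts = Counter(document.lower().split())
    return sum(doc_counts[term] for term in query.lower().split())

# Two hypothetical documents about the same thing, using different words.
docs = {
    "a": "used car dealer with every car inspected",
    "b": "used automobile dealer with certified inventory",
}

scores = {name: keyword_score("car", text) for name, text in docs.items()}
print(scores)
```

Document "b" is just as relevant to someone searching for "car", but an exact-match scorer gives it no credit because it says "automobile" instead.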
#pubcon
SEO: Then & Now
Now:
•Driven by intent and context
•Provide relevant answers to
complex and vague queries
#pubcon
SEO: Then & Now
Now:
•“best vegan tacos austin”
•“late night texmex delivery austin”
•“best happy hour margaritas 78701”
#pubcon
SEO: Then & Now
Now:
Search Experience Optimization
#pubcon
SEO: Then & Now
What enabled search
engines to understand our
queries on an
intelligent level?
#pubcon
Hummingbird
2013
#pubcon
What is Semantic Search
Semantics:
A branch of linguistics that studies the relationship between words and
sentences and their actual meanings.
Semantic Search:
The improvement of search accuracy by understanding intent and
context, using various on-site elements to crawl, index, and serve
relevant results.
#pubcon
What is Semantic Search
•Entity Optimization
•Knowledge Graph
•Structured Data
•Information Architecture
•Co-occurrence and Clustering
#pubcon
What is Semantic Search:
Entity Optimization
Paul Haahr – Google Ranking Engineer – SMX 2016
#pubcon
What is Semantic Search:
Knowledge Graph
•Understands relationships between things
•Stores and understands the intelligence between
different entities
•Not just a catalog of objects, but a data model for
inter-relationships
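One common way to picture "a data model for inter-relationships" is a set of subject-predicate-object triples. The sketch below assumes that representation; the entity "Example Taqueria", the predicate names, and the helper `located_in` are all hypothetical and say nothing about how Google's Knowledge Graph is actually stored.

```python
from collections import defaultdict

# Hypothetical facts stored as (subject, predicate, object) triples.
triples = [
    ("Austin", "is_a", "City"),
    ("Austin", "located_in", "Texas"),
    ("Texas", "is_a", "State"),
    ("Example Taqueria", "is_a", "Restaurant"),
    ("Example Taqueria", "located_in", "Austin"),
]

# Index outgoing edges by subject so relationships can be followed.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

def located_in(entity: str) -> list[str]:
    """Follow 'located_in' edges transitively from an entity."""
    places = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for predicate, obj in graph.get(current, []):
            if predicate == "located_in":
                places.append(obj)
                frontier.append(obj)
    return places

print(located_in("Example Taqueria"))
```

Nothing states directly that the restaurant is in Texas. That fact is reached by following relationships between entities, which a flat catalog of objects cannot do.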
#pubcon
What is Semantic Search:
Structured Data
•Google is a data-driven machine that needs to be
fed in order for it to learn
•Feed it structured data – it’s a piece of intelligence
the crawler uses to build semantic relevance and
authority
•This is how entities are indexed!
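A widely used way to supply structured data is JSON-LD markup with the schema.org vocabulary. The sketch below builds such a snippet in Python; the business name and address values are hypothetical placeholders, and the choice of properties is only an illustration.

```python
import json

# Hypothetical business described with schema.org types and properties.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Taqueria",
    "servesCuisine": ["Tex-Mex", "Vegan"],
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Austin",
        "addressRegion": "TX",
        "postalCode": "78701",
    },
}

# Serialize and wrap in the script tag that carries JSON-LD in a page.
json_ld = json.dumps(restaurant, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

The markup names the entity type and its attributes explicitly instead of leaving the crawler to infer them from page text.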
#pubcon
What is Semantic Search:
Information Architecture
•Allows for a crawler to clearly understand content and how it’s connected
•Provide a clear and hierarchical path of information
•Lends itself to a good UX
•The RIGHT approach is the most LOGICAL approach
•Must read: Information Architecture for the World Wide Web [3rd Edition, by
Peter Morville]: https://www.amazon.com/Information-Architecture-World-Wide-Web/dp/0596527349
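As a rough illustration of "a clear and hierarchical path of information", the sketch below turns a nested site outline into URL paths. The section names and the helper `paths` are hypothetical.

```python
# Hypothetical site hierarchy: each key is a section, each value its children.
site = {
    "menu": {"tacos": {"vegan": {}}, "drinks": {"margaritas": {}}},
    "locations": {"austin": {}},
}

def paths(tree: dict, prefix: str = "") -> list[str]:
    """Return every URL path in the hierarchy, parents before children."""
    result = []
    for name, children in tree.items():
        path = f"{prefix}/{name}"
        result.append(path)
        result.extend(paths(children, path))
    return result

all_paths = paths(site)
for path in all_paths:
    print(path)
```

Each path spells out where a page sits relative to its parent sections, which is the kind of structure both visitors and crawlers can follow.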
#pubcon
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence Clustering
• Generates topics from words frequently occurring together
Weighted Bigraph Clustering
• Uses URLs from Google search results to induce query similarity and
generate topics
The combination of these two methods demonstrated greater usefulness
and accuracy when compared to Latent Semantic Analysis.
Read the patent here:
https://pdfs.semanticscholar.org/dcf7/05ba07ee1b73fda0c94e9d01b2474173e470.pdf
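The raw signal behind word co-occurrence can be sketched by counting how often two words appear in the same query. The queries below are hypothetical, and this only shows the counting step, not the clustering method described in the linked document.

```python
from collections import Counter
from itertools import combinations

# Hypothetical search queries.
queries = [
    "best vegan tacos austin",
    "vegan tacos delivery",
    "late night tacos austin",
    "happy hour margaritas austin",
]

# Count each unordered word pair once per query in which both words occur.
pair_counts = Counter()
for query in queries:
    words = sorted(set(query.split()))
    pair_counts.update(combinations(words, 2))

print(pair_counts.most_common(3))
```

Pairs that recur across queries, such as "tacos" with "vegan", are candidates for belonging to the same topic.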
#pubcon
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence
• A set of anchor words serves as the initial topics, which are then
generalized to other words that co-appear in the same queries.
• Topics are created using hierarchical clustering on query
similarity, which measures to what extent two queries agree on their
intersections with the list of words in each topic.
Bigraph Clustering
• Uses organic results to create a bigraph with a set of queries and a set
of URLs as nodes. Weights of the graph are computed with the
impression and click data.
• Bigraph clustering works very well even if the queries do not share
common words
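The bigraph idea can be sketched by representing each query as a vector of weights over the URLs it leads to, then comparing queries by those vectors instead of by their words. The click counts and URLs below are hypothetical, only clicks are used as weights (the slide also mentions impressions), and this shows a similarity measure only, not a full clustering algorithm.

```python
import math

# Hypothetical click counts: query -> {url: clicks}.
clicks = {
    "best vegan tacos austin": {
        "example.com/vegan-tacos": 8,
        "example.com/austin-eats": 2,
    },
    "plant based mexican food 78701": {
        "example.com/vegan-tacos": 5,
        "example.com/austin-eats": 1,
    },
    "happy hour margaritas": {"example.com/bars": 9},
}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse URL-weight vectors."""
    dot = sum(weight * b.get(url, 0) for url, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

q1 = "best vegan tacos austin"
q2 = "plant based mexican food 78701"
q3 = "happy hour margaritas"

print(cosine(clicks[q1], clicks[q2]))
print(cosine(clicks[q1], clicks[q3]))
```

The first two queries share no words at all, yet they come out as close neighbors because searchers click the same URLs for both.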
#pubcon
Latent Semantic Indexing
Is NOT
Semantic Search
#pubcon
BUT…
#pubcon
• Learning the underlying mathematics helps you understand how search
works on a functional level
• LSI uses Singular Value Decomposition (SVD), a linear-algebra matrix
factorization that underlies many modern algorithms
• It is not a way to “do SEO”
• LSI KEYWORDS ARE NOT A THING
#pubcon
Latent Semantic Indexing
Latent Semantic Indexing (LSI):
•Mathematical algorithm based on Singular Value Decomposition (SVD)
•Text indexing and retrieval method
•How terms and concepts are related
#pubcon
Latent Semantic Indexing
•LSI works by projecting a large multi-dimensional
space down into a smaller
number of dimensions
•Semantically similar words get
bunched together
•Boundary blurring allows LSI to go
beyond exact keyword matching
#pubcon
Latent Semantic Indexing
•LSI uses Singular Value Decomposition (SVD) to decompose the term-document matrix
•Preserves information about relative distances between document vectors
•Collapsed into smaller dimensions
•Information is lost and words are superimposed on one another
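The collapse into fewer dimensions can be sketched with NumPy's `numpy.linalg.svd`. The term-document matrix below is a hypothetical toy: "car" and "automobile" never appear in the same document, but both appear alongside "engine". Keeping `k = 2` dimensions is an arbitrary choice for this example.

```python
import numpy as np

terms = ["car", "automobile", "engine", "taco", "salsa"]
# Rows are terms, columns are four tiny hypothetical documents.
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # taco
    [0, 0, 1, 1],  # salsa
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Decompose, then keep only the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # term coordinates in the reduced space

car, automobile, taco = (terms.index(t) for t in ("car", "automobile", "taco"))
raw_similarity = cosine(A[car], A[automobile])
lsi_similarity = cosine(term_vectors[car], term_vectors[automobile])

print("raw:", raw_similarity)
print("reduced:", lsi_similarity)
```

In the original space the two words have nothing in common. After truncation the detail that separated them is discarded, so they land on top of each other while staying apart from the food terms.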
#pubcon
Latent Semantic Indexing
•Noise reduction
•Reveal similarities that were latent
•Similar terms become more similar, while dissimilar terms remain distinct
This method is a widely used technique to unveil latent themes in text
data, as these models learn the hidden topics by understanding
document level word co-occurrence patterns.
#pubcon
Latent Semantic Indexing
Short texts, such as search queries, tweets, or instant messages, suffer from
data sparsity, which causes problems for traditional topic modeling
techniques. Unlike proper documents, short text snippets do not provide
enough word counts for models to learn how words are related and to
disambiguate multiple meanings of a single word.
*This is why the binary co-occurrence/clustering model works better*
#pubcon
Key Takeaways
#pubcon
Key Takeaways
•Craft and optimize content for topics and concepts, not just
keywords
•Use structured data to feed the crawler the semantic intelligence it
needs to understand your site better
•Align the information architecture of your website to the
consumer journey
•Navigation, sitemaps, page structure, content organization
•Stop saying/using “LSI keywords”
•The best approach is the most logical approach!
#pubcon
The End

Semantics and Search by Upasna Gautam at PubCon Austin 2018