Structured data accounts for only a small part of the web, and the problem grows as the volume of online content grows. Schema markup is a drop in the ocean. However, the natural language research space is addressing this through dense retrieval and related developments such as Sentence-BERT and FAISS. Utilising heuristics such as 'umbrella' and 'sidecar' pages helps send clues to search engines and assists in ensuring they rank the right pages from your site for SEO.
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Ever-Increasingly Small Worlds
1. ‘Natural Semantic SEO’ – Surfacing walnuts in densely represented, ever-increasingly small worlds – ‘Umbrella’ and ‘Sidecar’ Approaches
Dawn Anderson
Bertey
2. Semantic SEO – So many definitions float about
IT’S ALL ABOUT SYNONYMS
IT’S ALL ABOUT TOPICS
IT’S ALL ABOUT INTENT
IT’S ALL ABOUT ‘THINGS’ NOT ‘STRINGS’
IT'S ABOUT ENTITIES
3. Simplified… it’s the connection & understanding of 3 types of data
STRUCTURED – EASY TO UNDERSTAND
SEMI-STRUCTURED – SOME FORM / CATEGORISATION
UNSTRUCTURED – LOOSE / DIFFICULT TO UNDERSTAND
5. True Structured Data Is Easy to Disambiguate
Resides in a database
Is quantifiable
Can be tabular data
Is well organised
Is likely stored in rows & columns
Has relational keys mapped to fields
6. A simple table is ‘true structured data’ (a relational database)
       | Column ID | Column ID | Column ID | Column ID | Column ID | Column ID
Row ID | Blah      | Blah      | Blah      | Blah      | Blah      | Blah
(repeated for each of five rows)
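To make the rows-and-columns idea concrete, here is a minimal sketch of 'true structured data' using Python's built-in sqlite3 module. The table name, columns, and values are all made up for illustration; the point is that a relational key (`row_id`) maps unambiguously to fields, so queries need no disambiguation.

```python
import sqlite3

# Illustrative only: a tiny relational table with a row ID (primary key)
# and named columns, mirroring the rows-and-columns structure above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (row_id INTEGER PRIMARY KEY, name TEXT, colour TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (name, colour, price) VALUES (?, ?, ?)",
    [("Umbrella", "red", 9.99), ("Umbrella", "blue", 9.99), ("Sidecar", "black", 199.0)],
)
conn.commit()

# Structured data is easy to disambiguate: the query is unambiguous.
rows = conn.execute("SELECT row_id, name FROM products WHERE colour = 'red'").fetchall()
print(rows)  # [(1, 'Umbrella')]
```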
7. Wikipedia is the best example of semi-structured data & structured data combined
8. But it’s estimated 80-90% of the world’s data is unstructured (2021)
Source: https://venturebeat.com/2021/07/22/why-unstructured-data-is-the-future-of-data-management/
15. Semantic search seeks to help with tying structured data & unstructured data together – disambiguates context
Unstructured data (text in web content)
Structured data (knowledge graphs / knowledge repositories)
20. 4 key areas of search will help us understand these developments
Precision & Recall
Ranking & Re-ranking
Comprehensiveness v Specificity
Lexical & Vector Search
22. Precision & recall
Precision (accuracy / preciseness) – of the documents retrieved, how many are relevant
Recall (coverage) – of all the relevant documents in the collection, how many were retrieved
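The two measures can be sketched in a few lines of Python. The document IDs below are hypothetical; the formulas (relevant-and-retrieved divided by retrieved for precision, divided by all relevant for recall) are the standard information-retrieval definitions.

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved docs that are relevant.
    Recall: fraction of all relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# A search returns 4 documents; 3 of them are relevant,
# but the collection holds 6 relevant documents in total.
p, r = precision_recall({"d1", "d2", "d3", "d9"}, {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r)  # 0.75 0.5
```

High precision with low recall is typical of very specific (tail) queries; high recall with lower precision is typical of broad (head) queries.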
31. But there will be a descending order of intent probability
32. Head Terms Will Have Multiple Intents
Most tools only categorise head terms with a single intent, but this is not the case
Intent will have a descending order of predictable probability
Intent shifts often, and with temporal patterns / predictability
Comprehensive search results (Search Result Diversification) seek to meet all of these possible intents, probably predicting intents in descending order of likelihood
33. Winner?
Head – ranking via comprehensiveness (meet ALL the needs of the under-specified query, within the collection)
Tail – ranking by rareness / specificity / precision (meet that one SPECIFIC need with rarity)
36. Semantic Search (Vector similarity search)
Vectors – mathematical representations where words with similar meaning ‘live’ near each other
Distance measures: Euclidean distance, Cosine similarity
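Both distance measures can be shown with plain Python. The three-dimensional 'embeddings' below are invented for illustration (real embeddings have hundreds of dimensions); the two functions are the standard cosine-similarity and Euclidean-distance formulas.

```python
import math

def cosine_similarity(a, b):
    # Angle-based: 1.0 means pointing the same way, 0.0 means orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    # Straight-line distance between the two vector endpoints.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 3-d embeddings: 'king' and 'queen' live near each other,
# 'walnut' lives elsewhere in the space.
king, queen, walnut = [0.9, 0.8, 0.1], [0.85, 0.82, 0.15], [0.1, 0.2, 0.95]
near = cosine_similarity(king, queen) > cosine_similarity(king, walnut)
close = euclidean_distance(king, queen) < euclidean_distance(king, walnut)
print(near, close)  # True True
```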
37. Lexical Search is Sparse
“By definition, a matrix is called ‘sparse’ if most of its elements are zero. In the bag-of-words model, each document is represented as a word-count vector.”
Source: https://sebastianraschka.com/faq/docs/bag-of-words-sparsity.html
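A tiny bag-of-words example makes the sparsity visible. The documents are made up; the technique (one dimension per vocabulary word, word counts as values) is the standard bag-of-words model the quote describes.

```python
from collections import Counter

docs = [
    "semantic search ties structured and unstructured data together",
    "lexical search matches exact words",
    "dense vectors place similar meanings near each other",
]

# Vocabulary across the whole collection: one dimension per unique word.
vocab = sorted({word for doc in docs for word in doc.split()})

# Each document becomes a word-count vector over that vocabulary.
vectors = []
for doc in docs:
    counts = Counter(doc.split())
    vectors.append([counts[word] for word in vocab])

# Most entries are zero: the matrix is sparse, and it gets sparser
# as the collection (and therefore the vocabulary) grows.
zeros = sum(v.count(0) for v in vectors)
total = len(vectors) * len(vocab)
print(f"{zeros}/{total} entries are zero")  # 39/60 entries are zero
```

Dense vectors, by contrast, pack meaning into every dimension, which is what makes vector search behave so differently from lexical matching.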
49. Use BERT on ‘passages’ of documents
Break down a document and turn it into passages
Passage indexing / ranking & re-ranking
Put the document back together
52. Go smaller still. Break the problem down into sentences & compare cosine similarity of sentence pairs
53. Sentence-BERT
~65 hours to find the two most similar sentences among 10,000 sentences using a normal BERT / RoBERTa model
The same task takes around 5 seconds with Sentence-BERT
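The gap comes from how the work scales, and a line of arithmetic shows why. A cross-encoder like vanilla BERT must run the model once per sentence *pair*, while Sentence-BERT embeds each sentence once and then compares cheap vectors.

```python
# With a cross-encoder, every candidate pair passes through the model:
n = 10_000
pairs = n * (n - 1) // 2
print(pairs)  # 49995000 forward passes (~50 million)

# With Sentence-BERT, each sentence is embedded once (n forward passes),
# and pair comparison is just a fast cosine similarity between vectors.
forward_passes = n
print(forward_passes)  # 10000
```

Roughly 50 million model calls versus 10,000 is the difference between the 65 hours and the 5 seconds quoted above.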
61. How about Sentence-BERT with FAISS (Facebook AI Similarity Search)?
“A library for efficient similarity search and clustering of dense vectors” – Facebook Research
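The core operation FAISS accelerates can be sketched naively in plain Python: score a query vector against every indexed vector by inner product and return the best matches. This is what FAISS's exact flat index does (at vastly greater speed and scale, with approximate indexes available on top); the toy 'sentence embeddings' here are invented for illustration.

```python
import math

def normalise(v):
    # L2-normalise so that inner product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def search(index_vectors, query, k=2):
    # Exhaustive inner-product search over every stored vector --
    # the brute-force version of what a flat FAISS index performs.
    scores = [
        (sum(q * x for q, x in zip(query, vec)), i)
        for i, vec in enumerate(index_vectors)
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Hypothetical 3-d sentence embeddings; vectors 0 and 1 are near-duplicates.
sentences = [[0.9, 0.1, 0.1], [0.88, 0.12, 0.08], [0.1, 0.9, 0.2]]
index_vectors = [normalise(v) for v in sentences]
query = normalise([0.92, 0.1, 0.09])
print(search(index_vectors, query))  # [0, 1] -- nearest neighbours first
```

In practice you would encode sentences with Sentence-BERT, add the embeddings to a FAISS index, and query it the same way; only the scale changes.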
63. BERTopic & c-TF-IDF (class-based TF-IDF)
Utilises topical classes / concepts combined with term frequency / inverse document frequency
Builds topic clusters as an alternative to simple word count in proportion to document length
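A loose sketch of the c-TF-IDF idea, assuming pre-made topic clusters (the cluster names and documents below are invented): concatenate all documents in a cluster into one 'class document', then weight each term by its frequency in the class, scaled down for terms common across all classes. This follows BERTopic's formulation only approximately.

```python
import math
from collections import Counter

def c_tf_idf(classes):
    """Sketch of class-based TF-IDF: treat each topic cluster as one big
    'class document', then weight terms by in-class frequency times an
    IDF-style factor computed over classes rather than documents."""
    class_counts = {c: Counter(" ".join(docs).split()) for c, docs in classes.items()}
    avg_words = sum(sum(cc.values()) for cc in class_counts.values()) / len(class_counts)
    # Total frequency of each term across all classes.
    totals = Counter()
    for cc in class_counts.values():
        totals.update(cc)
    weights = {}
    for c, cc in class_counts.items():
        n = sum(cc.values())
        weights[c] = {
            t: (count / n) * math.log(1 + avg_words / totals[t])
            for t, count in cc.items()
        }
    return weights

clusters = {
    "football": ["manchester united fixtures", "premier league table"],
    "cooking": ["walnut cake recipe", "walnut brownie with walnut icing"],
}
w = c_tf_idf(clusters)
top = max(w["cooking"], key=w["cooking"].get)
print(top)  # walnut -- the term that best characterises the cluster
```

The highest-weighted terms per class become the human-readable topic labels.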
65. Adapt Navigable Small World (NSW) graph algorithms (small-world networks) with hierarchical machine-learned ‘similarity distances’
66. ‘Hierarchical Navigable Small World Graphs’ (HNSW) utilise greedy graph traversal over machine-learned embedding distances to identify the ‘nearness’ of semantically similar neighbours (Approximate Nearest Neighbours)
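The traversal at the heart of (H)NSW can be sketched on a toy single-layer graph. The 2-d points and adjacency list below are made up; real HNSW repeats this greedy hop across hierarchical layers and keeps a beam of candidates rather than a single current node.

```python
import math

def greedy_search(graph, points, query, start):
    """Greedy traversal of a navigable small-world graph: from the
    current node, hop to whichever neighbour is closer to the query;
    stop when no neighbour improves. The stopping node is an
    approximate (not guaranteed exact) nearest neighbour."""
    current = start
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current  # local minimum reached
        current = best

# Toy 2-d points and a small-world adjacency list (assumed for illustration);
# the long-range edge 0 -> 4 is what makes the world 'small'.
points = {0: (0, 0), 1: (1, 0), 2: (2, 1), 3: (3, 3), 4: (5, 5)}
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(greedy_search(graph, points, query=(2.8, 2.9), start=0))  # 3
```

Starting far from the target, the search uses the long-range edge to jump across the space, then refines locally, which is why these indexes scale to millions of dense vectors.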
120. Umbrella approaches add intent hubs ‘above’ & are navigational / triage systems by nature
Connect to primary suspected intent
Connect to secondary intents (the hub links out to each of six secondary intents)
121. ‘Sidecar pages’ – Divert low-value + multiplier pages to ‘sidecar pages’
Colours, Sizes, Teams, Services, Locations
123. In summary:
Use internal links
Hack the breadcrumb
Connect diverting links via sidecars
Show a semantic hierarchical relationship
Use umbrella approaches