For a detailed recap: http://pshapi.ro/SemanticKWR
My BrightonSEO presentation...
1st Half: What is semantic search and why does it matter to SEOs.
2nd Half: Using KNIME to do semantic keyword research using SERP and Twitter data.
3. The Prevalence of Semantic Search (Unstructured)
Search engines are coming to rely more and more on semantic search technology to understand websites and how users search.
• As a result, SEOs need to better understand how language and keywords relate to each other in order to do more effective keyword research.
Do semantic keyword research!
4. What Is Semantic Search?
Strings can represent things:
• Search engines are looking past exact-match keyword occurrences on web pages.
• They are learning the meaning behind keywords and examining how they relate to each other conceptually.
• The strength of that conceptual connection is scored for relevancy within search queries and on-page.
5. What is a mammal that is a vertebrate and lives in water?
10. What’s up with Hummingbird?
“Hummingbird is paying more attention to
each word in a query, ensuring that the whole
query – the whole sentence or conversation or
meaning – is taken into account, rather than
particular words. The goal is that pages
matching the meaning do better, rather than
pages matching just a few words.”
Hummingbird improves semantic understanding of search queries AND
makes conversational search better, which is important for the future of
mobile and voice search.
11. Hummingbird Summarized
I like Gianluca Fiorelli’s analysis of the theoretical capabilities of a post-Hummingbird Google search:
1. To better understand the intent of a query;
2. To broaden the pool of web documents that may answer that query;
3. To simplify how it delivers information, because if query A, query B, and query C substantively mean the same thing, Google doesn't need to propose three different SERPs, but just one;
4. To offer a better search experience, because by expanding the query and better understanding the relationships between search entities (also based on direct/indirect personalization elements), Google can now offer results that have a higher probability of satisfying the needs of the user;
5. As a consequence, Google may also present better SERPs in terms of ads, because in 99% of cases, verbose queries were not presenting ads in their SERPs before Hummingbird.
Source: http://pshapi.ro/mozingbird
12. How Can SEOs Optimize for Semantic Search?
1. Make sure our content delights our users
Create quality content and use personas
2. Optimize for searcher intent and build topical authority using semantic topic modeling
Understand how users search and have command of your niche’s language
Now THIS is great content.
13. Build Topical Authority for a Subject
When conducting keyword research, optimizing on-page, or creating
content, have a deep understanding of your niche’s language:
1. Understand how concepts relate to one another and which
keywords pertain to those concepts.
2. Ensure these concepts are well represented.
[Diagram: keywords clustered around the concepts they pertain to]
14. Optimize for Searcher Intent
Have an exceptional understanding of consumer language and the myriad ways users may search about your niche:
1. What are consumers looking for when they are familiar with your
niche?
• Language used should represent core keywords.
2. What are consumers looking for when they are not familiar with
your niche?
• Language tends to be more conversational. You may
uncover more related terms when exploring your niche
from this perspective.
3. What else do these two groups search for?
• These searches may be directly and/or indirectly related.
16. Social Media Is an AWESOME Data Source
for Semantic Keyword Research
1. Social media data helps us expand our collection of keyword
ideas—especially new, breaking keywords.
2. Social media language is inherently conversational and can help us understand how conversational queries may be phrased.
3. We can use it to mimic the language of the customer, which has a
secondary CRO benefit.
#Awesome
17. Secondary CRO Benefit: The Echo Effect
While you’re at it, use social media language to mimic the language of your consumer. Several studies indicate it may help build trust and boost conversions:
• Study published in the International Journal of Hospitality Management: waitresses who verbally mimicked a customer’s order were more likely to receive higher tips.
• Study published in the Journal of Language and Social Psychology: mirroring people’s words can be very important in building likability, safety, rapport, and social cohesion.
http://pshapi.ro/echohospitality
http://pshapi.ro/echoinfluence
18. Once We Collect SERP and Social Media Data...
There are several ways we can break it down and analyze it.
Co-occurrence
• How often two or more words appear alongside each other in a corpus of documents.
Latent Dirichlet Allocation (LDA)
• Finds semantically related keywords and groups them into topical
buckets.
TF-IDF (Term Frequency-Inverse Document Frequency)
• Reflects how important a keyword is to a document in a whole
collection of documents.
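To make these three techniques concrete outside of KNIME, here is a minimal Python sketch using scikit-learn. The toy documents, the two-topic setting, and document-level co-occurrence (rather than a sliding window) are all assumptions for illustration, not part of the presentation's workflow.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for scraped tweet text and ranking-page text
docs = [
    "semantic search helps google understand the meaning behind keywords",
    "keyword research tools group related keywords into topical buckets",
    "google hummingbird improves conversational search and voice queries",
]

# TF-IDF: how important a term is to one document within the whole collection
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(docs)

# Co-occurrence (simplified to document level): term-by-term counts of
# words appearing together in the same document
cv = CountVectorizer(stop_words="english", binary=True)
presence = cv.fit_transform(docs)
cooccurrence = (presence.T @ presence).toarray()

# LDA: group semantically related terms into topical buckets
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(presence)
terms = cv.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"Topic {i}:", [terms[j] for j in topic.argsort()[-5:][::-1]])
```

In KNIME the same steps are handled by pre-built text-processing nodes; the sketch just shows what each technique computes.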
20. KNIME Is the One Tool to Rule Them All
• Free and open source, running on every platform
• Allows you to do things with a drag-and-drop interface that would normally require a developer or a programming background.
• Ties data-oriented tasks together and helps you easily automate:
Data collection
Data manipulation
Analysis
Visualization
Reporting
http://pshapi.ro/downloadknime
23. What’s a Node?
• Pre-built drag-and-drop boxes designed to do a single task.
• They are combined into “workflows” to do larger, more complex tasks.
• Nodes can be grouped into meta-nodes, which can be configured in unison.
24. How Do You Add Nodes and How Do They Connect?
How do you add nodes?
How do you connect nodes to one-another?
26. Accessing Data from SERP and Twitter +
Common Node Configurations We’ll Be Using
27. Get a Twitter API Key
Go to https://apps.twitter.com/ and fill out the forms!
• Application “Name”, “Description”, and “Website” don’t matter for our purposes.
Go to the “Keys and Access Tokens” tab and grab:
• Consumer Key (API Key)
• Consumer Secret (API Secret)
Click “Create my access token” and grab:
• Access Token
• Access Token Secret
28. Accessing Social Data – Twitter API Nodes
Right-click and “Configure” to input the API information.
Right-click and “Configure” to set the Twitter search query (and query type).
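For context on what these nodes do under the hood, here is a minimal sketch using the tweepy library (v4.x) with the four credentials from the previous slide; the query string and the key placeholders are assumptions, and your keys need standard search access.

```python
import tweepy

# Placeholders for the four values grabbed from apps.twitter.com
auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET",
    "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET",
)
api = tweepy.API(auth)

# Roughly what the Twitter Search node does: pull recent tweets for a keyword
for tweet in api.search_tweets(q="semantic search", count=100, tweet_mode="extended"):
    print(tweet.full_text)
```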
30. Extract Only the Links from Twitter
A little trickier than it should be since you have to expand t.co links and
URL shorteners.
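Outside KNIME, expanding a shortened link mostly means following HTTP redirects to the final URL; a minimal sketch with the requests library, where the example t.co link is a placeholder:

```python
import requests

def expand_url(short_url: str) -> str:
    """Follow redirects (t.co, bit.ly, etc.) and return the final URL."""
    # Some servers reject HEAD requests; fall back to GET if needed
    resp = requests.head(short_url, allow_redirects=True, timeout=10)
    return resp.url

# Placeholder shortened link
print(expand_url("https://t.co/example"))
```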
31. Accessing SERP Data – Inputting Data Manually
Manually input URLs with an Excel spreadsheet or CSV (desktop rank checkers).
Manually input URLs with the “Table Creator” node (right-click “Configure” and edit it just like a spreadsheet).
32. Accessing SERP Data – Inputting Data via API (Better)
Example – GetSTAT
More-Complicated Meta Node Method
33. Make Webpages Plain Text (for Analysis)
Use Boilerpipe API (pre-made meta-node download to be provided)
http://boilerpipe-web.appspot.com/
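If the Boilerpipe service or the pre-made meta-node isn't available, a rough stand-in with requests and BeautifulSoup is sketched below; note this is a swapped-in technique that keeps all visible text rather than isolating the main article body the way Boilerpipe does.

```python
import requests
from bs4 import BeautifulSoup

def page_to_text(url: str) -> str:
    """Fetch a page and strip it down to plain text for analysis."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop script/style blocks so only readable text remains
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())
```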
35. Getting Things into a Text Analysis Format
Use the built-in “Strings To Document” node
36. A Few More Useful Base Nodes for Text Analysis
44. Bringing It All Together:
Applying Concepts to Visualizations
1. Search Twitter for a keyword and collect all of the Tweet text
2. Search Twitter for a keyword, extract only the links, and scrape text from those links
3. Extract the top 10 ranking pages for a keyword and scrape text from those URLs
4. Isolate single-word keywords and/or multi-word N-grams
5. Calculate TF-IDF
THEN we can…
• Tag Parts of Speech (Nouns, Adjectives, Verbs, etc.) and display in a Word Cloud (sketched below)
• Do Co-Occurrence Analysis and display in a Node Graph (remember the earlier patent?)
• Identify semantic topic groupings with LDA and display in a Node Graph
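As a rough illustration of the part-of-speech and word-cloud step outside KNIME, here is a sketch with NLTK and the wordcloud package; the input string is a placeholder for the scraped tweet and page text.

```python
import nltk
from wordcloud import WordCloud

# One-time downloads; on newer NLTK releases the resources are named
# "punkt_tab" and "averaged_perceptron_tagger_eng" instead
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

# Placeholder for the scraped tweet / ranking-page text
text = "placeholder text scraped from tweets and top ranking pages"

# Keep nouns, adjectives, and verbs (Penn Treebank NN*/JJ*/VB* tags)
tagged = nltk.pos_tag(nltk.word_tokenize(text))
keep = [word for word, tag in tagged if tag.startswith(("NN", "JJ", "VB"))]

# Build the word cloud image from the kept terms
WordCloud(width=800, height=400).generate(" ".join(keep)).to_file("wordcloud.png")
```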
45. Analysis We Can Do Based on a Google Patent
Simplified with a smaller corpus, but easily replicable with KNIME (a rough Python sketch follows below):
1. Filter out overly common terms using TF-IDF.
2. Take the top 20 or so terms that are above a certain TF-IDF threshold and remove the rest.
3. Calculate co-occurrence of the remaining terms.
4. Optimize your site for these!
Bill Slawski Patent Analysis: http://pshapi.ro/cooccurencepatent
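A minimal Python sketch of those four steps, with toy documents, document-level co-occurrence, and an arbitrary top-20 TF-IDF cutoff as stated assumptions:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Placeholder corpus; swap in your scraped tweet and page text
docs = [
    "placeholder tweet text about semantic search and keywords",
    "placeholder page text about keyword research and topics",
    "placeholder text about semantic keyword research tools",
]

# Steps 1-2: score terms with TF-IDF and keep only the ~20 highest-scoring ones
tfidf = TfidfVectorizer(stop_words="english")
scores = np.asarray(tfidf.fit_transform(docs).sum(axis=0)).ravel()
terms = tfidf.get_feature_names_out()
top_terms = set(terms[np.argsort(scores)[-20:]])

# Step 3: calculate co-occurrence counts for the remaining terms
cv = CountVectorizer(vocabulary=sorted(top_terms), binary=True)
presence = cv.fit_transform(docs)
cooccurrence = (presence.T @ presence).toarray()

# Step 4: terms that co-occur most with the rest are candidates to optimize for
totals = cooccurrence.sum(axis=1) - cooccurrence.diagonal()
for term, total in sorted(zip(cv.get_feature_names_out(), totals), key=lambda t: -t[1]):
    print(term, total)
```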
Google understands the query better, understands the meaning of text on pages, and uses query rewriting to be more efficient, returning the same results for searches that mean the same thing. It also understands the connections between keywords and entity keywords.
In addition to data from ranking webpages in the SERP…
Better than a spreadsheet: looking at how keywords relate becomes a less onerous task, so keyword relations are easier to identify.