A Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
Normal On-Page SEO
• Title tag
• Meta Description
• Header Tags
• Image name and alt attributes
• Keyword in URL
• Mobile Friendly
• Content visible
• Internal links
What’s the query trying to address?
We’ve All Seen This
Google’s Quality Raters Guidelines Has
• Know query, some of which are Know Simple queries
• Do query, some of which are Device Action queries
• Website query, when the user is looking for a
specific website or webpage
• Visit-in-person query, some of which are looking for
a specific business or organization, some of which
are looking for a category of businesses
What would you expect to see when visiting a website?
Physical Store: Address, Phone #, Hours of operation
E-Commerce: Pricing, Reviews, Return Policy, Contact
Some niches have things like certification numbers
Google Tells You Things Not To Do
• Automatically generated content
• Participating in link schemes
• Creating pages with little or no original content
• Sneaky redirects
• Hidden text or links
• Doorway pages
• Creating pages with malicious behavior, such as
phishing or installing viruses, trojans or other
badware
• Scraped content
• Participating in affiliate
programs without adding
sufficient value
• Loading pages with
irrelevant keywords
• Abusing rich snippets
• Sending automated queries
to Google
But Google Is Vague On What To Do
• Make pages primarily for users, not for search engines.
• Don’t deceive your users.
• Avoid tricks intended to improve search engine
rankings.
• Think about what makes your website unique, valuable
or engaging. Make your website stand out from others
in your field.
The Good Practices Listed
• Monitoring your site for hacking and removing hacked
content as soon as it appears
• Preventing and removing user-generated spam on your
site
Bing Has A Nice Model
What Are These?
• Topical relevance to the query (“Does it address the
query?”),
• Content Quality (as measured by Authority, Utility,
and Presentation), and
• Context (“Is the query about a recent topic?”,
“What’s the user’s physical location?” etc…)
Google Has More In Webmaster Academy
• Useful and informative
• More valuable and useful than other sites
• Broken Links
• Facts or Incorrect Information
How Deep Down The Rabbit Hole Do We
Want to Go? -> Readability
• Flesch Kincaid Reading Ease
• Flesch Kincaid Grade Level
• Gunning Fog Score
• Coleman Liau Index
• Automated Readability Index (ARI)
• SMOG (Simple Measure of Gobbledygook)
• Fog Index
• Lix formula
• Spache Index
• Dale-Chall Index
• Dale-Chall Grade
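Most of these scores are simple arithmetic over counts of characters, words, and sentences. As a rough illustration, here is a minimal sketch of the Automated Readability Index (ARI), chosen because it needs no syllable counting. The tokenizing rules (what counts as a word or a sentence) are simplifying assumptions, so results will differ slightly from published tools.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    # Assumption: a sentence ends at any run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    # Count letters and digits only; apostrophes are not counted as characters.
    characters = sum(len(w.replace("'", "")) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


simple = automated_readability_index("The cat sat on the mat. It was happy.")
dense = automated_readability_index(
    "Comprehensive readability evaluation necessitates sophisticated measurement."
)
print(round(simple, 2), round(dense, 2))
```

Longer words and longer sentences both push the score up, which is the shared idea behind most of the formulas in this list.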
But Wait, There’s More!
• Position of content. Hidden/visible, font size, styling
• Who the author is
• What website the content is on
• Duplicate/uniqueness, different take, etc.
• Semantically related
Looking At Content Is The Fun Part
• Keyword density - times keyword appears on page /
total words on page, expressed as %
• LSI (Latent Semantic Indexing) - looks for closely
related words, synonyms, variants
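The keyword density formula above is easy to compute directly. A minimal sketch, assuming a single-word keyword and a simple regex tokenizer (both are illustrative choices, not part of the talk):

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return (times keyword appears / total words) as a percentage."""
    # Assumption: words are runs of letters, digits, or apostrophes, compared case-insensitively.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)


density = keyword_density("SEO tips: good SEO needs good content", "seo")
print(f"{density:.1f}%")
```

Here "seo" appears 2 times in 7 words, so the density is 2 / 7, or about 28.6%.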
Latent Semantic Analysis
Bag of words. Count based models.
It finds words mentioned but not really the meaning.
So we might see Hogwarts related to Harry Potter, but
not see it as a school for higher learning.
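To see why count-based models capture mentions rather than meaning, here is a small bag-of-words sketch. It is only the counting step; full Latent Semantic Analysis would additionally factor a term-document matrix, which is omitted here. The tokenizer and function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count each word, discarding word order entirely."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


first = bag_of_words("Dog bites man")
second = bag_of_words("Man bites dog")
print(first == second)  # the two sentences produce identical bags
print(cosine_similarity(first, second))
```

The two sentences mean different things, yet a bag-of-words model cannot tell them apart, because it only sees which words occur and how often.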
Term Frequency – Inverse Document Frequency (TF-IDF)
Frequency of a term within a document divided by its
frequency in the entire corpus
How important a word is in a document or collection of
documents.
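A minimal sketch of one common TF-IDF variant, where the term's share of the document is multiplied by the natural log of (number of documents / documents containing the term). The log scaling and the toy corpus are assumptions for illustration; other weightings exist.

```python
import math
from collections import Counter


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Term frequency in `doc`, scaled by how rare the term is across `corpus`."""
    if not doc:
        return 0.0
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: how many documents mention the term at all.
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical toy corpus of pre-tokenized documents.
corpus = [
    ["harry", "potter", "hogwarts"],
    ["harry", "met", "sally"],
    ["hogwarts", "school", "school"],
]
print(tf_idf("harry", corpus[0], corpus))
print(tf_idf("school", corpus[2], corpus))
```

"school" scores higher in the third document than "harry" does in the first, because it is both more frequent in its document and rarer across the corpus.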
Within Document Frequency - Inverse Document Frequency (WDF-IDF)
This is basically keyword density 2.0 with a correction
value and weighted across a set of documents.
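A rough sketch of the idea, assuming one commonly cited form: WDF = log2(frequency + 1) / log2(document length), multiplied by a log10 inverse document frequency. The exact formula is not given in the talk, so the log bases and function names here are assumptions.

```python
import math


def wdf(term: str, doc: list[str]) -> float:
    """Within-document frequency: log-dampened count, corrected for document length."""
    if len(doc) < 2:
        return 0.0  # log2(1) is 0, so a one-word document cannot be normalized
    return math.log2(doc.count(term) + 1) / math.log2(len(doc))


def wdf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """WDF weighted by how rare the term is across the set of documents."""
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    return wdf(term, doc) * math.log10(len(corpus) / df)


# Hypothetical 8-word document in which "seo" appears 3 times.
doc = ["seo", "a", "b", "seo", "c", "d", "seo", "e"]
print(wdf("seo", doc))
```

The logarithm is the "correction value": repeating a keyword yields diminishing returns instead of a linear increase, unlike raw keyword density.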
Like TF-IDF but takes into account document length.
Used by Common Search (building a nonprofit search
engine)
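The slide text does not name the formula. The description fits Okapi BM25, so the sketch below assumes that is what is meant. It scores a single term using the widely used parameters k1 = 1.2 and b = 0.75 and a smoothed IDF; all of these are assumptions, not details from the talk.

```python
import math


def bm25_term_score(term: str, doc: list[str], corpus: list[list[str]],
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Score one term for one document, normalizing by document length."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    df = sum(1 for d in corpus if term in d)
    # Smoothed inverse document frequency (always positive).
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    freq = doc.count(term)
    # Documents longer than average are penalized; shorter ones are boosted.
    length_norm = 1 - b + b * len(doc) / avg_len
    return idf * freq * (k1 + 1) / (freq + k1 * length_norm)


# Hypothetical toy corpus: the same term once in a short and a long document.
short_doc = ["seo", "tips"]
long_doc = ["seo"] + ["filler"] * 9
other_doc = ["cooking", "recipes"]
corpus = [short_doc, long_doc, other_doc]
print(bm25_term_score("seo", short_doc, corpus))
print(bm25_term_score("seo", long_doc, corpus))
```

Both documents mention the term once, but the short document scores higher, which is the document-length correction that plain TF-IDF lacks.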
Unigram, bigram, trigram, four-gram, five-gram.
Basically co-occurring words and phrases.
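Extracting n-grams is a sliding window over the token list. A minimal sketch, assuming whitespace tokenization for the example:

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "a technical look at content".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
print(ngrams(tokens, 3))  # trigrams
```

Counting how often each tuple appears across a set of pages gives the co-occurring phrases.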
Predictive instead of count based.
Tries to predict source context-words from the target
words. One word predicts a nearby word.
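"One word predicts a nearby word" describes the skip-gram setup. The sketch below shows only the data preparation step, building (target, context) training pairs from a sentence. The window size is an assumed parameter, and the neural network that learns from these pairs is not shown.

```python
def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Pair each target word with every word within `window` positions of it."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


for target, context in skipgram_pairs(["harry", "attends", "hogwarts"], window=1):
    print(target, "->", context)
```

Words that keep appearing in the same contexts end up with similar vectors, which is how the model gets closer to meaning than raw counts do.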
What Can You Do With Word2Vec?
• Measure the similarity between words or documents.
• Find most similar words to a word or phrase.
• Add and subtract words from each other to find
• Visualize the relationship between words in a
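A toy illustration of similarity and word arithmetic on vectors. The 3-dimensional vectors below are hand-made and hypothetical; a trained Word2Vec model would learn hundreds of dimensions from a large corpus.

```python
import math

# Hypothetical hand-made vectors; the dimensions loosely mean (royalty, male, female).
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.1, 0.2, 0.2],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Similarity between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(vector: list[float], exclude: set[str]) -> str:
    """Word whose vector is closest to `vector`, ignoring excluded words."""
    candidates = [w for w in VECTORS if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, VECTORS[w]))


def analogy(a: str, b: str, c: str) -> str:
    """Compute a - b + c and return the nearest remaining word."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    return most_similar(target, exclude={a, b, c})


print(analogy("king", "man", "woman"))
```

With these toy vectors, "king" minus "man" plus "woman" lands closest to "queen".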