Some queries are very simple: a search for "wikipedia" is unambiguous. It’s straightforward and can be answered effectively by even a very basic web search engine. Other searches aren't nearly as simple. Let's look at how engines might order two results. This is a simple problem most of the time, but it can be somewhat complex depending on the situation. Since Content A contains the word “Batman” and Content B does not, the engine can easily choose which one to rank.
The search engine can use TF*IDF to determine that “Wiggum” is a much less common word than “chief” and thus, Content A is more relevant to the query than Content B. NOTE: This example also does a good job of showing the inherent weakness of a metric like keyword density.
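A minimal sketch of the TF*IDF idea, assuming a tiny made-up corpus (the slide's actual Content A and Content B text is not reproduced here, so both documents below are hypothetical). It uses one common smoothed IDF formula; real engines may weight terms differently. Note that "chief" has a higher keyword density in document B, yet A scores higher because the rare word "wiggum" carries more weight.

```python
import math
from collections import Counter

def idf(term, docs):
    """Smoothed inverse document frequency: rarer terms get larger weights."""
    n_containing = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + n_containing)) + 1

def tf_idf_score(query_terms, doc, docs):
    """Sum of (term frequency * IDF) over the query terms for one document."""
    counts = Counter(doc)
    return sum((counts[t] / len(doc)) * idf(t, docs) for t in query_terms)

# Hypothetical corpus; index 0 plays the role of Content A, index 1 of Content B.
corpus = [text.lower().split() for text in [
    "chief wiggum arrested the suspect",
    "the fire chief inspected the chief engineer report",
    "the police chief spoke to the press",
    "a chief executive resigned today",
]]

query = ["chief", "wiggum"]
score_a = tf_idf_score(query, corpus[0], corpus)
score_b = tf_idf_score(query, corpus[1], corpus)

# Keyword density of "chief" alone would favour B (2 of 8 words vs 1 of 5).
density_a = corpus[0].count("chief") / len(corpus[0])
density_b = corpus[1].count("chief") / len(corpus[1])

print(f"TF*IDF  A={score_a:.3f}  B={score_b:.3f}")
print(f"density A={density_a:.3f}  B={density_b:.3f}")
```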
Using co-occurrence, the engine can determine that phrases like “Daily Planet” and “Clark Kent” appear with “Superman” and thus, Content B is more relevant than Content A.
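One rough way to picture co-occurrence is sketched below. The reference documents, the phrase vocabulary, and both content strings are hypothetical, and counting shared documents is only one simple way to measure co-occurrence, not necessarily what any engine does.

```python
from collections import Counter

def cooccurrence_counts(query, reference_docs, vocabulary):
    """Count, for each phrase, how many reference docs contain it with the query."""
    counts = Counter()
    for doc in reference_docs:
        text = doc.lower()
        if query in text:
            for phrase in vocabulary:
                if phrase in text:
                    counts[phrase] += 1
    return counts

def relatedness(content, counts):
    """Sum the co-occurrence counts of every known phrase found in the content."""
    text = content.lower()
    return sum(n for phrase, n in counts.items() if phrase in text)

# Hypothetical reference corpus the engine has already crawled.
reference_docs = [
    "Superman works at the Daily Planet as Clark Kent",
    "Clark Kent is the secret identity of Superman",
    "The Daily Planet reported that Superman saved the city",
    "The weather today is sunny",
]
vocabulary = ["daily planet", "clark kent", "weather"]

counts = cooccurrence_counts("superman", reference_docs, vocabulary)

content_a = "He is a strong man with a cape"          # hypothetical Content A
content_b = "Clark Kent left the Daily Planet early"  # hypothetical Content B

print(relatedness(content_a, counts), relatedness(content_b, counts))
```

Neither content string mentions "Superman" directly; B scores higher only because its phrases tend to appear alongside that word elsewhere.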
As humans reading both sentences, we can infer that Content B is obviously about the musical instrument – a piano – and the woman playing it. But a search engine armed with only the methods we described above will struggle, since both sentences use the words “keys” and “notes”, some of the few clues to the puzzle. NOTE: We were pretty excited to see that our LDA modeling tool correctly scored B higher than A… but then things got REALLY interesting.
For complex queries, or when relating large quantities of results with lots of content-related signals, search engines need ways to determine the intent of a particular page. Simply because a page contains a keyword 4 or 5 times in prominent places, or even mentions similar phrases/synonyms, doesn’t necessarily mean that it's truly relevant to the searcher's query.
In this imaginary example, every word in the English language is related to either "cat" or "dog". They are the only topics available. To measure whether a word is more related to "dog," we use a vector space model that displays those relationships mathematically. The illustration does a reasonable job of showing our simplistic world. Words like "bigfoot" are perfectly in the middle, with no more closeness to "cat" than "dog." But words like "canine" and "feline" are clearly closer to one than the other, and the degree of the angle in the vector model illustrates this and gives us a number. BTW, in an LDA vector space model, topics wouldn't have exact label associations like "dog" and "cat" but would instead be things like "the vector around the topic of dogs." Now take the simple model above and scale it to thousands or millions of topics, each of which would have its own dimension. Using this construct, the model can compute the similarity between any word or group of words and the topics it has created. You can learn more about this from Stanford University's posting of Introduction to Information Retrieval, <http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html>, which has a specific section on Vector Space Models: <http://nlp.stanford.edu/IR-book/html/htmledition/dot-products-1.html>
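The two-topic world can be sketched with cosine similarity, the standard way to turn the angle between two vectors into a number. The coordinates below are invented for illustration (first axis "cat", second axis "dog"); they are not taken from the slide or from any real topic model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_degrees(u, v):
    """Angle between two vectors in degrees, clamped against rounding error."""
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(cos))

# Topic axes: (cat, dog). All word coordinates are hypothetical.
CAT = (1.0, 0.0)
DOG = (0.0, 1.0)
words = {
    "feline": (0.9, 0.1),
    "bigfoot": (1.0, 1.0),
    "canine": (0.1, 0.9),
}

for word, vec in words.items():
    print(f"{word:8s} angle to cat = {angle_degrees(vec, CAT):5.1f}, "
          f"angle to dog = {angle_degrees(vec, DOG):5.1f}")
```

The same two functions work unchanged for vectors with thousands of dimensions, which is what scaling the model to many topics amounts to.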
The correlation of the LDA scores with rankings is uncanny. Certainly, it's not a perfect correlation, but that’s expected, given the complexity of Google's ranking algorithm. Seeing LDA scores show this dramatic result makes us seriously question whether there is causation at work here. We hope to do additional research via our ranking models to attempt to show that impact. Perhaps good links are more likely to point to pages that are more "relevant" via a topic model, or some other aspect of Google's algorithm that we don't yet understand naturally biases towards these.
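For readers who want to see what "correlation with rankings" means in practice, here is a small sketch using Spearman's rank correlation. That choice is an assumption, since the slide does not name the statistic, and the positions and scores below are made up purely to show the arithmetic. This version uses the simple formula that is valid only when there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation for tie-free data: 1 - 6*sum(d^2) / (n(n^2-1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n * (n * n - 1))

# Hypothetical data: position in the results (1 = top) and an LDA-style score.
positions = [1, 2, 3, 4, 5]
lda_scores = [0.9, 0.7, 0.8, 0.4, 0.3]

rho = spearman(positions, lda_scores)
print(f"Spearman rho = {rho:.2f}")
```

A strongly negative value here means higher scores go with smaller position numbers, that is, better rankings. As the text says, a number like this shows association only; it says nothing about causation.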
Like anything else in the SEO world, manipulatively applying the process is probably a terrible idea. Even if this tool worked perfectly to measure keyword relevance and topic modeling in Google, it would be unwise to simply stuff 50 keywords on your page to get the highest LDA score you could. Quality content that real people actually want to find should be the goal of SEO, and Google is sophisticated enough to determine the difference between junk content that matches topic models and real content that real users will like, even if the tool's scoring can't do that.
Search engines have, classically, relied on a relatively universal algorithm: one that rates pages based on the metrics available, without massive swings between verticals. In the past few years, however, savvy searchers and many SEOs have noted a distinct shift to a model where certain types of sites have a greater opportunity to perform for certain queries. The odds aren't necessarily stacked against outsiders, but the engines appear to bias toward the types of content providers that are likely to fulfill the users' intent. For example, when a user performs a search for "lamb shanks," it could make a lot of sense to give an extra boost to sites whose content is focused on recipes and food. Bill Slawski reported on Entity Association: rather than just looking for brands, it’s more likely that Google is trying to understand when a query includes an entity – a specific person, place, or thing. And if it can identify an entity, that identification can influence the search results that you see...
Google Plus Your World is about context. You get results that are biased to include what the people who you are connected with on Google+ are talking about, reviewing, liking, or with which they are connected.
Twitter Data Google: “We use it as a signal in our organic and news rankings. We enhance our news universal by marking how many people shared an article.” http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389
Twitter Test http://www.seomoz.org/blog/how-do-tweets-influence-search-rankings-an-experiment-for-a-cause
Page A: 646 links from 36 root domains; 2 tweets
Page B: 1 link from 1 root domain; 522 tweets
Twitter: Clearly Influencing Google. Page B – the tweeted version – ranks #1! http://www.seomoz.org/blog/how-do-tweets-influence-search-rankings-an-experiment-for-a-cause
Page A: 646 links from 36 root domains; 2 tweets
Page B: 1 link from 1 root domain; 522 tweets
Twitter Data for QDF http://www.seomoz.org/blog/tweets-effect-rankings-unexpected-case-study
Author Authority: Do Search Engines Use Author Authority to Rank Pages in the SERPs? http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389
Google: Yes we compute and use author quality. We don’t know who anyone is in real life.
Bing: Yes. We calculate the authority of someone who tweets. For known public figures or publishers, we do associate them with who they are.
Search Engine Ranking Factors 2011 Preliminary Data http://www.seomoz.org/blog/early-ranking-factors-data-an-april-linkscape-update
Big Changes from 2009 to 2011
• Link-Based Factors are waning
• Social Data is increasing
• Page-Level Link Metrics Fell the Most (43% - 22%)
• Keyword-Level Domain Metrics, Brand Data + Social Rising
The next update of the ranking factors will be online in April 2013.
Pandas and Farmers Gillian Muessig – ICMA April 2012
From the Mouths of Googlers Wired.com: How do you recognize a shallow-content site? http://www.wired.com/epicenter/2011/03/the-panda-that-hates-farms/all/1
Singhal: we ask… “Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?”
Matt Cutts responds: we ask… 1. “Do you consider this site to be authoritative? 2. Would it be okay if this were in a magazine? 3. Does this site have excessive ads?”
Wired.com: How do you implement that algorithmically?
Cutts: …look for signals that recreate that same intuition
Singhal: Imagine in a hyperspace a bunch of points, some points are red and some points are green and in others there’s some mixture. Your job is to find a plane which says, Most things on this side of the plane are red and most of the things on that side of the plane are the opposite of red.
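Singhal's "find a plane" description can be illustrated with a perceptron, one of the simplest methods for learning a separating line or plane. This is a swapped-in teaching example, not a description of how Google implemented Panda, which the interview does not specify. The two features and all data points are hypothetical.

```python
def train_perceptron(points, labels, epochs=100):
    """Learn weights w and bias b so that sign(w.x + b) matches each label."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # point is on the wrong side of the plane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every point is on its correct side
            break
    return w, b

def predict(w, b, x):
    """Return +1 ("red") or -1 ("green") depending on the side of the plane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical 2-D signals per site; +1 = "red", -1 = "green".
points = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5),
          (-1.0, -0.5), (-2.0, -1.0), (-1.5, 0.5)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(points, labels)
print("plane:", w, b)
print("new site (4.0, 2.0) ->", predict(w, b, (4.0, 2.0)))
```

The toy data here is cleanly separable, so the loop stops once no point is misclassified. Singhal's wording ("most things on this side") implies real data is messier, which a plain perceptron does not handle well.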
Googlers want to know… http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html
• Are you trustworthy?
• Are you an expert? Author?
• Are your facts checked?
• Are you genuinely interesting?
Over SEO’ing: OUT! (Slide graphic: the word "seo" repeated until it fills the page.)
Don’t “Look” Like a Content Farm http://hubpages.com/hub/WomensFashionsofthe1920-FlappersandtheJazz-Age
Avoid “Classic” SEO Tactics
• Directory Link Building
• Keyword-Variant Abuse
• Reciprocal Link Pages
• Paid Links w/ Manipulative Anchor Text
• Sitewide, Footer Links
• Navigation for Engines, Not Humans
• Low Cost/Quality, Outsourced Content
• Generic Design and Layout
• Anonymous Contact Forms
• Anchor-Text Rich Internal Links
• Ad Blocks Dominating the Page
• Keyword Stuffed Titles + Pages
It’s great to do good SEO; just don’t look like the only reason the site exists is to draw Google traffic.
New and evolving opportunities
Become a “Brand” http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
Brands | Generics
Have real people working at a physical address | Often exist only online
Have authentic, followed social accounts | Rarely have significant social accounts
Display obvious, robust contact information | Frequently use email forms only
Register with government/civic organizations | Stay “under the radar”
Receive traffic from diverse sources | Search is often 90%+ of traffic
Generate branded search query volume | Have little-no branded search demand
Run offline marketing/advertising campaigns | Ignore the offline world
Do Competitive Research: Where do these brands earn their links? http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html http://www.opensiteexplorer.org
When in Rome… Find Your Corporate Voice http://outspokenmedia.com/social-media/quora-hipsters/
Phenomenal analysis of statements by Googlers + how they translate to content/marketing actions: http://bit.ly/iGd7Pe
“I’m excited to be able to share my life’s passion with you.”
Get your social on • Stumble (upon) • Thumb up • Re-tweet • Like it • Share it • Digg it • Redd It http://outspokenmedia.com/social-media/quora-hipsters/
And now… Pinterest