
Optimal SEO (Marianne Sweeny)


Given at UXPA-DC's User Focus Conference, Oct. 19, 2012


  1. 1. Some time ago, we fell asleep at the switch. Search engines are now “evaluating the merit” of our content and are not entirely clear about the criteria that they are using.
  2. 2. This presentation is about Google’s latest updates, Panda and Penguin, and how they impact the content that is retained by the search engines and presented in search results. We will look at:
      1. What has happened with search engine technology over the years and what it is today
      2. Why we should care: how search engine technology impacts what we do, and how what we do can impact the performance of search engines
      3. What we can do about it
  3. 3. Search engines came first. They have been around for more than 60 years, since the early days of “information retrieval,” when text began to be electronically transformed in the late 1940s. However, information organization and retrieval goes back even further than that…
  4. 4. An argument could be made that “search engine” optimization came first, in the early days when great care was taken to present information in a “findable” fashion: great care by a designated few to make information available in limited format to the limited few who would consume it and make it available to the masses. People optimized text for people.
  5. 5. Then came the beautiful places where the information was organized in a standardized way so that people could find it, and helpful people to ask for help finding information if we got lost.
      Early search engines used traditional information retrieval concepts and structured content repositories that were mediated by human-generated metadata. In Dialog and ProQuest, SQL queries ruled, and thought-processing bipeds associated tags, categories and abstracts with the content item. Database methods of linear query construction delivered the most success.
  6. 6. Then came the World Wide Web, altruistically developed by Tim Berners-Lee so that the military, industrial and scientific complexes could communicate with each other, be on the same page and save money in the long-distance exchange of information. (The first web page can still be found here.)
      This worked well until the medium was made available to the rest of us.
      The result….
  7. 7. Then limitless growth, questionable quality and zero governance with no end in sight
      • 1997: 15 million pages
      • 2010: Google announces its 100 billion+ page index
      • 2012: rumored 1 trillion URLs found
  8. 8. One thing that did not change was information retrieval (IR). Despite the technology advancements, the IR process remained the same.
      Source: Saracevic 1997, Information Today (© Tefko Saracevic)
  9. 9. Slide from LIS 544 / IMT 542 / INSC 544 by Jeff Huang and Shawn Walker (stw3@uw.edu)
      1. Documents were selected from the index based on the presence of query terms in document text
      2. Documents containing more of the term(s) scored higher
      3. Longer documents were discounted
      4. Rare terms were weighted higher
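The four classic IR rules on this slide can be sketched in a few lines. This is a minimal illustration, not any search engine's actual formula: the function name `rank`, the whitespace tokenizer, the logarithmic rarity weight and the square-root length discount are all assumptions chosen for readability.

```python
import math
from collections import Counter


def rank(query, docs):
    """Rank documents for a query using the four rules from the slide.

    Hypothetical sketch: the weighting choices are illustrative assumptions.
    Returns a list of (document index, score) pairs, best match first.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for index, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        total = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Rule 4: rare terms (low document frequency) are weighted higher.
            rarity = math.log(1 + n_docs / df[term])
            # Rule 2: more occurrences of the term score higher.
            total += tf[term] * rarity
        # Rule 1: only documents containing a query term are selected.
        if total > 0:
            # Rule 3: longer documents are discounted.
            results.append((index, total / math.sqrt(len(tokens))))

    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "panda update panda",
        "penguin update",
        "cooking recipes for dinner",
    ]
    for index, score in rank("panda update", docs):
        print(index, round(score, 3))
```

Note that nothing in these rules looks at who wrote the document or whether it is any good; the score depends only on the text, which is the legacy the following slides discuss.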
  10. 10. The environment, devices, participants and content have changed. What does that mean for IR? For search engines?
  11. 11. IR’s locked-in legacies are centered on
      • text deconstruction
      • the capacity for sequential instructions to derive meaning
      • its reliance on systems that do not scale well and that, while incorporating human behavior, do not fully understand it
      Search engines today believe that it is perfectly natural for them to abstract the whole based on the nature of a small subset = “digital Maoism”
  12. 12. Using Google’s Latent Semantic Indexing, a machine-learning technique that automatically maps relationships between terms, a search for ~vacation turns up results for: hotels, rentals, travel, tourism, resorts…
      Machines know only what they are trained to know. Rules are based on an analysis of a subset and applied to the content corpus writ large. Machines have no sense of accountability when things go bad.
  13. 13. PageRank: a Stanford research project that was once greeted as a savior due to its simplicity and seeming incorruptibility.
      Both creators were PhD students in data mining.
      Standard IR with the introduction of 2 human elements:
      1. Random Surfer model
         • At any time t, the surfer is on some page P
         • At time t+1, the surfer follows an outlink from P uniformly at random
         • Ends up on some page Q (linked from page P)
         • Process repeats indefinitely
      2. Link = vote
      Unfortunately, flaws in this system were soon revealed:
      1. Those who were able to build links dictated relevance for the rest
      2. The cottage industry of SEO started building links for reasons other than endorsing the merits of site content
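The random surfer model and the "link = vote" idea can be sketched as a short power iteration. This is a minimal sketch under stated assumptions, not Google's implementation: the `damping` parameter (the chance that the surfer follows a link rather than jumping to a random page) comes from the commonly published description of PageRank and is not on the slide, and the even spreading of rank from pages with no outlinks is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate where a random surfer spends time on a small link graph.

    `links` maps each page to the list of pages it links to.
    Hypothetical sketch: `damping` and the dangling-page handling are
    assumptions, not details taken from the slide.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline from surfers who jump at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # The surfer follows one outlink uniformly at random,
                # so each link passes on an equal share: link = vote.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In the toy graph, page C ranks highest because two pages link to it, and page B ranks lowest because nothing links to it. The sketch also makes the first flaw visible: anyone who can add entries to `links` can move the scores.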
  14. 14. Google goes public around this time and the cash infusion enables expansion.
      • Starts acquiring top computer scientists
      • Purchases technology (Kaltix: personalized search, context-sensitive search)
      This is the first step away from the PageRank model, though not entirely, as PageRank is part of Google’s locked-in technology foundation.
      And the response from us thought-processing bipeds?
  15. 15. We’re constructing worse queries but feel that we’re getting better results.
      Which canary in what coal mine just died?
  16. 16. Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
      Pew Internet Trust Study of Search Engine Behavior
      In January 2002, 52% of all Americans used search engines. In February 2012 that figure grew to 73% of all Americans. On any given day in early 2012, more than half of adults using the internet use a search engine (59%). That is double the 30% of internet users who were using search engines on a typical day in 2004. And people’s frequency of using search engines has jumped dramatically.
      Moreover, users report generally good outcomes and relatively high confidence in the capabilities of search engines:
      • 91% of search engine users say they always or most of the time find the information they are seeking when they use search engines
      • 73% of search engine users say that most or all the information they find as they use search engines is accurate and trustworthy
      • 66% of search engine users say search engines are a fair and unbiased source of information
      • 55% of search engine users say that, in their experience, the quality of search results is getting better over time, while just 4% say it has gotten worse
      • 52% of search engine users say search engine results have gotten more relevant and useful over time, while just 7% report that results have gotten less relevant
      And Google’s response…
  17. 17. Location on the page = good quality content
      “The goal of many of our ranking changes is to help searchers find sites that provide a great user experience and fulfill their information needs. We also want the “good guys” making great sites for users, not just algorithms, to see their effort rewarded. To that end we’ve launched Panda changes that successfully returned higher-quality sites in search results. And earlier this year we launched a page layout algorithm that reduces rankings for sites that don’t make much content available “above the fold.”” Matt Cutts step-to-reward-high-quality.html
      UX run amok: if not enough content appears above the fold, the page will be seen as less relevant? How many are dictating this for the rest of us? Where did they get this from?
      “As we’ve mentioned previously, we’ve heard complaints from users that if they click on a result and it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away. So sites that don’t have much content “above-the-fold” can be affected by this change. If you click on a website and the part of the website you see first either doesn’t have a lot of visible content above-the-fold or dedicates a large fraction of the site’s initial screen real estate to ads, that’s not a very good user experience. Such sites
  18. 18. may not rank as highly going forward.”
  19. 19. Panda 1.0: Google’s first salvo against “spam” (shallow, thin content sites) in the form of content duplication and low-value original content (i.e. “quick, give me 200 words on Britney Spears’s vacation in the Maldives”). Biggest target: content farms. Biggest impact: keyword optimization and link building.
      • Keyword optimization: The shift in focus from text on the page to user experience makes optimizing for keywords counterintuitive. Biggest impact: shift from developer/shady SEO influence to usability/user experience focus. Average loss in positioning (% of keywords falling out of the top 10 search results): 70 to 90% for sites like, find,, and (SISTRIX)
      • Link building: PageRank does not scale well to a 1 trillion page Web. Google cannot calculate PR fast enough to rerank sites. PR is now devalued as the strongest influence behind ranking. Biggest impact: link building for higher PR = “what’s the point?”
      Panda 2.0: Change rolled out to all English-language queries in English-speaking countries (UK, Australia, etc.) and in countries where English-language results are stipulated. Ranking incorporates searcher “blocking” data (from a Google Chrome feature).
      Panda 2.1: Having unique content is not enough; quality factors introduced (some below)
      • Trustworthiness: would I trust this site with my credit card information
      • Uniqueness: is this saying what I’ve found somewhere else
      • Origination: does the person writing the content have “street cred,” do I believe that this is an authoritative resource on this topic
      • Display: does the site look professional, polished
      • Professional: is the content well constructed, well edited and without grammatical or spelling errors
      Panda 2.2: Google going after site scrapers that repurpose content not their own, or those who “outsource” content development and maintenance
      Panda 2.3: Bounce rate (whether the user engages with the page at all), click-through, conversion
  20. 20. And sort of blames SEO for it (not outright, but in a passive-aggressive kind of way)
      2007 Google patent: Methods and Systems for Identifying Manipulated Articles (November 2007)
      Manipulation:
      • Keyword stuffing (article text or metadata)
      • Unrelated links
      • Unrelated redirects
      • Auto-generated in-links
      • Guestbook pages (blog post comments)
      Followed up by Google patent: Content Entity Management (May 2012)
  21. 21. • February 2011: algorithm focused on content quality, originally thought to be aimed at content farms
      • June 2011: update to identify scraped or duplicated content
      • October 2011: unannounced update to rectify sites “unfairly impacted” by original updates
      • January 2012: sites with too much ad space above the fold are devalued
      The slide lists approximately 10% of the changes that Google told us about, and what they tell us about likely represents 0.10% of the changes that they actually make. (source: )
      Freshness bug fix: “This change turns off a freshness algorithm component in certain cases when it should not be affecting the search results.”
      Will serve up the newer document when choosing between two (from a given site)
  22. 22. Where’s Heidi Klum when we need her? Google’s quality content bar is higher and more subjective than Project Runway.
      Google: Arbiter of Content & Relevance
      “Those other sites are not bringing additional value. While they’re not duplicates they bring nothing new to the table.”
      Google’s advice to site owners:
      “If it is already a crowded space with entrenched players, consider focusing on a niche area initially, instead of going head to head with the existing leaders of the space.”
  23. 23. The Penguin update is a bit different because it is an aggressive move on Google’s part that starts with an algorithmic review. If a threshold is crossed, a human review takes place and most sites are then significantly demoted in rankings or removed altogether.
      • Overly repetitive anchor text (“manipulative, repetitive anchor text”)
      • Blog comments filled with spam (reviews/comments that contain links to “spam”). Google’s definition of spam is similar to the Supreme Court’s for porn: no explanation of what it is. The search engine spiders just know it when they see it
      • Obscene content
      • Web “clusters”: multiple Web sites on the same host, from the same domain owner, linking to an article in an artificial way
  24. 24. Targets “exact match” keyword-ed links or aggressive anchor text to google
      • Sites penalized had “moneyed keywords” in 65% of their incoming links
      • Obviously aimed at the long-standing practice of outsourcing link building to 3rd world countries and the weed-like growth of useless directories (i.e. link farms)
      Too many links from “related sites”
      • Same niche
      • Same domain host
      • Same domain owner
      Standard SEO signals
      • Stuffed <title> and metaDescription
      • Hidden text
      • Unrelated links on and pointing to the page
      • Computer-generated text (i.e. dynamically rendered product pages)
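To check a site against the "moneyed keywords" observation above, one rough approach is to measure what share of inbound links use exact-match anchor text. This is only a sketch: the function name `exact_match_share` and the sample data are hypothetical, and the 65% figure is the slide's observation about penalized sites, not a published Google threshold.

```python
def exact_match_share(anchors, money_keywords):
    """Return the fraction of inbound-link anchors that exactly match a keyword.

    Hypothetical helper: matching is case-insensitive and ignores
    surrounding whitespace, which is an assumption of this sketch.
    """
    if not anchors:
        return 0.0
    keywords = {keyword.strip().lower() for keyword in money_keywords}
    hits = sum(1 for anchor in anchors if anchor.strip().lower() in keywords)
    return hits / len(anchors)


if __name__ == "__main__":
    # Made-up anchor texts from a backlink export, for illustration only.
    inbound_anchors = ["cheap hotels", "Cheap Hotels", "example.com", "click here"]
    share = exact_match_share(inbound_anchors, ["cheap hotels"])
    print(f"{share:.0%} of inbound links use exact-match anchor text")
```

A natural link profile tends to be dominated by brand names, bare URLs and generic phrases, so a high exact-match share is worth reviewing even if the exact trigger point is unknown.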
  25. 25. 24
  26. 26. The search engines think that we’re superfluous because we don’t “get search.” That’s what I’m here to end. I want you to “get search.” We are information professionals, not mice! We’re going to use every neuron, synapse and gray cell to fight back.
      We will shift from trying to optimize search engine behavior to optimizing what the search engines consume, moving from search engine optimization to information optimization.
      • We will Focus
      • We will be Collaborative
      • We will get Connected
      • We will stay Current
      Because we are user experience professionals, not Matt Cutts, Sergey Brin or Larry Page.
  27. 27. 26
  28. 28. Tools:
      • Core metadata: 20-30 terms that represent the intersection between client objectives and how their customers search for the product/service
      • Content analytics: top pages, bounce rate, visitor flow
      • Content audit: keep/kill/revise based on a thorough review, using a manual audit or tools available through resources such as those from @content_insight
  29. 29. Stronger G+ profile = more organic search traffic
  30. 30. If it barks, sings, dances, plays, changes, whatever, annotate it with something the search engine can crawl, deconstruct, associate with a surrogate and store in the index
      • Relational content model: Next Steps as well as More Information using guided tours, Best Bets, produced views, etc.
      • Best Bets: editorially assigned result that may not be chosen by the search engine
      • Guided Tours: built on analysis of other user pathways and knowledge of the corpus
      • Produced Views: page of assembled content items focused on a single subject
      • Task List Drop Downs: “I Want To…” links to pages of assembled content focused on a single common task
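For site search, Best Bets can be as simple as an editorially maintained lookup that is consulted before the engine's own ranking. The sketch below assumes a plain dictionary of normalized queries to pinned URLs; the function name, the query normalization and the sample URLs are all hypothetical.

```python
def search_with_best_bets(query, organic_results, best_bets):
    """Place editorially assigned results ahead of the engine's own ranking.

    Hypothetical sketch: `best_bets` maps a normalized query string to a
    list of URLs chosen by an editor. Pinned URLs are removed from the
    organic list so that nothing appears twice.
    """
    # Normalize case and whitespace so " Vacation  Policy " still matches.
    key = " ".join(query.lower().split())
    pinned = best_bets.get(key, [])
    remaining = [url for url in organic_results if url not in pinned]
    return pinned + remaining


if __name__ == "__main__":
    # Made-up data for illustration only.
    editor_picks = {"vacation policy": ["/hr/vacation-policy"]}
    engine_ranking = ["/blog/vacation-photos", "/hr/vacation-policy", "/news/holiday"]
    print(search_with_best_bets(" Vacation  Policy ", engine_ranking, editor_picks))
```

Keeping the editor's picks in a separate table means they survive changes to the ranking algorithm, which is the point of assigning them editorially.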
  31. 31. 30
  32. 32. This is a team effort.
  33. 33. It is not too soon to get started.