Not Your Mom's SEO

Search engines have changed a lot over the last 15 years, and optimizing Websites for them must keep up. This presentation looks at the search landscape and presents strategies and tactics for optimizing for today's search.

Published in: Technology

  2. For far too long, computer science has directed the development of search systems. This is problematic from an experience point of view because computer science measures success by different standards than we do. Speed of data throughput and optimal storage are the foci for engineers. Accuracy is assumed by computational mathematics. It is no wonder then that search systems have developed with minimal attention to the user experience beyond the assumed perfection of results relevance and the appropriate ad-matching. The intent of this plenary is to inspire us all to engage on a deeper level in designing search experiences that do more than sell products well.
  4. SES New York 2005: Mike Gehan… explains that engines want the most relevant results, which is hard "because end users are search nitwits!" http://www.seroundtable.com/archives/001600.html
     Too much information
     Hosted Websites
     • July 1993: 1,776,000
     • July 2005: 353,084,187
     Individual Web pages
     • 1997: 200 million Web pages
     • 2005: 11.5 billion pages, now likely well over 12 billion
     • 2009: Google announces that its spiders have found 1 trillion URLs and that the Google index is at 100+ billion pages
     No Silver Bullet Solution
     • Language and perception are different
     • Some people think women put their stuff in a purse, others a pocketbook, and others a handbag.
     • “Animal” is a form of mammal, a Sesame Street character, and an uncouth person
     • Over 140 calculations are now used for PageRank valuation, and it still “gets it wrong a good percentage of the time”
     • Customers are looking because they don’t know
     • Customers no longer know how to construct successful queries
     • Search engine intent is not always “finding the most relevant information”
     Cost of finding information according to an IDC April 2006 report = $5.3 million for every 1000 workers
  5. Filter on the way out but not the way in: The Web has no gatekeepers or “single way of doing things.” Definitive directories are anything but, and they inhibit us from learning from one another. Set paths inhibit discovery for some users and information items. That’s why search engines were developed in the first place.
     Put each leaf on as many branches as possible: more branches = more discovery
     Everything is metadata: metadata is what “we” already know, data is what we’re trying to find. Metadata describes information in a way that maps to how the user looks for it: Google Insights for Search http://www.google.com/insights/search/
     Give up control: miscellaneous organization of information contains relationships beyond recognition
     – more powerful to let the users mix it up themselves
     – online = user expectation is that they can organize it the way THEY want with tags, bookmarks, etc.
     – information owners can offer a prebuilt categorization but users will continue to find their own way
  7. The search engines are like people who keep buying bigger clothes to hide their weight gain. Soon the diet must come, as it somewhat has with Google, and it starts with whether or not the page is index-worthy.
     Google Caffeine: new infrastructure opened to developer testing in public beta (August 2009). Even cheap infrastructure has its cost limits, and Google looks to be reaching its limit with regard to retention of what it is finding out there; likely a lot of “Web junk” doesn’t even make the cut.
     Google Caffeine is:
     • Faster (basically real-time indexing if you let the search engine know that the page/updates are there)
     • More keyword string based relevance
     • Better able to scale the index (mentioned 100 petabytes in March 2010; a petabyte is 1000 terabytes)
     Currently, the determination is done by computational math. Who should decide what goes and stays? Us! We can influence the search engine’s behavior by getting rid of the “set it and forget it” method of Web publishing. Keep content fresh and current. Check every now and then. Publish deep, context-rich content and tend to it. Not all of it, just the most important pieces. Not all content is created equal.
     Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
     System and Method of Encoding and Decoding Variable-length data: June 27, 2006
     http://www.worldwidewebsize.com/
  8. Here it is, the famous, to some infamous, PageRank algorithm. This is its most stripped-down state. Rumor has it that the algorithm now has in excess of 27 components. We’ll look at some of these extensions in a few moments.
     Important to note: The PageRank algorithm is a pre-query calculation. It is a value that is assigned as a result of the search engine’s indexing of the entire Web, and the associated value has no relationship to the user’s information need. There have been a number of additions and enhancements to lend some contextual credence to the relevance ranking of the results.
     When Google appears in 1998, it is the underdog to search giants like Alta Vista and Yahoo! Its simplified relevance model rests on a foundation of human mediation through linking [each link was at that time the product of direct human endeavor and so viewed as a “vote” for the page or site relevance and information merit]. It is not so much the underdog now, with 64.6% of all U.S. searches (that would be 13.9 billion searches in August 2009, or roughly 450 million searches per day in the U.S. alone).
     There are only so many slots in the golden top 10 search results for any query. Am I the only one who is concerned with the consolidation of so much power in a single entity? And is it perceived power, something we can do something about, or actual power, something that we must learn to live with?
     comScore Search Engine Market Share August 2009 http://www.comscore.com/Press_Events/Press_releases/2009/9/comScore_Releases_August_2009_U.S._Search_Engine_Rankings
  9. Hilltop was one of the first to introduce the concept of machine-mediated “authority” to combat the human manipulation of results for commercial gain (using link blast services, viral distribution of misleading links). It is used by all of the search engines in some way, shape or form.
     Hilltop is:
     • Performed on a small subset of the corpus that best represents the nature of the whole
     • Pages are ranked according to the number of non-affiliated “experts” that point to them, i.e. experts not in the same site or directory
     • Affiliation is transitive [if A=B and B=C then A=C]
     The beauty of Hilltop is that, unlike PageRank, it is query-specific and reinforces the relationship between the authority and the user’s query. You don’t have to be big or have a thousand links from auto parts sites to be an “authority.” Google’s 2003 Florida update, rumored to contain Hilltop reasoning, caused a lot of sites with extraneous links to fall from their previously lofty placements.
     Google artificially inflates the placement of results from Wikipedia because it perceives Wikipedia as an authoritative resource due to social mediation and commercial agnosticism. Wikipedia is not infallible. However, someone finding it in the “most relevant” top results will certainly see it as so.
  10. Most SEOs hate keywords. I say that they are like Jessica Rabbit in “Who Framed Roger Rabbit”: not bad, just drawn that way. Keywords were the object of much abuse in the early part of the Web and were almost totally discounted by the search engines. With the emerging Semantic Web that strengthens the topic-sensitive nature of relevance calculation, combined with the technology’s ability to successfully compare two content items for context, keywords might make more sense. In any event, they do more good than harm. So, I advise my clients to have 2-4 key concepts from the page represented here. The caveat is that they be from the page.
      Topic-Sensitive PageRank
      • Computes PR based on a set of representational topics [augments PR with content analysis]
      • Topics derived from the Open Directory Project
      • Uses a set of ranking vectors: pre-query selection of topics + at-query comparison of the similarity of query to topics
  11. Search 2.0 is the “wisdom of crowds.” Now we help each other find things. Online this takes the form of bookmarking and community sites like Technorati (social sharing), Delicious (social bookmarking) and Twitter (micro-blogging), among others. Search engines are now leveraging these forums as well as their own extensive data collection to calculate relevance. Some believe that social media will replace search. How can your friends and followers beat a 100 billion page index? What if they don’t know?
  12. If machines are methodical, as we’ve seen, and people are emotional, as we experience, where is the middle ground? Are we working harder to really find what we need or just taking what we get and calling it what we wanted in the first place?
  13. 8/20/2010
      Developed by a computer science student, this algorithm was the subject of an intense bidding war between Google and Microsoft that Google won. The student, Ori Alon, went to work for Google in April 2006 and has not been heard from since. There is no contemporary information on the algorithm or its developer.
      Relational content modeling done by machines, usually as contextualized next steps.
  14. There is no such thing as “advanced search” any longer. We’re all lulled into the false sense that the search engine is smarter than us. Now the search engines present a mesmerizing array of choices that distract from the original intent of the search.
      Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
  15. Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
  16. Watch out for those Facebook applications, quizzes, etc., Tweets, LinkedIn data
      Improving Search using Population Information (November 2008): Determine population information associated with the query that is derived from a population database
      • Locations of users
      • Populations that users are associated with
      • Groups users are associated with (gender, shared interests, self- & auto-assigned identity data)
      Rendering Context Sensitive Ads for Multi-topic searchers (April 2008): Resolves ambiguities by monitoring user behavior to determine specific interest
      Presentation of Local Results (July 2008): Generating 2 sets of results, one with relevance based on location of device used for search
      Detecting Novel Content (November 2008): identify and assign a novelty score to one or more textual sequences for an individual document in a set
      Document Scoring based on Document Content Update (May 2007): scoring based on how the document is updated over time, rate of change, rate of change for anchor-link text pointing to the document
      Document Scoring based on Link-based Criteria (April 2007): System to determine time-varying behavior of links pointing to a document; growth in # of links pointing to the document (exceeds the acceptable threshold), freshness of links, age distribution of links; deployed as Google Scout
  18. Give them what they want as well as what you want to give them
      Provide them with the means of interacting with their results
  19. Interaction Design
      Bowman leaves Google http://stopdesign.com/archive/2009/03/20/goodbye-google.html
      “Yes, it’s true that a team at Google couldn’t decide between two blues, so they’re testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such minuscule design decisions. There are more exciting design problems in this world to tackle.”
      The announcement of Bowman leaving Google started a lengthy thread on the Interaction Design Association list about search design and interaction http://www.ixda.org/discuss.php?post=40237
  20. User Centered Design
  21. Information Architecture
  26. Heystaks: social search. Save search results and share with friends http://www.heystaks.com
  30. Some observers claim that Google is now running on as many as a million Linux servers. At the very least, it is running on hundreds of thousands. When you consider that the application Google delivers is instant access to documents and services available from, by last count, more than 81 million independent web servers, we're starting to understand how true it is, as Sun Microsystems co-founder John Gage famously said back in 1984, that "the network is the computer." It took over 20 years for the rest of the industry to realize that vision, but we're finally there. ...
      First, privacy. Collective intelligence requires the storage of enormous amounts of data. And while this data can be used to deliver innovative applications, it can also be used to invade our privacy. The recent news disclosures about phone records being turned over to the NSA is one example. Yahoo's recent disclosure of the identity of a Chinese dissident to Chinese authorities is another. The internet has enormous power to increase our freedom. It also has enormous power to limit our freedom, to track our every move and monitor our every conversation. We must make sure that we don't trade off freedom for convenience or security. Dave Farber, one of the fathers of the Internet, is fond of repeating the words of Ben Franklin: "Those who give up essential liberty to purchase a little temporary safety deserve neither, and will lose both."
      Second, concentration of power. While it's easy to see the user empowerment and democratization implicit in web 2.0, it's also easy to overlook the enormous power that is being accrued by those who've successfully become the repository for our collective intelligence. Who owns that data? Is it ours, or does it belong to the vendor? If history is any guide, the democratization promised by Web 2.0 will eventually be succeeded by new monopolies, just as the democratization promised by the personal computer led to an industry dominated by only a few companies. Those companies will have enormous power over our lives -- and may use it for good or ill. Already we're seeing companies claiming that Google has the ability to make or break their business by how it adjusts its search rankings. That's just a small taste of what is to come as new power brokers rule the information pathways that will shape our future world.
      My Commencement Speech at SIMS (May 2006) http://radar.oreilly.com/2006/05/my-commencement-speech-at-sims.html
  31. Equal Representation By Search Engines: Vaughn & Zhang (2007)
  32. Google China shows a different form of relevance, with a focus on tourism for the square. In the last dispute, Google redirected its Google.cn searches to Google Hong Kong, which does show results from the Tiananmen Square protests in the top 10 results.
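
Slide 8 describes PageRank as a pre-query value built from links treated as "votes." The formula itself was on the slide image and is not in these notes, so what follows is only a minimal sketch of the commonly published power-iteration form. The damping value of 0.85, the function name `pagerank`, and the four-page toy web are assumptions made for illustration, not Google's production calculation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    damping=0.85 is an assumed, commonly quoted value; it is not from the slides.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each outgoing link passes on an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" collects the most votes, "d" collects none.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

Note that nothing in the function takes a query as input, which is the point the slide makes: the value is fixed at indexing time and has no relationship to the user's information need.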
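
The Hilltop bullets on slide 9 (rank by the number of non-affiliated experts, with affiliation being transitive) can be sketched roughly as follows. This is a simplification under stated assumptions: the host names, the `experts` tuples, and the exact-match test of query against topic are all hypothetical, and the real algorithm selects and scores expert pages in far more detail.

```python
def find(parent, host):
    """Follow parent links to the representative of an affiliation group."""
    while parent[host] != host:
        parent[host] = parent[parent[host]]  # shorten the path as we go
        host = parent[host]
    return host


def affiliation_groups(hosts, affiliated_pairs):
    """Merge hosts so that affiliation is transitive (A~B and B~C gives A~C)."""
    parent = {h: h for h in hosts}
    for a, b in affiliated_pairs:
        parent[find(parent, a)] = find(parent, b)
    return {h: find(parent, h) for h in hosts}


def hilltop_scores(query, experts, affiliated_pairs):
    """Count, per target page, the distinct affiliation groups of experts
    on the query topic that link to it.

    experts: list of (host, topics, linked_pages) tuples (hypothetical shape).
    """
    group = affiliation_groups([host for host, _, _ in experts], affiliated_pairs)
    supporters = {}
    for host, topics, linked_pages in experts:
        if query in topics:  # query-specific: off-topic experts do not vote
            for page in linked_pages:
                supporters.setdefault(page, set()).add(group[host])
    return {page: len(groups) for page, groups in supporters.items()}


experts = [
    ("a.example", {"brakes"}, ["shop.example/pads"]),
    ("b.example", {"brakes"}, ["shop.example/pads"]),
    ("c.example", {"brakes"}, ["shop.example/pads", "other.example/rotors"]),
    ("d.example", {"tires"}, ["other.example/rotors"]),
    ("e.example", {"brakes"}, ["shop.example/pads"]),
]
# a~b and b~c, so a, b and c collapse into one affiliation group.
pairs = [("a.example", "b.example"), ("b.example", "c.example")]
scores = hilltop_scores("brakes", experts, pairs)
```

In this toy data four experts link to the pads page, but three of them are affiliated with one another, so they count as a single vote. That is how the approach discounts a thousand links from related sites.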
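
Slide 10 summarizes Topic-Sensitive PageRank as a set of pre-query ranking vectors blended at query time by how similar the query is to each topic. A rough sketch of that blending step is below. The term-overlap similarity is a stand-in chosen for brevity, and the topic vocabularies, rank values, and function names are hypothetical.

```python
def topic_weights(query, topic_terms):
    """Weight each topic by its share of query-word matches (assumed similarity)."""
    words = set(query.lower().split())
    overlap = {topic: len(words & terms) for topic, terms in topic_terms.items()}
    total = sum(overlap.values())
    if total == 0:
        # No topic matches the query: fall back to equal weights.
        return {topic: 1.0 / len(topic_terms) for topic in topic_terms}
    return {topic: count / total for topic, count in overlap.items()}


def blended_rank(query, topic_terms, topic_ranks):
    """Combine the precomputed per-topic rank vectors using the query's topic weights."""
    weights = topic_weights(query, topic_terms)
    pages = {page for ranks in topic_ranks.values() for page in ranks}
    return {
        page: sum(weights[t] * topic_ranks[t].get(page, 0.0) for t in topic_ranks)
        for page in pages
    }


# Hypothetical topic vocabularies and pre-query rank vectors.
topic_terms = {
    "autos": {"jaguar", "car", "engine"},
    "animals": {"jaguar", "cat", "habitat"},
}
topic_ranks = {
    "autos": {"dealer.example": 0.6, "zoo.example": 0.1},
    "animals": {"dealer.example": 0.1, "zoo.example": 0.7},
}
habitat_ranks = blended_rank("jaguar habitat", topic_terms, topic_ranks)
engine_ranks = blended_rank("jaguar engine", topic_terms, topic_ranks)
```

The same two pages swap places depending on the query, even though every rank vector was computed before any query arrived.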