Slide from LIS 544 IMT 542 INSC 544 by Jeff Huang firstname.lastname@example.org and Shawn Walker email@example.com
The document with the highest proportion of terms which are part of the query is most relevant
• Documents containing more of the term(s) scored higher
• Longer documents discounted
• Rare terms weighted higher
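The three bullets above can be sketched as a small TF-IDF-style scorer. This is an illustrative sketch only, not any search engine's actual formula: the function name `score`, the smoothed IDF variant, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def score(query, doc, corpus):
    """Score one tokenized document against a tokenized query (hypothetical helper)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    total = 0.0
    for term in set(query):
        # Proportion of the document made up of this term:
        # more occurrences score higher, longer documents are discounted.
        tf = counts[term] / len(doc) if doc else 0.0
        # Number of documents that contain the term at least once.
        df = sum(1 for d in corpus if term in d)
        # Smoothed inverse document frequency (an assumed variant):
        # rare terms get a larger weight.
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        total += tf * idf
    return total


# Toy corpus, invented for illustration.
short_doc = ["auto", "parts", "store"]
long_doc = ["auto", "parts", "store", "open", "every", "day", "of", "the", "week"]
other_doc = ["garden", "tools", "store"]
corpus = [short_doc, long_doc, other_doc]

query = ["auto", "parts"]
ranked = sorted(corpus, key=lambda d: score(query, d, corpus), reverse=True)
```

With this toy data the short document ranks first, the longer document with the same matches ranks second, and the document with no query terms scores zero.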
Hilltop was one of the first algorithms to introduce the concept of machine-mediated “authority” to combat the human manipulation of results for commercial gain (using link blast services, viral distribution of misleading links). It is used by all of the search engines in some way, shape or form.
Hilltop is:
• Performed on a small subset of the corpus that best represents the nature of the whole
• Authorities: have lots of unaffiliated expert documents on the same subject pointing to them
• Pages are ranked according to the number of non-affiliated “experts” that point to them, i.e. not in the same site or directory
• Affiliation is transitive [if A=B and B=C then A=C]
The beauty of Hilltop is that, unlike PageRank, it is query-specific and reinforces the relationship between the authority and the user’s query. You don’t have to be big or have a thousand links from auto parts sites to be an “authority.” Google’s 2003 Florida update, rumored to contain Hilltop reasoning, caused a lot of sites with extraneous links to fall from their previously lofty placements.
Google artificially inflates the placement of results from Wikipedia because it perceives Wikipedia as an authoritative resource due to social mediation and commercial agnosticism. Wikipedia is not infallible. However, someone finding it in the “most relevant” top results will certainly see it as so.
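The transitive-affiliation rule and the "count non-affiliated experts" idea can be sketched with a union-find structure. This is a rough illustration under stated assumptions, not Hilltop's actual implementation: the class `Affiliations`, the function `authority_score`, the hostnames, and the choice to count each affiliation group only once are all hypothetical.

```python
class Affiliations:
    """Union-find over hosts so that affiliation is transitive."""

    def __init__(self):
        self.parent = {}

    def find(self, host):
        # Register unseen hosts as their own group.
        self.parent.setdefault(host, host)
        while self.parent[host] != host:
            # Path halving keeps later lookups short.
            self.parent[host] = self.parent[self.parent[host]]
            host = self.parent[host]
        return host

    def union(self, a, b):
        # Merge the groups of a and b: if A=B and B=C then A=C.
        self.parent[self.find(a)] = self.find(b)


def authority_score(target, expert_links, aff):
    """Count affiliation groups of experts linking to target,
    ignoring experts affiliated with the target itself (assumed scoring rule)."""
    groups = {aff.find(expert) for expert, linked in expert_links if linked == target}
    groups.discard(aff.find(target))
    return len(groups)


# Invented example data: (expert host, host it links to).
aff = Affiliations()
aff.union("a.example", "b.example")
aff.union("b.example", "c.example")          # a and c become affiliated transitively
aff.union("blog.shop.example", "shop.example")

expert_links = [
    ("a.example", "shop.example"),
    ("c.example", "shop.example"),           # same group as a.example, counted once
    ("d.example", "shop.example"),
    ("blog.shop.example", "shop.example"),   # affiliated with the target, ignored
]
shop_score = authority_score("shop.example", expert_links, aff)
```

Here four experts link to the target, but only two independent groups count toward its score.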
Computes PR (PageRank) based on a set of representative topics [augments PR with content analysis]
Topics derived from the Open Directory Project (ODP)
Uses a set of ranking vectors: pre-query selection of topics + at-query comparison of the similarity of query to topics
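The two stages above (topic vectors computed before the query, then blended at query time) can be sketched as follows. This is a minimal sketch under assumptions: the tiny link graph, the topic page lists, and the query-to-topic similarity weights are invented inputs, and the function names are hypothetical. How the similarity weights are derived from the query text is not shown.

```python
def pagerank(links, teleport, damping=0.85, iterations=50):
    """Power iteration where random jumps land according to `teleport`."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * teleport.get(p, 0.0) for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    nxt[q] += share
            else:
                # Dangling page: hand its rank back through the teleport set.
                for q in pages:
                    nxt[q] += damping * rank[p] * teleport.get(q, 0.0)
        rank = nxt
    return rank


def topic_vectors(links, topic_pages):
    """Pre-query: one ranking vector per topic, jumping only to that topic's pages."""
    vectors = {}
    for topic, members in topic_pages.items():
        teleport = {p: 1.0 / len(members) for p in members}
        vectors[topic] = pagerank(links, teleport)
    return vectors


def query_rank(vectors, topic_weights):
    """At query time: blend the topic vectors by query-to-topic similarity."""
    total = sum(topic_weights.values())
    pages = next(iter(vectors.values()))
    return {
        p: sum(w / total * vectors[t][p] for t, w in topic_weights.items())
        for p in pages
    }


# Invented link graph: page -> pages it links to.
links = {
    "cars1": ["cars2"],
    "cars2": ["cars1", "hub"],
    "hub": ["cars1", "cook1"],
    "cook1": ["cook2"],
    "cook2": ["cook1", "hub"],
}
topic_pages = {"autos": ["cars1", "cars2"], "cooking": ["cook1", "cook2"]}

vectors = topic_vectors(links, topic_pages)                      # computed once, offline
auto_query = query_rank(vectors, {"autos": 0.9, "cooking": 0.1})
cook_query = query_rank(vectors, {"autos": 0.1, "cooking": 0.9})
```

The same graph yields different orderings depending on which topic the query resembles, which is the point of keeping several ranking vectors instead of one.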
Pew Internet Trust Study of Search engine behavior
http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012/Summary-of-findings.aspx
Moreover, users report generally good outcomes and relatively high confidence in the capabilities of search engines:
• 91% of search engine users say they always or most of the time find the information they are seeking when they use search engines
• 73% of search engine users say that most or all the information they find as they use search engines is accurate and trustworthy
• 66% of search engine users say search engines are a fair and unbiased source of information
• 55% of search engine users say that, in their experience, the quality of search results is getting better over time, while just 4% say it has gotten worse
• 52% of search engine users say search engine results have gotten more relevant and useful over time, while just 7% report that results have gotten less relevant
Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
• 56% constructed poor queries
• 55% selected irrelevant results 1 or more times
• 38% overwhelmed by amount of information in results
• 34% found critical information missing from results