SEARCH ENGINE. Live and Learn 2010. Aj. Supichaya Nuntapunt, School of Information Technology, Mae Fah Luang University
The Web Defined: A software application that allows us to publish and browse hypertext documents. Documents are transported over the Internet via HTTP. Browsers are multiprotocol. URL = Web address.
Introduction: Directories, Search Engines, and Metasearch Engines; Search Fundamentals; Search Strategies; How Does a Search Engine Work?
Directories, Search Engines, and Metasearch Engines: Directories; Popular Directories; Search Engines; Popular Search Engines; Metasearch Engines; Popular Metasearch Engines
 
Directories: A directory is a hierarchical representation of hyperlinks, with a top level of general topics and sublevels of more specialized subtopics. Directories are easy to use, and you do not need to know exactly what you are looking for.
Popular Directories: AOL NetFind, CNET Search.com, Excite, Infoseek, LookSmart, Lycos, Yahoo!, Open Directory (dmoz.com)
Search Engines: A search engine is a computer program that accepts a query, searches a database, returns URLs, and permits query revision. Problem: search engines often return too many URLs, so you need to be specific and use the query syntax.
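To make the "top level, sublevels" idea concrete, here is a minimal sketch that models a directory as nested Python dictionaries. The category names and example.org URLs are hypothetical; real directories such as Yahoo! or dmoz were edited by people and stored very differently. Browsing means following a path of category names, which is why you do not need exact keywords.

```python
# A web directory modeled as nested dicts: topic -> subtopic -> ... -> list of links.
# All categories and URLs here are hypothetical examples.
DIRECTORY = {
    "Science": {
        "Biology": {
            "Genetics": ["http://example.org/genome-basics"],
        },
        "Astronomy": ["http://example.org/planets"],
    },
    "Recreation": {
        "Travel": {
            "Europe": ["http://example.org/london", "http://example.org/paris"],
        },
    },
}


def browse(directory, path):
    """Follow a path of category names from the top level down."""
    node = directory
    for category in path:
        node = node[category]
    return node


def list_choices(node):
    """Show what the user can click next: subcategories, or links at the bottom level."""
    if isinstance(node, dict):
        return sorted(node.keys())
    return list(node)
```

For example, list_choices(DIRECTORY) shows the general topics, and browse(DIRECTORY, ["Science", "Biology", "Genetics"]) reaches the links under the most specialized subtopic.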
Popular Search Engines: Google (85.35%), Yahoo! (6.29%), Bing (3.27%); also AOL, Ask, AltaVista, Excite, HotBot, Lycos, Fast Search (alltheweb.com), Dogpile. Figures as of December 2009. Source: Ross Shannon, HTML Source, http://www.yourhtmlsource.com
HitWise: http://www.hitwise.com/us/datacenter/main/dashboard-10133.html
Ranks.nl: Compare Search Engines, http://www.ranks.nl/tools/compare.html
Metasearch Engines: A metasearch engine calls other search engines using a single query, so it can return more matches.
Popular Metasearch Engines: Metasearch, MetaCrawler
Search Fundamentals: Search Terminology; Pattern Matching Queries; Boolean Queries; Search Domain; Search Subjects
Search Terminology: search tool, query, query syntax, query semantics, hit or match, relevancy score
Pattern Matching Queries: Enter one or more keywords, and the search engine returns the URLs of matching pages.
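A metasearch engine has to merge several ranked lists into one. The sketch below is one simple way to do that, assuming a round-robin merge with duplicates removed; engine_a and engine_b are hypothetical stand-ins for real search engines, which a real metasearch tool would query over the network.

```python
from typing import Callable, List


# Hypothetical stand-ins for real search engines: each takes a query and
# returns a ranked list of URLs.
def engine_a(query: str) -> List[str]:
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]


def engine_b(query: str) -> List[str]:
    return ["http://shared.example/x", "http://b.example/1"]


def metasearch(query: str, engines: List[Callable[[str], List[str]]]) -> List[str]:
    """Send one query to every engine and merge the results.

    Merging rule (an assumption for this sketch): take each engine's
    rank-1 result, then each engine's rank-2 result, and so on,
    skipping URLs that were already added.
    """
    result_lists = [engine(query) for engine in engines]
    merged: List[str] = []
    seen = set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged
```

With both engines, the merged list holds four distinct URLs, more than either engine returns on its own, which is the "more matches" benefit of a single query sent to several engines.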
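A keyword (pattern matching) query is commonly answered from an inverted index that maps each word to the pages containing it. The minimal sketch below assumes a tiny hypothetical page collection and returns pages that contain every keyword; it is an illustration, not how any particular search engine is built.

```python
import re
from collections import defaultdict

# A tiny hypothetical "database" of crawled pages: URL -> page text.
PAGES = {
    "http://example.org/fishing": "Striped bass fishing tips for the coast",
    "http://example.org/music": "How to tune a bass guitar",
    "http://example.org/travel": "Vacation ideas: London and Paris",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def keyword_search(index, query):
    """Return the URLs that contain every keyword in the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

Searching for bass returns two pages; searching for bass fishing returns one. This is the "add keywords" advice for too many hits in action.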
Figure source: Greenlaw/Hepp, In-line/On-line: Fundamentals of the Internet and the World Wide Web
Boolean Queries: Named after George Boole, Boolean queries combine keywords with AND, OR, and NOT. Examples: search for bass (the fish, not the musical term); find a vacation in either London or Paris.
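Boolean operators behave like set operations on the pages that match each keyword: AND is intersection, OR is union, and NOT removes pages. The sketch below encodes the slide's two examples, assuming a hypothetical index of page IDs; exact operator syntax differs between search engines.

```python
# Hypothetical index: each keyword maps to the set of page IDs containing it.
INDEX = {
    "bass": {1, 2, 3},
    "fish": {1, 3},
    "music": {2, 3},
    "vacation": {4, 5, 6},
    "london": {4},
    "paris": {5},
}
ALL_PAGES = {1, 2, 3, 4, 5, 6}


def term(word):
    """Pages containing a single keyword (empty set if the word is unknown)."""
    return INDEX.get(word, set())


def and_(a, b):
    """AND: pages in both sets (narrows the search)."""
    return a & b


def or_(a, b):
    """OR: pages in either set (broadens the search)."""
    return a | b


def not_(a):
    """NOT: every page except those in the set."""
    return ALL_PAGES - a


# bass AND fish AND NOT music: the fish, not the musical term
fish_pages = and_(and_(term("bass"), term("fish")), not_(term("music")))

# vacation AND (london OR paris)
trip_pages = and_(term("vacation"), or_(term("london"), term("paris")))
```

Here fish_pages is {1} and trip_pages is {4, 5}; page 6 mentions a vacation but neither city, so the OR inside the AND filters it out.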
Search Domain: the Web, newsgroups, specialized databases, the library
Search Subjects: MetaSpy shows MetaCrawler searches in real time. Google Search History
Introduction: Choose a Search Engine. Consider its user-friendly interface, documentation, database size, and relevancy scores.
Too Many Hits: Search Specialization. Add keywords, add AND or NOT, capitalize proper nouns, and look only at the first 20 URLs.
Too Few Hits: Search Generalization. Eliminate keywords, remove AND or NOT, enlarge the search domain, and use more general keywords.
 
How Google Works: BEFORE you search, Google "crawls" pages on the public web, copies their text and images, and builds a database. WHEN you search, it automatically ranks the pages in your results using word occurrence and location on the page, and popularity (a link to a page is a vote for it). About 200 factors are used in all.
Searching Google: Think "full text" and be specific: search for war of 1812 economic causes rather than war of 1812 history. Use academic and professional terms: domestic architecture rather than houses. A search for genome society finds the International Mammalian Genome Society; also try combinations with association, research center, institute, directory, or database.
Searching Google: Specify exact phrases with quotation marks: "tom bates", "what you're looking for is already inside you". Exclude a word with a minus sign or require one with a plus sign: proliferation -nuclear, bush legacy +environment.
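Google's real ranking uses around 200 factors and its formula is not public. The toy scorer below only illustrates the three signals named on the slide (word occurrence, location on the page, and links as votes); the Page class, the weights, and the sample pages are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    title: str
    body: str
    inbound_links: int = 0  # how many other pages link here ("votes")


def score(page: Page, query: str) -> float:
    """Toy relevancy score; the weights 1, 3 and 0.5 are arbitrary assumptions."""
    words = query.lower().split()
    title_words = page.title.lower().split()
    body_words = page.body.lower().split()
    occurrence = sum(body_words.count(w) for w in words)  # word occurrence in the body
    in_title = sum(3 for w in words if w in title_words)  # location: a title match counts more
    popularity = 0.5 * page.inbound_links                  # each inbound link is a vote
    return occurrence + in_title + popularity


def rank(pages: List[Page], query: str) -> List[str]:
    """Return URLs ordered from highest to lowest toy score."""
    return [p.url for p in sorted(pages, key=lambda p: score(p, query), reverse=True)]
```

With these weights, a page that mentions hybrid once but has it in its title and has many inbound links can outrank a page that simply repeats hybrid in its body.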
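The phrase, minus, and plus syntax can be read as rules for which pages count as matches. The sketch below parses a query into those parts and checks one page's text against them. It is a simplified model under stated assumptions (plain words and plus words are both required, matching is whole-word and case-insensitive), not Google's exact behavior.

```python
import re


def parse_query(query):
    """Sort query parts into exact phrases, required (+), excluded (-) and plain words."""
    phrases, required, excluded, words = [], [], [], []
    for phrase, token in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            phrases.append(phrase.lower())
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        elif token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        else:
            words.append(token.lower())
    return phrases, required, excluded, words


def matches(text, query):
    """Check one page's text against the query (simplified model)."""
    phrases, required, excluded, words = parse_query(query)
    lowered = text.lower()
    page_words = set(re.findall(r"[a-z0-9']+", lowered))
    # Each exact phrase must appear as a whole, word for word.
    if any(not re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
        return False
    # Any excluded word rules the page out.
    if any(w in page_words for w in excluded):
        return False
    # Every remaining word must be present.
    return all(w in page_words for w in required + words)
```

For example, matches("Nuclear proliferation treaty news", "proliferation -nuclear") is False because the excluded word appears, while a page about small arms proliferation would match.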
Limit your search to: a web page title (intitle:hybrid, allintitle:hybrid mileage); a website or domain (site:whitehouse.gov "global warming", site:edu "global warming"); a file type (filetype:ppt site:edu "global warming").
On the results page: the search box (use it to modify your search), and the "Cache", "Related pages", and "Translate this page" links.
Let's try it! Search Google with our examples or your own topics: www.google.com
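The site: and filetype: restrictions act as filters on result URLs. The sketch below applies them to a hypothetical list of URLs using Python's standard urllib.parse. It assumes a site matches when the host equals it or ends with it as a domain suffix, and that the file type is read from the URL's extension; the keyword part ("global warming") would be matched separately.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical result URLs for illustration.
URLS = [
    "http://www.whitehouse.gov/energy",
    "http://climate.stanford.edu/talk.ppt",
    "http://www.example.com/warming.ppt",
    "http://www.berkeley.edu/notes.html",
]


def site_matches(url, site):
    """site:edu or site:whitehouse.gov: host equals the site or ends with '.' + site."""
    host = (urlparse(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)


def filetype_matches(url, ext):
    """filetype:ppt: the URL path ends with .ppt (a simplification)."""
    path = urlparse(url).path.lower()
    return posixpath.splitext(path)[1] == "." + ext.lower()


def limit(urls, site=None, filetype=None):
    """Keep only URLs that pass the given site: and filetype: restrictions."""
    return [
        u for u in urls
        if (site is None or site_matches(u, site))
        and (filetype is None or filetype_matches(u, filetype))
    ]
```

Combining both filters, limit(URLS, site="edu", filetype="ppt") keeps only the PowerPoint file on an .edu host, mirroring filetype:ppt site:edu.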
Google’s other databases
Why go beyond Google? To search more of the web (Yahoo!) and to get more options (Exalead).
Let's try it! Try other search tools and compare their results with Google's.
CRITICAL EVALUATION: Why evaluate what you find on the Web? Anyone can put up a web page, many pages are not updated, and there is no quality control: most sites are not "peer-reviewed", so they are less trustworthy than scholarly publications.
Before you click to view the page... Look at the URL: is it a personal page or an organization's site? Signs of a personal page include ~, %, "users", or "members" in the URL. Is the domain name appropriate for the content? Restricted domains: edu, gov, mil, and a few country codes (ca). Unrestricted domains: com, org, net, and most country codes (us, uk). Is it published by an entity that makes sense? News from its source? www.nytimes.com. Advice from a valid agency? www.mfu.ac.th, e-learning.mfu.ac.th
Scan the perimeter of the page. Can you tell who wrote it? Look for the name of the page author, or an organization, institution, or agency you recognize. Does the author have credentials for the subject matter? Look for links such as "About us", "Philosophy", "Background", or "Biography". Is it current enough? Look for a "last updated" date.
Examine the content. Could the text be forged? Are sources documented with links or notes, and do the links work? Is there evidence of bias in the text or sources?
Do some detective work. Search the URL in alexa.com and click on "Site info for …": Who owns the domain? Who links to the site? What did the site look like in the past? Check www.archive.org/web/web.php (Wayback Machine).
Does it all add up? Was the page put on the web to inform, to persuade, to sell, or as a parody or satire? Is it appropriate for your purpose?
Try evaluating some sites: search a topic in Google, scan the first two pages of results, then visit one or two sites and evaluate their quality and reliability.
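The URL checks on this slide can be partly automated. The sketch below reports the top-level domain and flags the personal-page signs listed above. The marker list and restricted domains follow the slide (country-code rules vary in practice), and the result is only a clue to guide your judgment, not proof of who published a page.

```python
from urllib.parse import urlparse

PERSONAL_MARKERS = ("~", "%", "users", "members")  # personal-page signs from the slide
RESTRICTED_TLDS = {"edu", "gov", "mil", "ca"}       # restricted domains named on the slide


def inspect_url(url):
    """Return simple clues about who published a page (a heuristic, not proof)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()
    tld = host.rsplit(".", 1)[-1]
    return {
        "host": host,
        "top_level_domain": tld,
        # Check host and path, since e.g. members.example.com also hints at a personal page.
        "looks_personal": any(m in host + path for m in PERSONAL_MARKERS),
        "restricted_domain": tld in RESTRICTED_TLDS,
    }
```

For instance, inspect_url("http://www.aol.com/~jbarker") flags a personal page on an unrestricted .com domain, while a news article found at www.nytimes.com shows no personal-page signs.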
Questions?
References: John Kupersmith, University of California, Berkeley. Greenlaw/Hepp, In-line/On-line: Fundamentals of the Internet and the World Wide Web.
THANK YOU

Editor's Notes

  • #36 (Before you click to view the page...): Go through these procedures fairly quickly; there's an exercise to learn them. You want students to be able to understand the form and what it says. DOMAIN APPROPRIATE FOR THE CONTENT: Do you trust a New York Times article from a personal page as much as one from nytimes.com? A copy of Jackie Onassis's will from a personal page as much as one from the California Bar Assn.? An example of a personal page would be www.aol.com/~jbarker. These steps are loosely paralleled by the sequence of the form in the next exercise.
  • #39 (Do some detective work): You can trust lii.org more than many referrals. Annotations by professionals help. The burden is always on you. Demonstrate a link: search example in Google, using http://www.hanksville.org/yucatan/mayacal.html.