Search Engines


Supichaya Nuntapunt
Search Engines for Mae Fah Luang University Freshmen 2010 Live & Learn

Published in: Technology

  • Go through these procedures fairly quickly; there is an exercise for learning this. You want them to be able to understand the form and what it says. Domain appropriate for the content: do you trust a New York Times article from a personal page as much as one from its source? A copy of Jackie Onassis’s will from a personal page as much as one from the California Bar Association? These procedures are loosely paralleled by the sequence of the form in the next exercise.
  • You can trust the more than many referrals. If there are annotations by professionals, that helps. The burden is on you, always. Demonstrate link: search example in Google. Use
  • Search Engines

    1. 1. SEARCH ENGINE: Live and Learn 2010. Aj. Supichaya Nuntapunt, School of Information Technology, Mae Fah Luang University
    2. 2. The Web Defined
       • Software application that allows us to publish and browse hypertext documents
       • Transported over the Internet
       • HTTP
       • Browsers are multiprotocol
       • URL = Web address
    3. 3. Introduction
       • Directories, Search Engines, and Metasearch Engines
       • Search Fundamentals
       • Search Strategies
       • How Does a Search Engine Work?
    4. 4. Directories, Search Engines, and Metasearch Engines
       • Directories
       • Popular Directories
       • Search Engines
       • Popular Search Engines
       • Metasearch Engines
       • Popular Metasearch Engines
    5. 6. Directories
       • Hierarchical representation of hyperlinks
       • Top level of general topics
       • Sublevels of more specialized subtopics
       • Easy to use
       • Not necessary to know exactly what you are looking for
    6. 7. Popular Directories
       • AOL NetFind
       • CNET
       • Excite
       • Infoseek
       • Looksmart
       • Lycos
       • Yahoo!
       • Open Directory
    7. 8. Search Engines
       • Computer program:
         • Accepts a query
         • Searches database
         • Returns URLs
         • Permits query revision
       • Problem: search engines often return too many URLs. You need to be specific!
       • Query syntax
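The accept, search, return, and revise steps on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not how any real search engine is built: the `PAGES` data, the example.com URLs, and the function names `build_index` and `search` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical mini "database": URL -> page text (invented for illustration).
PAGES = {
    "http://example.com/bass-fishing": "bass fishing tips for lakes and rivers",
    "http://example.com/bass-guitar": "bass guitar lessons and music theory",
    "http://example.com/trout": "trout fishing in mountain rivers",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


INDEX = build_index(PAGES)

broad = search(INDEX, "fishing")          # first attempt: two URLs match
revised = search(INDEX, "bass fishing")   # revised, more specific query: one URL
print(broad)
print(revised)
```

Revising the query from one keyword to two shrinks the result list, which is the point of the "be specific" advice.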
    8. 9. Popular Search Engines
       • Google (85.35%), Yahoo (6.29%), Bing (3.27%)
       • AOL, Ask, AltaVista, Excite, HotBot, Lycos, Fast Search, DogPile
       • As of December 2009
       Ross Shannon: HTML Source
    9. 10. HitWise
    10. 11. - Compare Search Engines
    11. 12. Metasearch Engines
       • Call other search engines
       • Use single query
       • More matches
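A rough sketch of the metasearch idea, assuming each underlying engine can be treated as a function that takes a query and returns a ranked list of URLs. The two engines here (`engine_a`, `engine_b`) are hypothetical stand-ins that return fixed results; no real engine is called.

```python
def engine_a(query):
    # Hypothetical stand-in for a real search engine; returns ranked URLs.
    return ["http://example.com/1", "http://example.com/2"]


def engine_b(query):
    # A second hypothetical engine with partly overlapping results.
    return ["http://example.com/2", "http://example.com/3"]


def metasearch(query, engines):
    """Send one query to every engine and merge the results.

    Duplicates are dropped and first-seen order is preserved.
    """
    seen = set()
    merged = []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


results = metasearch("global warming", [engine_a, engine_b])
print(results)
```

The merged list is longer than either engine's own list, which is where the "more matches" on the slide comes from.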
    12. 13. Popular Metasearch Engines
       • Metasearch
       • Metacrawler
    13. 14. Search Fundamentals
       • Search Terminology
       • Pattern Matching Queries
       • Boolean Queries
       • Search Domain
       • Search Subjects
    14. 15. Search Terminology
       • Search tool
       • Query
       • Query syntax
       • Query semantics
       • Hit or Match
       • Relevancy score
    15. 16. Pattern Matching Queries
       • Enter keyword(s)
       • Search engine returns URLs
    16. 17. In-line/On-line: Fundamentals of the Internet and the World Wide Web
    17. 18. Boolean Queries
       • George Boole
       • AND, OR, and NOT
       • Examples:
         • You want to search for bass (the fish, not the musical term)
         • Vacation in either London or Paris
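One way to see what AND, OR, and NOT do is to model each keyword as the set of pages that contain it. The sketch below uses Python sets for the two examples on this slide; the `INDEX` contents and page names are invented for illustration.

```python
# Hypothetical index: keyword -> set of pages containing that keyword.
INDEX = {
    "bass": {"page1", "page2", "page3"},
    "music": {"page2"},
    "vacation": {"page4", "page5", "page6"},
    "london": {"page4"},
    "paris": {"page5"},
}

# bass NOT music: set difference removes the pages about the musical term.
bass_fish = INDEX["bass"] - INDEX["music"]

# vacation AND (london OR paris): union for OR, intersection for AND.
trip = INDEX["vacation"] & (INDEX["london"] | INDEX["paris"])

print(sorted(bass_fish))
print(sorted(trip))
```

AND and NOT can only shrink a result set, while OR can only grow it.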
    18. 19. Search Domain
       • Web
       • Newsgroups
       • Specialized databases
       • Library
    19. 20. Search Subjects
       • Metaspy shows searches for Metacrawler in real time.
       • Google Search History
    20. 21. Introduction – Choose a search engine
       • User-friendly interface
       • Documentation
       • Database size
       • Relevancy scores
    21. 22. Too Many Hits: Search Specialization
       • Add keywords
       • Add AND or NOT
       • Capitalize proper nouns
       • Use first 20 URLs
    22. 23. Too Few Hits: Search Generalization
       • Eliminate keywords
       • Remove AND or NOT
       • Enlarge search domain
       • General keywords
    23. 25. How Google works
       • BEFORE you search:
         • “Crawls” pages on the public web
         • Copies text & images, builds database
       • WHEN you search: automatically ranks pages in your results
         • Word occurrence and location on page
         • Popularity - a link to a page is a vote for it
         • ~200 factors in all!
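The two ranking signals named on this slide (word occurrence and location, plus links as votes) can be illustrated with a toy scorer. This is not Google's algorithm: the `PAGES` data, the function names, and the weights (a title match counts double, each inbound link adds one point) are arbitrary assumptions chosen only to make the idea concrete.

```python
# Hypothetical crawled pages: URL -> title, body text, and outgoing links.
PAGES = {
    "a.example": {"title": "Hybrid cars", "text": "hybrid cars save fuel",
                  "links": ["b.example"]},
    "b.example": {"title": "Car reviews", "text": "reviews of hybrid and electric cars",
                  "links": []},
    "c.example": {"title": "Fuel prices", "text": "fuel prices this week",
                  "links": ["b.example", "a.example"]},
}


def inbound_links(pages):
    """Count how many links point at each page (each link is one vote)."""
    counts = {url: 0 for url in pages}
    for page in pages.values():
        for target in page["links"]:
            if target in counts:
                counts[target] += 1
    return counts


def rank(pages, query):
    """Order matching pages by a toy score, highest first."""
    words = query.lower().split()
    votes = inbound_links(pages)
    scored = []
    for url, page in pages.items():
        body_hits = sum(page["text"].lower().split().count(w) for w in words)
        title_hits = sum(page["title"].lower().split().count(w) for w in words)
        if body_hits + title_hits == 0:
            continue  # the page does not mention the query at all
        # Assumed weights: title matches count double; each vote adds 1.
        score = body_hits + 2 * title_hits + votes[url]
        scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


print(rank(PAGES, "hybrid"))
print(rank(PAGES, "fuel"))
```

A real engine combines many more signals (the slide mentions about 200); the sketch only shows how text matches and link votes can feed one score.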
    24. 26. Searching Google
       • Think “full text” = be specific
         • war of 1812 economic causes vs. history
       • Use academic & professional terms
         • domestic architecture vs. houses
         • genome society gets International Mammalian Genome Society
         • also try combinations with association, research center, institute, directory, database
    25. 27. Searching Google
       • Specify exact phrases
         • “tom bates”
         • “what you're looking for is already inside you”
       • Exclude or require a word
         • proliferation -nuclear
         • bush legacy +environment
    26. 28. Limit your search to …
       • Web page title
         • intitle:hybrid
         • allintitle:hybrid mileage
       • Website or domain
         • “global warming” site:edu “global warming”
       • File type
         • filetype:ppt site:edu “global warming”
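The operators on this slide and the previous one are plain text typed into the search box, so a query can be assembled as an ordinary string. The helper below is a hypothetical convenience function (`build_query` and its parameter names are not part of any Google API); `quote_plus` from Python's standard library then encodes the string for use as a URL query parameter.

```python
from urllib.parse import quote_plus


def build_query(words=(), phrase=None, exclude=(), intitle=None,
                site=None, filetype=None):
    """Assemble a search string from the operators shown on the slides."""
    parts = list(words)
    if phrase:
        parts.append(f'"{phrase}"')           # exact phrase in quotes
    parts.extend(f"-{w}" for w in exclude)    # leading minus excludes a word
    if intitle:
        parts.append(f"intitle:{intitle}")    # word must be in the page title
    if site:
        parts.append(f"site:{site}")          # limit to a website or domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # limit to a file type
    return " ".join(parts)


query = build_query(phrase="global warming", site="edu", filetype="ppt")
encoded = quote_plus(query)  # URL-encoded form of the same query
print(query)
print(encoded)
```

For example, `build_query(words=["proliferation"], exclude=["nuclear"])` reproduces the `proliferation -nuclear` example from the previous slide.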
    27. 29. On the results page
       • Search box (use to modify)
       • “Cache”
       • “Related pages”
       • “Translate this page”
    28. 30. Let’s try it!
       • Search Google
       • Use our examples or your own topics
    29. 31. Google’s other databases
    30. 32. Why go beyond Google?
       • Search more of the web: Yahoo!
       • Get more options: Exalead
    31. 33. Let’s try it!
       • Try other search tools
       • Compare results with Google
    32. 34. CRITICAL EVALUATION: Why Evaluate What You Find on the Web?
       • Anyone can put up a web page
       • Many pages not updated
       • No quality control
         • most sites not “peer-reviewed”
           • less trustworthy than scholarly publications
    33. 35. Before you click to view the page...
       • Look at the URL: personal page or site? ~ or % or users or members
       • Domain name appropriate for the content?
         • Restricted: edu, gov, mil, a few country codes (ca)
         • Unrestricted: com, org, net, most country codes (us, uk)
       • Published by an entity that makes sense?
         • News from its source?
           • www.nytimes.com
         • Advice from valid agency?
           • www.mfu
           • e-learning.mfu
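The URL checks on this slide can be partly automated. The sketch below parses a URL with Python's standard `urllib.parse`, flags the personal-page markers listed above, and reports whether the top-level domain is one of the restricted ones. The function name `inspect_url` and the example URLs are made up; the substring test is deliberately crude, and the "few country codes" the slide mentions as restricted are left out.

```python
from urllib.parse import urlparse

# Markers from the slide that hint at a personal page.
PERSONAL_MARKERS = ("~", "%", "users", "members")
# Restricted top-level domains from the slide (country codes omitted here).
RESTRICTED = {"edu", "gov", "mil"}


def inspect_url(url):
    """Return simple hints about a URL before visiting it."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    path = parsed.path.lower()
    # Crude check: any marker appearing anywhere in the path counts.
    looks_personal = any(marker in path for marker in PERSONAL_MARKERS)
    return {
        "host": host,
        "tld": tld,
        "restricted": tld in RESTRICTED,
        "looks_personal": looks_personal,
    }


print(inspect_url("http://www.example.edu/~jsmith/notes.html"))
print(inspect_url("http://www.nytimes.com/news"))
```

A flag from this function is only a prompt to look closer; a personal page on a restricted domain can still be reliable, and the judgement stays with the reader.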
    34. 36. Scan the perimeter of the page
       • Can you tell who wrote it?
         • name of page author
         • organization, institution, agency you recognize
       • Credentials for the subject matter?
         • Look for links to:
         • “About us” “Philosophy” “Background” “Biography”
       • Is it current enough?
         • Look for “last updated” date
    35. 37. Examine the content
       • Text
         • possibly forged?
       • Sources
         • documented with links or notes?
         • do the links work?
       • Evidence of bias
         • in text or sources?
    36. 38. Do some detective work
       • Search the URL in
         • Click on “Site info for …”
         • Who owns the domain?
         • Who links to the site?
         • What did the site look like in the past?
         • (Wayback Machine)
    37. 39. Does it all add up?
       • Was the page put on the web to
         • inform?
         • persuade?
         • sell?
         • as a parody or satire?
       • Is it appropriate for your purpose?
    38. 40. Try evaluating some sites...
       • Search a topic in Google
         • …
       • Scan the first two pages of results
       • Visit one or two sites
         • evaluate their quality and reliability
    39. 41. Questions?
    40. 42. References
       • John Kupersmith: University of California, Berkeley
       • Greenlaw/Hepp, In-line/On-line: Fundamentals of the Internet and the World Wide Web
       THANK YOU