Infobrokering And Searching The Deep Web

Infobrokering and Searching the Deep Web - the New Role of Employee of the Department of Medical Scientific Information.
Presentation from the EAHIL Workshop, Kraków 2007.

Published in: Education, Technology

  1. 1. Infobrokering and Searching the Deep Web - the New Role of Employee of the Department of Medical Scientific Information. Witold Kozakiewicz, Barbara Grala. Main Library, Medical University of Łódź, Poland
  2. 2. "The Librarian", a painting by Giuseppe Arcimboldo (c. 1566)
  3. 3. Information should be meaningful, valuable, adequate, complete, up to date, and reliable.
  4. 6. Google is like a box of chocolates...
  5. 8. Deep Web: The deep Web (or Deepnet, invisible Web, or hidden Web) refers to World Wide Web content that is not part of the surface Web indexed by search engines.
  6. 9. Deep Web Pages
      • Dynamic content - dynamic pages which are returned in response to a submitted query or accessed only through a form (especially if open-domain input elements, e.g. text fields, are used; such fields are hard to navigate without domain knowledge).
      • Unlinked content - pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
      • Private Web - sites that require registration and login (password-protected resources).
      • Contextual Web - pages with content varying for different access contexts (e.g. ranges of client IP addresses or previous navigation sequence).
      • Limited access content - sites that limit access to their pages in a technical way (e.g. using the Robots Exclusion Standard, CAPTCHAs, or pragma:no-cache/cache-control:no-cache HTTP headers), prohibiting search engines from browsing them and creating cached copies.
      • Scripted content - pages that are only accessible through links produced by JavaScript, as well as content dynamically downloaded from Web servers via Flash or AJAX solutions.
      • Non-HTML/text content - textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
      Source: Wikipedia
  7. 10. Deep Web databases. Source: Bin He, Mitesh Patel, Zhen Zhang, Kevin Chen-Chuan Chang. Accessing the Deep Web. http://doi.acm.org/10.1145/1230819.1241670
  8. 11. How to improve the search process?
  9. 12. try to use the full capabilities of search engines: write more complex queries with Boolean operators and keywords, and use the advanced search options or search engine suggestions;
  10. 13. try specialized services like Google Scholar, Google Books, MS Live Search Academic, Yahoo Search Subscriptions;
  11. 14. if you are looking for specific file types, try dedicated search engines like Picsearch or Yahoo Podcast Search;
  12. 15. try metasearch engines like friskr.com, dogpile.com, clusty.com, mamma.com, turbo10.com;
  13. 16. use specialized web services and database search engines such as PubMed, Medic8, WebMD, MammaHealth;
  14. 17. use subject gateways: online services that provide links to numerous other sites or documents on the Internet (Intute, Scout Archives, BUBL, Infomine);
  15. 18. try to search open access journals or repositories like DOAJ, OAIster;
  16. 19. try to find the specific database using CompletePlanet or Geniusfind;
  17. 20. too many! too complicated!
  18. 21. The role of the Librarian
      • Help the user find the information.
      • Choose proper search tools.
      • Prepare the tool-box.
      • Teach how to use it.
  19. 23. Thank You.
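
The advice on slide 12 (more complex queries with Boolean operators and keywords) could be illustrated with a small helper that assembles a query string. This is only a sketch: the function names `quote_term` and `build_query` are hypothetical, and it assumes a search engine that accepts uppercase `AND`, `OR`, `NOT`, parentheses, and double-quoted phrases. Actual operator syntax differs between engines and databases, so check the help page of the tool being used.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_query(all_of=(), any_of=(), none_of=()) -> str:
    """Combine keywords into one Boolean query string (hypothetical helper).

    all_of  -- terms that must all appear (joined with AND)
    any_of  -- alternatives, at least one must appear (joined with OR)
    none_of -- terms to exclude (each prefixed with NOT)
    """
    parts = [quote_term(t) for t in all_of]
    if any_of:
        alternatives = " OR ".join(quote_term(t) for t in any_of)
        # Parentheses keep the OR group together when it is combined with AND.
        parts.append(f"({alternatives})" if len(any_of) > 1 else alternatives)
    query = " AND ".join(parts)
    for term in none_of:
        query += f" NOT {quote_term(term)}"
    return query.strip()


if __name__ == "__main__":
    print(build_query(
        all_of=["deep web", "medicine"],
        any_of=["librarian", "infobroker"],
        none_of=["forum"],
    ))
```

Keeping the three groups separate mirrors the "all of these words / any of these words / none of these words" fields that many advanced search forms offer.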

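Slide 9 lists the Robots Exclusion Standard as one way sites keep pages out of search engine indexes. A minimal sketch of how a well-behaved crawler applies such rules is shown here, using Python's standard `urllib.robotparser` module. The `robots.txt` content, the `example.org` URLs, the `ExampleBot` user agent, and the `crawlable` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site asks all crawlers to stay out of
# its search form results and its private area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""


def crawlable(url: str, robots_txt: str = ROBOTS_TXT,
              user_agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made in this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/about.html",
        "https://example.org/search?q=deep+web",
        "https://example.org/private/report.pdf",
    ):
        status = "crawlable" if crawlable(url) else "excluded (deep Web)"
        print(f"{url}: {status}")
```

Pages that a crawler skips for this reason never reach the search engine's index, which is why they have to be reached through the site's own search form or a specialized database instead.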