The Invisible Web
Usage Rights

© All Rights Reserved

  • Static pages: the information is the same whether I log on or you log on. A static Web page contains content that displays the same way each time the page is requested from a browser. An example is a customer service page with contact information (such as phone numbers, fax numbers, and e-mail addresses) that does not change frequently. The three largest search engines in terms of internally reported documents indexed are Google with 1.35 billion documents, Fast with 575 million documents, and Northern Light with 327 million documents (Bergman).

The Invisible Web Presentation Transcript

  • When Google Isn’t Enough! Finding Information on the Invisible Web Yaacov Taube [email_address]
  • What is the Visible (Surface) Web?
    • “It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.”
      • Sherman and Price.
  • What is the Visible (Surface) Web?
    • A collection of webpages
    • Searchable with “search engines”
    • What you and I think of as the “Internet” is actually only a small portion of the Internet
  • What is the Visible (Surface) Web?
    • High volume
    • Mass appeal
    • High value
    • Small percentage of web content
      • Exception: Google Books and Google Scholar
  • What is the Invisible Web?
    • What search engines do not search
    • Searchable Databases
      • Tens of Thousands
      • Accessible and searchable via the Internet
      • Results often dynamically generated in specific response to your request (eBay, MapQuest, etc.)
  • What is the Invisible Web?
    • Excluded Pages
      • Excluded by the search engine
      • Excluded page by page by the owner of the site
    • Typically databases
      • Businesses
      • Governments
      • Schools
      • Libraries
      • Associations
  • What is the Invisible Web?
    • Academic
    • Never been indexed or linked
    • Uniquely generated pages
    • Proprietary
    • Confidential
    • Protected by username & password
    • Constitutes the majority of the webpages on the Internet
  • Visible vs. Invisible Web
    • The Invisible Web is about 550 times larger than the visible Web and is growing much faster
    • The deep Web consists of about 91,000 terabytes
    • The surface Web is only about 167 terabytes (based on extrapolations from a study done at the University of California, Berkeley)
    • The Library of Congress contains about 11 terabytes
    • Quality content is 1,000 to 2,000 times greater than that of the surface Web
    • 95% of the deep Web is accessible to the public (no fees or subscription required)
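The size figures quoted on this slide can be checked against each other with simple arithmetic. The sketch below uses only the three terabyte numbers from the slide; the variable names are illustrative. Dividing 91,000 by 167 gives roughly 545, which is consistent with the "about 550 times larger" claim.

```python
# Sizes in terabytes, taken directly from the slide.
deep_web_tb = 91_000
surface_web_tb = 167
library_of_congress_tb = 11

# How many times larger is the deep Web than the surface Web?
deep_to_surface = deep_web_tb / surface_web_tb

# The same comparison against the Library of Congress figure.
deep_to_loc = deep_web_tb / library_of_congress_tb
surface_to_loc = surface_web_tb / library_of_congress_tb

print(f"Deep Web vs. surface Web: about {deep_to_surface:.0f}x")
print(f"Deep Web vs. Library of Congress: about {deep_to_loc:.0f}x")
print(f"Surface Web vs. Library of Congress: about {surface_to_loc:.0f}x")
```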
  • What is on the Invisible Web
    • Opaque Web
    • Private Web
    • Proprietary Web
    • Pay per click
  • Why can’t Google find it?
    • Requires payment
    • Requires registration
    • Dynamically generated
    • Very new
    • Website specifically stops spiders
  • Opaque Web
    • Fixed, or could be indexed, but is not
    • Deemed not important enough
    • Too new and therefore not linked
    • Never makes the maximum-results cutoff
    • No one ever linked or submitted the URL
  • Private Web
    • Deliberately excluded
      • Password
      • Special coding in website stops spiders
    • Only for select individuals
      • Employees
      • Students
      • Researchers
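The slides say that special coding in a website can stop spiders but do not name the mechanism. One widely used mechanism is a robots.txt file (the Robots Exclusion Protocol), which well-behaved crawlers consult before fetching a page. The sketch below uses Python's standard `urllib.robotparser` to show how a crawler might interpret such a file; the rules, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay
# out of the /private/ section of the site.
robots_txt = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_txt)  # parse the rules from memory, no network access

# A polite spider asks before fetching each URL.
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")

print("private page may be crawled:", blocked)
print("home page may be crawled:", allowed)
```

Pages excluded this way stay reachable for anyone who has the link, but they do not appear in the index of a search engine that honors the file.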
  • Proprietary Web
    • Protected
      • Password
      • Registration (N.Y. Times, eBay, banks, etc.)
      • Terms of Use
    • Anyone can access it if they
      • Pay
      • Register
      • Agree to terms
  • Pay per click
    • Search Engine Marketing tools
    • Ex: overture.com, FindWhat.com
  • When do I use …
    • Portal or Directory?
    • Search Engine?
    • Invisible Web?
  • Portal or Directory
    • You have a general topic
    • You know little about the subject
    • You do not know keywords
    • You want someone or something to have sorted out the junk
    • You need an exploratory overview
  • Search Engine
    • You are looking for something specific
    • You have keywords
    • You are pretty sure the information is
      • advertised or
      • otherwise generally disseminated
  • Tips for search engines
    • Use a toolbar
    • Determine the key words/phrases most likely to be in your document and nowhere else
    • Learn and use Boolean Operators
    • Scan results
    • Question the results
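The tip about Boolean operators can be illustrated with a toy search index. This is only a sketch of the idea, not how any particular search engine works: the three documents, the `build_index` and `term` helpers, and the queries are hypothetical. AND narrows a search, OR broadens it, and NOT excludes unwanted matches.

```python
# Hypothetical mini-collection: document id -> text.
DOCS = {
    "a": "invisible web databases and search engines",
    "b": "search engines index the visible web",
    "c": "library catalogs are searchable databases",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    inverted = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            inverted.setdefault(word, set()).add(doc_id)
    return inverted


INDEX = build_index(DOCS)


def term(word):
    """Return the ids of documents containing the word (empty set if none)."""
    return INDEX.get(word.lower(), set())


# web AND databases: only documents containing both words.
web_and_databases = term("web") & term("databases")

# web OR catalogs: documents containing either word.
web_or_catalogs = term("web") | term("catalogs")

# web NOT visible: documents containing "web" but not "visible".
web_not_visible = term("web") - term("visible")

print(sorted(web_and_databases))
print(sorted(web_or_catalogs))
print(sorted(web_not_visible))
```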
  • Invisible Web
    • You are pretty sure the information is in a specific database
    • You need something authoritative
    • You need results quickly
    • The information is dynamically generated
    • You are familiar with the database
      • Search techniques
      • Protocols
      • Access requirements
  • Searching the Invisible Web
    • Directories – subject guide compiled by human editors
    • Specialized Search Engines
      • http://library.albany.edu/internet/choose.html
    • Special Databases (Library of Congress, …)
  • Special Databases
    • Library of Congress
      • http://catalog.loc.gov
    • LookSmart’s Find Articles (over 900 publications)
      • http://www.findarticles.com
    • National Science Digital Library
      • http://www.nsdl.org
    • Singing Fish – audio and video
      • http://www.singingfish.com
  • Types of Databases
    • Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query.
    • Examples:
      • Phone books, people finders
      • Patents, laws
      • Items for sale in a Web store or Web-based auctions
      • Digital exhibits
      • Multimedia and graphical files
      • Stock and bond prices
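To see why information stored in tables is reachable only by query, consider the following minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the database products named on the slide; the `listings` table, its rows, and the `results_page` function are invented for illustration. The results page does not exist until someone submits a query, so a spider that only follows links has nothing to find.

```python
import sqlite3

# In-memory database standing in for a site's back-end catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, item TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO listings (item, price) VALUES (?, ?)",
    [("atlas", 12.50), ("road map", 4.00), ("globe", 30.00)],
)


def results_page(max_price):
    """Build an HTML fragment on demand from the visitor's search criteria."""
    rows = conn.execute(
        "SELECT item, price FROM listings WHERE price <= ? ORDER BY price",
        (max_price,),
    ).fetchall()
    items = "".join(f"<li>{item}: {price:.2f}</li>" for item, price in rows)
    return f"<ul>{items}</ul>"


# The page is generated only in response to this specific request.
page = results_page(15)
print(page)
```

A different `max_price` produces a different page, which is the sense in which such results are dynamically generated.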
  • Types of Hidden Info
    • Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
    • Pages requiring login or registration: social sites, New York Times, web-based applications, calendars, Google Docs, etc.
    • Government publications or databases: ERIC, usa.gov
    • Online databases: Gale Research
    • PDF files, audio, video, any new format
  • More hidden stuff
    • Dictionaries and thesauri
    • Sites that require forms to be filled out (ex: travel directions, job hunting)
    • Product catalogs and library catalogs
    • Newspaper and magazine archives
    • Dynamic web pages (ex: airline flight checkers, MapQuest)
    • Interactive tools (ex: calculators & measurement converters)
  • Access to the Invisible Web is improving …
    • Google Books http://books.google.com/
    • Google Scholar http://scholar.google.co.il/
  • Maybe Consider …
    • Specialized databases such as Dialog, LexisNexis, Factiva, etc. (not cheap)
    • Use an Information Professional (www.aiip.org)
  • To Conclude …
    • Focus on what you do best and have been trained for, and let an Information Professional find the information you need.
    • An Information Professional is trained to find it faster, more effectively, and more efficiently than you or one of your employees (www.aiip.org).