Deep-Hidden-Invisible Web


Many of us assume that search engines such as Google, MSN or Yahoo can find every page on the web. However, a large amount of data is deep, hidden and not visible to them. What is this invisibility, and how can the content be made available to everyone? This presentation looks at both questions.

Published in: Technology, Design

  1. 1. Deep-Hidden-Invisible Web. Prepared by Prof. K. Prabhakar, assisted by P. Subha
  2. 2. Background of the Invisible Web <ul><li>&quot;Invisible&quot; is used here in the context of the World Wide Web. The content is not invisible in the sense that it cannot be &quot;seen&quot;; rather, it does not appear when we search with the most commonly used search engines such as Google, MSN or Yahoo. That is the reason why many prefer the term DEEP WEB to INVISIBLE WEB. </li></ul>
  3. 3. The size of the Invisible Web <ul><li>The size is difficult to measure, and no reliable estimate is possible, because content is created dynamically and search engines index web pages at a fast pace. </li></ul><ul><li>However, both its size and its content are too large to ignore. </li></ul>
  4. 4. Will the situation change? <ul><li>Google has said that it is dedicated to indexing the world's content, however long it takes. In addition, more previously invisible pages are being indexed because links to them are added manually from visible pages. </li></ul>
  5. 5. Why “Invisibility”? <ul><li>Invisible does not mean inaccessible. It means that the content is not indexed by a search engine and is therefore invisible to a person searching the net. If a search engine returns no results, that does not mean the content is unavailable. </li></ul>
  6. 6. Reasons for Invisibility <ul><li>Dynamic URLs: Pages whose URLs contain long strings of parameters, equal signs and question marks may be skipped by search engines, because such pages can duplicate what is already in the engine's database. </li></ul><ul><li>Form-controlled entry: Pages are displayed only after a human takes some action, such as filling in a form. </li></ul><ul><li>Hidden pages: Hidden means there is simply no sequence of hyperlink clicks that could take you to such a page. The pages are accessible, but only people who know of their existence know how to view them. </li></ul><ul><li>New pages: If pages are new, they may not yet have been indexed by any engine. </li></ul>
  7. 7. Reasons for Invisibility <ul><li>Flash presentations: Text content in Flash presentations is not indexed. </li></ul><ul><li>Geo-tagged: Computers from certain regions may be blocked, and that may include search engines as well. Many American TV broadcasters show TV online; however, the programmes are not available for searching. </li></ul><ul><li>There may be other reasons for invisibility. </li></ul>
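The "dynamic URLs" reason above can be illustrated with a small check. This is only a sketch: the function name `is_dynamic_url`, the `max_params` threshold and the example.com addresses are assumptions made for illustration, and real search engines apply their own, unpublished rules.

```python
from urllib.parse import urlsplit, parse_qsl


def is_dynamic_url(url, max_params=0):
    """Return True if the URL carries more query parameters than max_params.

    Hypothetical heuristic: a crawler might treat such URLs as dynamic
    and decide not to index them.
    """
    query = urlsplit(url).query
    params = parse_qsl(query, keep_blank_values=True)
    return len(params) > max_params


# Illustrative URLs only (not taken from the slides).
urls = [
    "http://example.com/articles/deep-web.html",
    "http://example.com/catalog?item=42&session=abc123&sort=asc",
]
for url in urls:
    label = "dynamic" if is_dynamic_url(url) else "static"
    print(url, "->", label)
```

Raising `max_params` makes the hypothetical crawler more tolerant of short query strings.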
  8. 8. Some Ways to Make Invisible Content Visible by the Site Owner <ul><li>There are ways to make the deep web visible. We will study what a site owner can do and then what a searcher can do. Let us first consider what the site owner can do. </li></ul><ul><li>Link it to a visible or indexed page. If some content is available to you, you may put it on a static HTML page, with relevant formatting and the necessary hyperlinks, and then link to this static page from an already &quot;visible&quot; (indexed) page. </li></ul>
  9. 9. <ul><li>Convert formats. For Flash and other such files, transcribe the content into words and add it as text. </li></ul><ul><li>For audio: Audio content such as a podcast may be transcribed and published as supplementary text. </li></ul>
  10. 10. <ul><li>Build links . Link to your own pages from other related pages. If you write about, say, trees on page A, then write about trees again on page B, link from page B to page A to give A more relevance. If page A hasn't been indexed, it will be after B is indexed. Points 6-9 are alternate ways to build links, hence helping make content visible. </li></ul>
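The effect of building links can be sketched as a reachability check over a link graph. This is a toy model under stated assumptions: the page names and the `crawl` helper are hypothetical, and a real spider does far more than follow links breadth-first.

```python
from collections import deque


def crawl(links, seeds):
    """Return every page reachable from the seed pages by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: the home page links to page B, nothing links to page A.
site = {"home": ["page_b"], "page_b": [], "page_a": []}
before = crawl(site, ["home"])

# Add the link from page B to page A, as the slide suggests.
site["page_b"].append("page_a")
after = crawl(site, ["home"])

print("page_a" in before, "page_a" in after)  # False True
```

Before the link is added, page A is a hidden page in the sense used earlier: no sequence of clicks leads to it.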
  11. 11. <ul><li>Build a topic pyramid. This is a specialized form of sitemap that spans many pages. The apex (top-most) page has general topics and links to the next layer of pages, which have more specific topics and link in turn to the layer below. The bottom-most layer of the topic pyramid consists of your original web pages or blog posts, which have the most specific content. This method builds page relevance through serial linking, which encourages spiders to visit and index the pages. </li></ul>
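A topic pyramid can be pictured as a nested structure that is flattened into index pages, each linking to the layer below. The sketch below is one possible representation; `build_pyramid`, the topic names and the post file names are all made up for illustration.

```python
def build_pyramid(topic, tree, pages=None):
    """Map each index page to the pages it links to, apex first.

    `tree` is either a dict (subtopic -> subtree) or a list of leaf pages.
    """
    if pages is None:
        pages = {}
    if isinstance(tree, dict):
        pages[topic] = list(tree.keys())
        for subtopic, subtree in tree.items():
            build_pyramid(subtopic, subtree, pages)
    else:
        # Bottom layer: links to the original pages or blog posts.
        pages[topic] = list(tree)
    return pages


# Hypothetical topics and posts.
topics = {
    "Trees": {
        "Conifers": ["pine-post.html", "spruce-post.html"],
        "Broadleaf": ["oak-post.html"],
    },
}
pages = build_pyramid("Site topics", topics)
for page, links in pages.items():
    print(page, "->", links)
```

Each key in `pages` would become one index page, so a spider starting at the apex reaches every post by following links downward.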
  12. 12. <ul><li>Socially bookmark it. If you find something that you like, say a book at Project Gutenberg, bookmark the URL at a social bookmarking site with a brief description. </li></ul><ul><li>Remove access restrictions. Get rid of the need to log in, and don't apply time limits. </li></ul>
  13. 13. What the user can do <ul><li>Use a site's search engine. Sometimes the site's own search engine provides better information. </li></ul><ul><li>Use site archive navigation. On web logs in particular, you can use the archive links to find information, albeit through manual searching. </li></ul>
  14. 14. What the user can do <ul><li>Using the word &quot;database&quot; in a regular search engine query can surface information that is otherwise difficult to find. For example, if you are looking for a database of images, you can type the search string images database into Google or one of the other engines. Somewhere down the results list in Google, you'll find Full-Text Database Images from the USPTO (US Patent and Trademark Office). You can then use the Quick or Advanced search forms to find patents relating to one or more terms. If there are images to be seen, there will be links to them. </li></ul>
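The query tip above amounts to appending the word "database" to the search terms. A minimal sketch follows, assuming Google's public search page and its `q` parameter; the helper name `database_query_url` is hypothetical.

```python
from urllib.parse import urlencode


def database_query_url(terms, base="https://www.google.com/search"):
    """Build a search URL that appends the word 'database' to the terms."""
    query = f"{terms} database"
    return f"{base}?{urlencode({'q': query})}"


print(database_query_url("images"))
```

The same helper works with another engine by passing a different `base`, provided that engine also reads its query from a `q` parameter.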
  15. 15. What the user can do <ul><li>We can use an &quot;invisible Web&quot; directory, portal or specialized search engine such as Google Book Search, Google Scholar, Librarian's Internet Index, or BrightPlanet's Complete Planet (70,000 searchable databases and specialty search engines). </li></ul>
  16. 16. Invisible web search tools <ul><li>Deep Web Search Engine — Clusty . </li></ul><ul><li>Art — Musée du Louvre . </li></ul><ul><li>Books Online — The Online Books Page . </li></ul><ul><li>Business — Explorit Now! . </li></ul><ul><li>Consumer — US Consumer Product Safety Commission Recalled Products . </li></ul><ul><li>Economic and Job Data — — A searchable directory of free economic data. </li></ul><ul><li>Finance and Investing — . </li></ul>
  17. 17. Invisible web search tools <ul><li>General Research — GPO's Catalog of US Government Publications . </li></ul><ul><li>Government Data — Copyright Records (LOCIS) . </li></ul><ul><li>International — International Data Base (IDB) . </li></ul><ul><li>Law and Politics — THOMAS (Library of Congress) . </li></ul><ul><li>Library of Congress — Library of Congress . </li></ul><ul><li>Medical and Health — PubMed . </li></ul><ul><li>Science — . </li></ul><ul><li>Transportation — FAA Flight Delay Information . </li></ul>
  18. 18. Further research tools <ul><li>About WebSearch — Christmas 2006 web search guide . </li></ul><ul><li>About Websearch — The deep web — find out more about the deep web — deep web search . </li></ul><ul><li>ALA — American Library Association . </li></ul><ul><li>BrightPlanet — FAQ . </li></ul>
  19. 19. Further Research <ul><li>Deep Web Research — A gigantic list of resources. </li></ul><ul><li>Deep Web Technologies . </li></ul><ul><li>Ellipsis — Metadata, Google, and the Invisible Web . </li></ul><ul><li>Envisional . </li></ul>
  20. 20. Further Research <ul><li>Google Librarian Center . </li></ul><ul><li>Google Library Project . </li></ul><ul><li>Lifehacker — How to search the invisible web . </li></ul><ul><li>MediaBistro — Some resources for freelancers . </li></ul>
  21. 21. Further Research <ul><li>MetaQuerier — Exploring and integrating the deep web . </li></ul><ul><li>QProber — Classifying and searching hidden-web text databases . </li></ul><ul><li>The Invisible Web Weblog . </li></ul><ul><li>University of California, Berkeley — Invisible or deep web . </li></ul>
  22. 22. One of the most important sites <ul><li>Please go through </li></ul><ul><li>This is an online education database that provides information on various areas relating to careers and education. </li></ul>