The document discusses the surface web and the deep web. It defines the surface web as the portion of the World Wide Web indexed by conventional search engines. The deep web, also known as the invisible web or hidden web, is far larger, estimated to hold 400-550 times more public information than the surface web, and consists largely of dynamically generated pages that search engines cannot reach. The document explains how search engines build their indexed databases and why they cannot access information behind search forms, then discusses how the Tor network provides anonymity by routing traffic through multiple nodes, and mentions some deep web search engines and darknet markets.
3. Surface Web
• The Surface Web is the portion of the World Wide Web that is indexed by conventional search engines.
• It is also known as the Clearnet, the Visible Web or the Indexable Web.
4. Introduction
• The modern Internet is the most effective source of information.
• The most popular search engine is Google.
• In 2008, Google added the trillionth (10^12) web link to its index database!
• It stores several billion documents!
• Many times we are not satisfied with the search results.
• 43% of users report it.
6. How search engines work
Spiders or web crawlers
Indexed database
• Search engines construct a database of the Web using programs called spiders or Web crawlers.
• The spiders collect information, keywords, data and URLs from the Web and store them in the search engine's indexed database.
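The crawl-and-index flow above can be sketched in a few lines of Python: a toy spider that pulls URLs out of a fetched page and stores them in an in-memory "indexed database". The page content and URLs here are invented for illustration, and real crawlers fetch pages over the network and index full text.

```python
from html.parser import HTMLParser

class LinkSpider(HTMLParser):
    """Collects href values from a page, the way a crawler gathers URLs to visit."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A toy page standing in for a fetched surface-web document.
page = ('<html><body>'
        '<a href="http://example.com/a">A</a>'
        '<a href="http://example.com/b">B</a>'
        '</body></html>')

spider = LinkSpider()
spider.feed(page)

# A toy "indexed database": URL -> keywords found on the page.
index = {url: ["example"] for url in spider.links}
```

Note that the spider only ever discovers pages reachable by following links, which is exactly why form-driven content stays invisible to it.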
8. Motivation: Why the Deep Web
• Then why does Google fail?
• Most of the Web's information is buried far down on dynamically generated sites.
• Traditional web crawlers cannot reach there.
• A large portion of the data is literally 'un-explored'.
• There is a need for more specific information stored in databases.
• It can only be obtained if we have access to the database containing the information.
10. Deep Web - Introduction
• The Deep Web is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
• It is also called the Deepnet, Invisible Web or Hidden Web.
• It is the largest growing category of new information on the Internet.
• It holds 400-550 times more public information than the Surface Web.
• Its total quality is 1000-2000 times greater than that of the Surface Web.
11. Evolution of the Deep Web
• Early days: static HTML pages, which crawlers could easily reach.
• Mid-90s: introduction of dynamic pages, generated as the result of a query.
• 1994: the term "Invisible Web" was first used to refer to these websites.
• 2001: it became known as the "Deep Web".
12. Size of the Deep Web
• The Surface Web is around 19 TB in size.
• The Deep Web is around 7,500 TB.
• The Deep Web is nearly 400 times larger than the Surface Web.
25. • Each node has a unique session key.
• The 1st, 2nd and 3rd nodes each have their own IP address.
• The 2nd node knows only the 1st node's IP address.
• The 3rd node knows only the 2nd node's IP address.
• The 3rd node provides security by encrypting the data.
• The 3rd node has no idea where the traffic originated from.
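The layering described in the bullets above can be sketched as follows. This is a toy model only: real Tor negotiates a separate session key with each relay and uses real ciphers such as AES, whereas here a simple XOR function stands in for encryption.

```python
import os

def xor_layer(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a cipher: XOR the data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# One unique session key per node, as the slide describes.
keys = [os.urandom(16) for _ in range(3)]  # 1st, 2nd, 3rd node

message = b"GET http://example.onion/"

# The sender wraps the message in three layers, innermost for the 3rd node.
packet = message
for key in reversed(keys):
    packet = xor_layer(packet, key)

# Each node peels exactly one layer with its own key; no single node
# sees both where the traffic came from and what it says.
for key in keys:
    packet = xor_layer(packet, key)

assert packet == message  # the last node recovers the original request
```

The key property is visible in the loop structure: each relay can only remove its own layer, so the 1st node sees the sender but not the request, and the 3rd node sees the request but not the sender.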
30. • It is a search engine.
• It is the Google of the deep web.
• It is underground in nature.
• It helps to find dark net websites.
• Its products:
Helix
G Search
Tor Ads
Flow
31. Helix
• Helix is the definitive dark net bitcoin custodian.
G Search
• Search the dark net by keyword, product and region.
Tor Ads
• An advertising network for both advertisers and publishers.
Flow
• It helps you easily reach hidden sites without having to remember the long and random onion addresses.
33. Bitcoins
Bitcoins are an electronic currency (cryptocurrency).
Transactions are recorded in a string of code, the 'blockchain'.
Making bitcoins is called mining.
Bitcoin accounts cannot be frozen or examined by tax men.
Bitcoins are traded from one personal 'wallet' to another.
A wallet is a small personal database that you store on your computer drive, on your smartphone, on your tablet, or somewhere in the cloud.
One bitcoin is currently worth around $450 US (Rs. 30,834.70).
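The "string of code" idea can be illustrated with a minimal hash chain, where each block commits to the previous block's hash. The transactions and amounts below are made up, and real Bitcoin blocks also carry proof-of-work, timestamps and Merkle trees of many transactions.

```python
import hashlib
import json

def make_block(index, data, prev_hash):
    """A block commits to its data and to the previous block's hash."""
    body = {"index": index, "data": data, "prev": prev_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

# A tiny chain of made-up transactions.
genesis = make_block(0, "genesis", "0" * 64)
b1 = make_block(1, "alice -> bob: 1 BTC", genesis["hash"])
b2 = make_block(2, "bob -> carol: 0.5 BTC", b1["hash"])

# Tampering with b1 changes its hash, so b2's stored link no longer matches.
forged = make_block(1, "alice -> bob: 100 BTC", genesis["hash"])
assert b2["prev"] == b1["hash"]
assert b2["prev"] != forged["hash"]
```

This chaining is what makes the ledger tamper-evident: rewriting an old transaction breaks every later block's link to it.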
36. CONCLUSION
The Deep Web is the largest part of the Internet, yet the majority
of the population does not even know about it, let alone access it. It
can be used for good and for bad, for legal and illegal activity. It is
important to understand that it is not all bad. There is plenty of good
about the Deep Web, including the right to privacy when
surfing the Internet. Understanding the Deep Web and its
capabilities is vital to the future of the Internet, and hopefully this
paper helps accomplish that goal.
Introduction:
Now, for any kind of information we turn to the Internet, which is considered to be the most effective source of information. And the first thing that comes to mind when we talk about Internet search is Google. In 2008, Google added the trillionth web link to its huge index database. Yet many times we feel that it does not provide satisfying results. In fact, a survey reports that almost 43% of users are not completely satisfied with their search results.
Introduction:
Let's take a real-life example.
I want to search for "availability of seats in the Mumbai Howrah AC Duronto Express on 6th Dec, 2011"; let's take a look at what Google provides.
Then where is the problem? Why does Google fail to satisfy despite having such a huge store of information?
Motivation:
This question motivates us to study the other side of the Internet, the "deep web", commonly called the "invisible web".
Most of the information on the Web is stored in databases with limited interfaces, mainly interacted with through HTML forms.
These pages are dynamically generated as the result of a query. Such web content is collectively called the Deep Web.
The deep web is more structured, and accurate and specific information can be retrieved from it.
This is why it is so important to explore the deep web. Traditional web crawlers cannot reach these pages, and hence other methods are adopted.
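The kind of form query a browser submits, and a link-following crawler never does, can be sketched like this. The field names are hypothetical, not a real railway site's API; they only illustrate that the result page exists solely as the answer to a submitted query.

```python
from urllib.parse import urlencode

# Hypothetical field names for a seat-availability form; a real site
# would define its own fields and endpoint.
query = {
    "from": "Mumbai",
    "to": "Howrah",
    "train": "AC Duronto Express",
    "date": "2011-12-06",
}

# A browser POSTs this encoded body to the site's search form; the
# result page is generated on the fly, so there is no static URL
# anywhere for a crawler to discover and index.
body = urlencode(query)
```

Because no page links to that dynamically generated result, it lives in the deep web even though any user with the form can reach it.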
Evolution:
Earlier, all information was stored in plain HTML pages, which are static, and a typical web crawler could easily cover all of it.
However, in the mid-90s dynamic pages were introduced to cope with the enormous increase in data. Information was being stored in databases that return results for a query. Such dynamically generated pages were beyond the coverage of traditional crawlers.
In 1994, Jill Ellsworth coined the term "invisible Web" for this unreachable web.
In 2001, Bergman in his study renamed it the "Deep Web" and proposed the BrightPlanet technology, which is capable of finding deep web content.