A
SEMINAR PRESENTATION
ON
“DEEP WEB DATA EXTRACTION”
PRESENTED BY: MR MANOJ PRASAD
GUIDED BY: PROF. B. S. SURUSHE
Contents
1. Introduction
2. Literature survey
3. History of deep web
4. How search engine works
5. Deep web vs Surface web
6. Accessing the Deep Web
7. Advantages and disadvantages
8. Future Scope
9. Conclusion
Introduction
What is the Deep Web?
Literature survey
• The Tor Project, Inc. Tor Project. Last accessed on 11 June 2015, https://www.torproject.org/.
• Liu, Bing, Robert Grossman, and Yanhong Zhai. "Mining data records in web pages." In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 601-606. ACM, 2003.
• Grossman, L., Newton-Small, J.: The Deep Web. Time: The Secret Web, Where Drugs, Porn and Murder Hide Online. Nov 11 (2013)
• Deep Web Technologies: Federated Search and My Business. 15 Dec (2014) http://www.deepwebtech.com/company/resource-center/faqs/#bottomline
• Devine, J., Egger-Sider, F.: Going Beyond Google Again: Strategies for Using and Teaching the Invisible Web. Facet Publishing, London (2014)
History of Deep Web
• Jill Ellsworth used the term "invisible Web" in 1994 to refer to websites that were not registered with any search engine.
• The initial definition of the deep web contrasts it with the surface web: “surface Web content is persistent on static pages discoverable by search engine through crawling”.
• The Deep Web came to the attention of the government and other authorities in 2013, in connection with the Silk Road marketplace.
• Since then, the government has shown great interest in the deep web and constantly keeps an eye on the deep network.
How Search Engine Works
Deep web vs Surface web
Accessing the Deep Web
Cont…
Some special links, called onion links, use the .onion extension.
Some .onion links are:
http://v6pgrjno6mzbjicf.onion - The Onion Press
http://am4wuhz3zifexz5u.onion - The TOR Library
http://zbnnr7qzaxlk5tms.onion - WikiLeaks mirror
http://kpvz7ki2v5agwt35.onion - Hidden Wiki
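The four links above share one shape: a 16-character label followed by .onion. The Python sketch below checks a URL for that shape. It assumes the older 16-character base32 format (letters a-z and digits 2-7) that these addresses follow. The function name `is_onion_v2` is hypothetical, and the check only inspects the text of the URL; it does not contact any site.

```python
import re
from urllib.parse import urlparse

# Assumption: a 16-character base32 label (a-z, 2-7) followed by ".onion",
# which is the shape of the addresses listed on this slide.
ONION_V2 = re.compile(r"[a-z2-7]{16}\.onion")


def is_onion_v2(url: str) -> bool:
    """Return True if the URL's host looks like a 16-character .onion address."""
    host = urlparse(url).hostname or ""
    return ONION_V2.fullmatch(host) is not None


links = [
    "http://v6pgrjno6mzbjicf.onion",
    "http://am4wuhz3zifexz5u.onion",
    "http://zbnnr7qzaxlk5tms.onion",
    "http://kpvz7ki2v5agwt35.onion",
    "https://www.torproject.org/",
]

# Keep only the links whose host matches the assumed .onion shape.
onion_links = [link for link in links if is_onion_v2(link)]
```

Here `onion_links` keeps the four .onion entries and drops the ordinary surface-web URL. Reaching such an address still requires a Tor-enabled client; this sketch is only a format check.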
Cont…
Advantages and disadvantages
Among its advantages, the deep web provides access to sites that have not been indexed by
search engines, including database entries.
It contains a wealth of valuable information.
It is an easy way to get valuable research materials
that are not available or …
Future Scope
• As the amount of information available on the deep web grows, more
and more users are migrating to the deep web.
• More and more users want to be secure and anonymous while they are
online.
• Users want to obtain valuable information that is not easily available on the surface web.
Conclusions
• We’re looking at the end of the internet as we know it.
• It’s growing into a two-tier internet: one tier a toll highway
and the other a slow freeway. But we can do something
about it. We can all go Deep Web.
• It’s not a place. It’s a state of mind, a way of being: being
anonymous and safe.
Thank You
