SlideShare a Scribd company logo
1 of 8
Download to read offline
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4



              Algorithm for Merging Search Interfaces over Hidden Web
                                              Harish Saini1, Kirti Nagpal2
                                  1
                                   Assistant Professor N.C.College of Engineering Israna, Panipat
                                   2
                                   Research Scholar, N.C.College of Engineering Israna, Panipat




Abstract: This is the world of information. The size of                same domain. Hidden web crawler is the crawler which
world wide web [4,5] is growing at an exponential rate day             continuously crawls hidden web so that it could be indexed by
by day. The information on the web is accessed through                 search engines. The major functions which are to be
search engine. These search engines [8] uses web crawlers to           performed by hidden web crawler are getting form filled,
prepare the repository and update that index at an regular             finding search query, submitting query and indexing the
interval. These web crawlers [3, 6] are the heart of search            results. The challenge of the hidden web crawlers are the huge
engines. Web crawlers continuously keep on crawling the web            and dynamic nature of the hidden web. The merging[46] or
and find any new web pages that have been added to the web,            integration of search interfaces over hidden web is required to
pages that have been removed from the web and reflect all              response queries so that it answers queries at its best. For the
these changes in the repository of the search engine so that the       integration of search interfaces over hidden web two steps are
search engines produce the most up to date results.                    required

Keywords: Hidden Web, Web Crawler, Ranking, Merging                             Finding semantic mapping over the different search
                                                                                 interfaces to be merged.
I. INTRODUCTION                                                                 Merging search interfaces on the basis of semantic
This is the world of information. The size of world wide web                     mappings found earlier.
[4,5] is growing at an exponential rate day by day. The
information on the web is accessed through search engine.
                                                                       A Ranking the Web Pages by Search Engine
These search engines [8] uses web crawlers to prepare the
                                                                       Search for anything using our favorite crawler-based search
repository and update that index at an regular interval. Web
                                                                       engine will sort through the millions of pages it knows about
crawlers continuously keep on crawling the web and find any
                                                                       and present you with ones that match your topic. The matches
new web pages that have been added to the web, pages that
                                                                       will even be ranked, so that the most relevant ones come first.
have been removed from the web and reflect all these changes
                                                                       Of course, the search engines don't always get it right. So,
in the repository of the search engine so that the search
                                                                       how do crawler-based search engines go about determining
engines produce the most up to date results. There is some
                                                                       relevancy, when confronted with hundreds of millions of web
data on the web that is hidden behind some query interfaces
                                                                       pages to sort through? They follow a set of rules, known as an
and that data can’t be accessed by these search engines. These
                                                                       algorithm. Exactly how a particular search engine's algorithm
kind of data is called hidden web [41] or deep web. These
                                                                       works is a closely kept trade secret.
days the contents on the hidden web are of higher interests to
the users because the contents of hidden web are of higher
quality and greater relevance. The contents on hidden web can          One of the main rules in a ranking algorithm involves the
be accessed by filling some search forms which are also                location and frequency of keywords on a web page. Call it the
called sometimes search interfaces. Search interfaces [37] are         location/frequency method. Pages with the search terms
the entry points to the hidden web and make it possible to             appearing in the HTML title tag are often assumed to be more
access the contents on the hidden web. These search                    relevant than others to the topic. Search engines will also
interfaces are simply like filling any forms like forms for            check to see if the search keywords appear near the top of a
creating e-mail ids etc. Each search interface has number of           web page, such as in the headline or in the first few
input controls like text boxes, selection lists, radio buttons etc..   paragraphs of text. They assume that any page relevant to the
To access hidden web one needs to fill these search interfaces         topic will mention those words right from the beginning.
for which a crawler is required which finds for forms on the           Frequency is the other major factor in how search engines
web known as form crawler. There can be multiple search                determine relevancy. A search engine will analyze how often
interfaces for the same information domain on the web. In that         keywords appear in relation to other words in a web page.
case all those search interfaces are required to be merged or          Those with a higher frequency are often deemed more
integrated so that crawler[3] finds all data relevant to the user      relevant than other web pages.
input despite of existence of multiple search interfaces for the

Issn 2250-3005(online)                                                         August| 2012                          Page 1137
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4



B. Web Crawler                                                    automatically crawls hidden web so that it can be indexed by
A crawler [14,20] is a program that are typically programmed      search engines. Hidden web crawler is able to allow an
to visit sites that have been submitted by their owners as new    average use to explore the amount of information, which is
or updated. The contents of the Entire sites or specific pages    mostly hidden behind search interfaces. The other motive for
is selectively visited and indexed. The key reason of using       hidden web crawler is to make hidden web pages searchable
web crawlers is to visit web pages and add them to the            at a central location so that the significant amount of time and
repository so that a database can be prepared which in turn       effort wasted in searching the hidden web can be reduced.
serves applications like search engines.                          One more motive for hidden web crawlers is that due to heavy
                                                                  reliance of web users on search engines for locating
Crawlers use graph structure of the web to move from pages        information, search engine influence how the users perceive
to pages. The simplest architecture of web crawler is to start    the web..
from a seed web page and traverse all hyperlinks encountered
in this web pages then all encountered hyperlinks are added to    There are two core challenges while implementing an
the queue which in turn are traversed. The process is repeated    effective hidden web crawler
until a sufficient number of pages are identified...
                                                                          The crawler has to be able to understand and model a
The main goal of the web crawler is to keep the coverage and               query interface.
freshness of the search engine index as high as possible which            The crawler has to come up with meaningful queries
is not informed by user interaction. For this task the crawler             to issue to the query interface.
and other parts of the search engine have no communication
between them. Because web is a purely dynamic collection of       The first of the two challenges was addressed by a method for
web pages there is a need for crawling cycles frequently to       learning search interfaces. For the latter challenge if the
refresh the web repository so that all new pages that have        search interface is able to list all possible values for a query
been added to the web are included in the repository similarly    with help of some selection list or drop down list in that case
the web pages that have been removed from web are deleted         solution is straight forward. possible so all of the possible
from web repository                                               queries can’t be exhaustively listed.

C Hidden Web Crawler                                              D Search Interfaces
Current day crawlers crawls only publicly indexable web           While accessing hidden web [49] search query is submitted to
(PIW) i.e set of pages which are accessible by following          the search interface of the hidden web crawler. The selection
hyperlinks ignoring search pages and forms which require          of web pages and the relevance of results produced from
authorization or prior registration. In reality they may ignore   hidden web is determined by the effectiveness of search
huge amount of high quality data which is hidden behind           interfaces [42,43]. The different actions performed by search
search forms. Pages in hidden web are dynamically generated       interfaces are getting input from user, selection of query,
in response to the queries submitted via search forms.            submitting query to the hidden web and merging of results
Crawling the hidden web is highly challenging task because        produced by hidden web crawlers.
of scale and the need for crawlers to handle search interfaces
designed primarily for human beings. The other challenges of
                                                                  The most important activity of search interface is selection of
hidden web data are:
                                                                  query which may be implemented in variety of ways. One
      Ordinary web crawlers can’t be used for hidden web
                                                                  method is random method in which random keywords are
      The data in hidden web can be accessed only through        selected from the English dictionary and are submitted to the
         a search interface
                                                                  database. Random approach generates a reasonable number of
      Usually the underlying structure of the database is        relevant pages. The other technique is using generic frequency.
         unknown.                                                 In this method we find a document and obtain general
                                                                  distribution frequency of each word in the document. Based
The size of hidden web is continuously increasing as more         on this frequency distribution we find the most frequent
and more organizations are putting their high quality data        keyword, issue it to the hidden web database and retrieve the
online hidden behind search forms. Because there are no static    result. The same process is repeated with the second most
links to hidden web pages therefore search engines can’t          frequent word and so on till the downloaded resources are
discover and index these web pages.                               exhausted. After that it analyses the returned web pages that
                                                                  whether they contain results or not. Another method is
Because the only entry point to hidden web is search interface    adaptive in this method the documents returned from the
the main challenge is how to generate meaningful queries to       previous queries issued to the Hidden-Web database is
issue to the site. Hidden web crawler is the one which            analysed and it is estimated which keyword is most likely to
Issn 2250-3005(online)                                                   August| 2012                           Page 1138
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4



return the most documents. Based on this analysis, the process      multiple integration techniques which integrates these search
is repeated with the most promising queries.                        forms over the hidden web. Most of those are semiautomatic.

E Problems in Integration of Hidden Web                              F. Need For Merging Search Interfaces
Recently keen interest has been developed in the retrieval and      It seems that there will always be more then one search
integration of hidden web with the intention of having high         interfaces even for the same domain. Therefore,
quality data for the databases. As the size of information in       coordination(i.e. mapping, alignment, merging) of search
hidden web is growing, search forms or search interfaces are        interfaces is a major challenge for bridging the gaps between
needed to be located which serves as an entry point to hidden       agents with different conceptualizations.
web database. But there exists crawlers which focus search on
specific database domains and retrieve invariable diversified       Two approaches are possible:
set of forms like login form, quote request forms and
searchable forms from multiple domains. The process of               (1) merging the search interfaces to create a single coherent
grouping these search interfaces or search forms over the           search interface
same database domain is called integration of search
interfaces.
                                                                     (2) aligning the search interfaces by establishing links
                                                                    between them and allowing them to reuse information from
These search form are input to algorithm which finds                one another.
correspondences among attributes of different forms. This
task is not so easy because of dynamic nature of the web the
new information is added to web and new information is              Here we propose a new method for search interface merging.
being modified or even removed. Therefore a scalable
solution suitable for large scale integration is required for the   Whenever multiple search forms are to be integrated the task
purpose of large scale integration. In addition in a well           can be divided into two major steps
defined domain there may be variation in both structure and
vocabulary of search forms. So in order to obtain information         Finding semantic mapping among search forms or
that covers the whole domain it is required to find                    interfaces
correspondences among attributes of forms and for that a              Merging search interfaces on the basis of mapping
broad search is needed to be performed.
                                                                    The first step in merging [46] of search forms or interfaces
For this either full crawl is needed to be performed but this is    are finding semantic mapping and correspondences among
really inefficient approach as it can take weeks for the            attributes of those forms so that on the basis of that mapping
exhaustive crawling process. Another alternative can be focus       search interfaces or forms could be merged. This semantic
crawling in which only pages relevant to a specific topic are       mapping serves as the domain specific knowledge base for the
crawled. This has a better quality index as compared to             process of merging [47] of search interfaces. The proposed
exhaustive crawling. In this case focused crawler that focus        algorithm is semi automatic. It finds similar terms from a
solely on the contents of retrieved web pages may not be a          look-up table which stores all similar terms in the different
very good alternative also because forms are sparsely               search interfaces to be merged along with the relationship
distributed and thus the number of forms retrieved per total        between those similar terms so that it may be decided which
number of pages retrieved may be very low. For tackling this        term will be appearing in the merged search interface. The
problem a focused crawler has been developed which used             algorithm makes use of search interfaces in form of a
reinforcement learning to build a focus crawler that is             taxonomy tree to cluster the concepts in the search interfaces
effective for sparse concepts.                                      on the basis of level
          The kind of web crawler which crawl web pages
which contain search forms are called form crawlers. Since          II Merging of Search Interfaces
form crawler find thousand of forms for the same domain.            More and more applications are in need to utilize multiple
Those different forms even for the same domain may be               heterogeneous search interfaces across various domains. To
different in the structure and vocabulary they use. For that        facilitate such applications it is urgent to reuse and
semantic mappings are required to be found among all                interoperate heterogeneous search interfaces. Both merge and
attributes of those forms and after finding semantic mappings       integration produces a static search interface based on the
those are integrated so that queries involving that domain can      existing search interfaces. The resulting search interface is
be answered suppressing the existence of multiple search            relatively hard to evolve with the change of existing search
forms using different structures and vocabulary. There exist        interfaces.

Issn 2250-3005(online)                                                    August| 2012                           Page 1139
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4



         The search interface merge module takes source               A Merging Observations and Results
search interfaces as input and a table that lists all similar terms   Let us consider two source search interfaces and apply the
coming from different search interfaces along with the                above stated algorithm to merge those search interfaces as a
relationship between those similar terms. The similar terms           case study. The source search interfaces under consideration
can have relationship like meronym, hyper name and                    are given along with the domain specific knowledge base.
synonym. The search interface merge process is divided into
several steps.
Step 1. Find set of similar terms
Step 2. Children of similar terms are merged and point to the
same parent.
Table 1.1 Domain specific knowledge base
Concept        Concept                Relation
Optical_drive OpticalDisk             Synonym
HardDrive      HardDisk               Synonym
DVD            DVD-Rom                Synonym
CDRW           CD-RW                  Synonym
CDRom          CD-ROM                 Synonym
Computer       Personal Computer Synonym
The taxonomy trees of search interfaces A and B respectively are as following




                                          Fig1.1TaxonomytreeforSearchInterfaceS1
The codes assigned to the terms in the source search interface       Hard Disk                          A02
are as follows. Let us consider search interface a first and see     Personal computer                  A03
the codes assigned                                                   Memory                             A04
                                                                              Architecture              A05
                                                                              Optical disk              A06
Table 1.2 Codes of terms in search interface A
                                                                              Operating System          A07
         Term                          Code
                                                                              IBM                       A0101
         Recurso                       A
                                                                              Intel                     A0102
         CPU                           A01
Issn 2250-3005(online)                                                      August| 2012                          Page 1140
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4



        AMD                       A0103                                PIV                        A010202
        X86                       A0501                                PIII                       A010203
        MAC                       A0502                                PM                         A010204
        Sunsparc                  A0503                                Xeon                       A010205
        DVD-Rom                   A0601                                Pentium_D                  A010206
        CD-RW                     A0602                                Itanium                    A010207
        CD-Rom                    A0603                                Opetron                    A010301
        Macos                     A0701                                Duron                      A010302
        Solaris                   A0702                                Athlen 64                  A010303
        Windows                   A0703                                Athlen 32                  A010304
        Unix                      A0704                                Semaron                    A010305
        PowerPC                   A010101                              Linux                      A070401
        Celeron                   A010201




                                     Fig 1.2 Taxonomy tree for Search Interface S2

Similarly the second search interface O2 under consideration           Combo-drive              B010203
becomes as follows                                                     CDRW                     B010204
Table 1.3 Codes of terms in search interface B                         CDROM                    B010205
        Term                      Code                                 Intel                    B010301
        Recurso                   B                                    Amd                      B010302
        Computer part             B01                                  RAM                      B010401
        Computer                  B02                                  Rpm5400                  B010501
        Modem                     B0101                                Rpm7200                  B010502
        Optical_drive             B0102                                CRT                      B010601
        CPU                       B0103                                LCD                      B010602
        Memory                    B0104                                Laser                    B010701
        Hard Drive                B0105                                Inkjet                   B010702
        Monitor                   B0106                                Dotmatrix                B010703
        Printer                   B0107                                Thermal                  B010704
        Cable Modem               B010101                              Pentium                  B01030101
        Internal 56K              B010102                              Celeron                  B01030102
        DVD                       B010201                              Sdram                    B01040101
        DVD RW                    B010202                              DDRam                    B01040102
Issn 2250-3005(online)                                               August| 2012                           Page 1141
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4



 Now depending upon the length of codes assigned to each           checked for the similarity. We take example of term Intel in
 term search interface can be divided into clusters so that all    search interface o1 and the same term Intel in search interface
 terms having codes of same length reside in the same cluster.     O2. Then we try to combine the children of both terms. This
 The methodology used for clustering has been defined earlier.     term has been selected for the purpose of clarity and
 The clusters becomes as follows for each search interface.        simplicity only so that it doesn’t become too complex.

 The algorithm suggested above makes use of recursive calls
 so that all children of the terms under consideration are also

                                                                                                            B01
                                                                            B                               B02
                                   A01, A02,
    A                              A03, A04,                              Level 0                           Level 1
                                   A05, A06,
                                   A07

 Level 0                             Level 1
                                                                                                      B010101,B0101
                                                                      B0101,B0102,
                                                                                                      02,B010201,B0
                                                                      B0103,B0104,
                                                                                                      10202,B010203,
                                                                      B0105,B0106,
                                                                                                      B010204,B0102
                                                                      B0107
A0101,A0102                                                                                           05,B010301,B0
     Level 0                      A010101,A0102
,A0103,A050                                                                                           10302,B010404,
                                  01,A010202,A0
1,A0502,A05                                                                                           B010505,B0105
                                  10203,A010204,
03,A0601,A0                                                                                           02,B010601,B0
                                  A010205,A0102
         Level 2
602,A0603,A 3                                                                                         10602,B010701,
         Level                    06,A010207,A0
0701,A0702,                                                                                           B010702,B0107
                                  10301,A010302,
                                                                                                      03,B010704,
A0703,A0704                       A010303,A0103
                                  04,A010305,A0                                 B01030101,B01030102
                                  70401                                         B01040101,B01040102


 Level 2                             Level 3

                                                                                       Level 4
                                                                        Fig 1.4Clusters for Search Interface S2
 Fig 1.3 Clusters for Search Interface S1
 To see how the process of merging of search interfaces will       codes A01 and B0103 will be passed to function merge.
 take place we consider example of term CPU which appears          Function merge will make two queues from these two codes
 in both search interfaces. The process of merging will be         one for each code having all its descendants upto any level.
 followed as per the algorithm suggested above.First of all        So that those queues can be passed to function combine and
 both search interfaces O1 and O2 will be passed to function       queues can be merged. In function merge q1 and q2 will be
 Integrate. The two tables table 1 and table 2 are already         created having code A01 and B0103 in q1 and q2 respectively.
 available with us as given above. While scanning through          The color will be assigned to each term in the queue just to
 these tables as given in the algorithm it will be detected that   check whether immediate children of the term have been
 cpu appearing in table 1 is same as cpu appearing in table 2      added to the queue or not. Initially term will be of white color
 and whenever two same or similar terms are encountered the        and the color will be changed to black when its children has
 codes of both terms are fetched. From tables the fetched codes    been added to the queue. The process will be repeated until all
 of both terms in different search interfaces are A01 and          terms in the queue are black which represents that children of
 B0103 in search interface 1 and 2 respectively. These two         each term has been added to the queue.

 Issn 2250-3005(online)                                                  August| 2012                             Page 1142
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4




                           Fig 1.5 Taxonomy tree for Search Interface after merging S1 and S2

This way merging process proceeds in the algorithm. The same process will be repeated for any term in the search interface.
And after merging two search interfaces result will be stored in form of a queue only. The result obtained after merging the two
search interfaces above the resulting search interface becomes as follows. The case study has shown that how the proposed
algorithm is implemented for merging search interfaces. It is seen from the Fig 3.6 that all similar terms appear only once in the
resulting search interface and none of the concept is left. The same algorithm can be implemented for merging more than two
search interfaces.

REFERENCES                                                        [5]  Douglas E. Comer, “The Internet Book”, Prentice Hall
[1]   Brian Pinkerton, “Finding what people want:                      of India, New Delhi, 2001.
      Experiences with the web crawler.”   Proceedings Of         [6] A. K. Sharma, J. P. Gupta, D. P. Agarwal, “A novel
      WWW conf., 1994.                                                 approach towards efficient Volatile Information
[2]   Alexandros Ntoulas, Junghoo Cho, Christopher Olston              Management”, Proc. Of National Conference on
      "What's New on the Web? The Evolution of the Web                 Quantum Computing, 5-6 Oct.’ 2002, Gwalior.
      from a Search Engine Perspective." In Proceedings of        [7] AltaVista, “AltaVista Search Engine,” WWW,
      the World-Wide Web Conference (WWW), May 2004.                   http://www.altavista.com.
[3]   Vladislav Shkapenyuk and Torsten Suel, “Design and          [8] Shaw Green, Leon Hurst , Brenda Nangle , Dr. Pádraig
      Implementation of a High performance Distributed Web             Cunningham, Fergal Somers, Dr. Richard Evans,
      Crawler”, Technical Report, Department of Computer               “Sofware Agents : A Review”, May 1997.
      and Information Science, Polytechnic University,            [9] Cho, J., Garcia-Molina, H., Page, L., “Efficient
      Brooklyn, July 2001.                                             Crawling Through URL Ordering,” Computer Science
[4]   Junghoo Cho, Narayanan Shivakumar, Hector Garcia-                Department, Stanford University, Stanford, CA, USA,
      Molina "Finding replicated Web collections." In                  1997.
      Proceedings of 2000 ACM International Conference on         [10] O. Kaljuvee, O. Buyukkokten, H. Garcia-Molina, and A.
      Management of Data (SIGMOD), May 2000.                           Paepcke. “Efficient web form entry on pdas”. Proc. of
                                                                       the 10th Intl. WWW Conf., May 2001.

Issn 2250-3005(online)                                                   August| 2012                            Page 1143
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4



[11]   Ching-yao wang, ying-chieh lei, pei-chi chang and       [32] Natalya Fridman Noy and Mark A. Musen, “PROMPT:
       shian- shyong tsen National chiao-tung university, “A        Algorithm and tool for automated ontology merging and
       level wise clustering algorithm on structured                alignment”
       documents”                                              [33] Dinesh Sharma, Komal Kumar Bhatia, A.K Sharma,
[12]   Benjamin Chin Ming Fung, “Hierarchical document              “Crawling the Hidden web resources”, NCIT-07 , Delhi
       clustering using frequent itemsets”                     [34] Dinesh Sharma, “Crawling the hidden web resources
[13]    Francis Crimmins, “Web Crawler Review”.                     using agent” ETCC-07. NIT- Hamirpur
[14]   A.K.Sharma, J.P.Gupta, D.P.Agarwal, “An Alternative
       Scheme for Generating Fingerprints of Static
       Documents”, accepted for publication in Journal of
       CSI..
[15]   Allan Heydon, Marc Najork, 1999,“ Mercator: “A
       Scalable, Extensible Web Crawler”, Proceedings of the
       Eighth International World Wide Web Conference,
       219-229, Toronto, Canada, 1999.
[16]   F.C. Cheong, “Internet Agents: Spiders, Wanderers,
       Brokers and Bots ”, New Riders Publishing,
       Indianapolis, Indiana, USA, 1996.
[17]   Frank M. Shipman III, Haowei Hsieh, J. Michael
       Moore, Anna Zacchi , ”Supporting Personal
       Collections across Digital Libraries in Spatial
       Hypertext”
[18]   Stephen Davies, Serdar Badem, Michael D. Williams,
       Roger King, “Google by Reformulation”: Modeling
       Search as Successive Refinement
[19]   Gautam Pant, Padmini Srinivasan1, and Filippo
       Menczer, “Crawling the Web”
[20]   Monica Peshave ,”How Search engines work and a
       web crawler application”
[21]   Ben Shneiderman ,” A Framework for Search
       Interfaces”
[22]   Danny C.C. POO Teck Kang TOH, “ Search Interface
       for Z39.50 Compliant Online Catalogs Over The
       Internet”
[23]   David Hawking CSIRO ICT Centre, “Web Search
       Engines: Part 1”
[24]   M.D. Lee, G.M. Roberts, F. Sollich, and C.J. Woodru,
       “Towards Intelligent Search Interfaces”
[25]   Sriram Raghavan Hector Garcia-Molina, “Crawling the
       Hidden Web”
[26]   Jared Cope, “Automated Discovery of Search
       Interfaces on the Web”
[27]   IVataru Sunayama' Yukio Osan-a? and Iviasahiko
       Yachida', “Search Interface for Query Restructuring
       with Discovering User Interest”
[28]   Stephen W. Liddle, Sai Ho Yau, and David W.
       Embley, “On the automatic Extraction of Data from
       the Hidden Web”
[29]   Gualglei song, Yu Qian, Ying Liu and Kang Zhang,
       “Oasis: a Mapping and Integrated framework for
       Biomedical ontologies”
[30]   Miyoung cho, Hanil Kim and Pankoo Kim, “ A new
       method for ontology merging based on the concept
       using wordnet”
[31]   Pascal Hitzler, Markus Krotzsch, Marc Ehrig and York
       Sure, “ What is ontology merging”
Issn 2250-3005(online)                                               August| 2012                          Page 1144

More Related Content

What's hot

Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.iosrjce
 
Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...
Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...
Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...Rana Jayant
 
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web CrawlerIRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web CrawlerIRJET Journal
 
Search Engine Optimization
Search Engine OptimizationSearch Engine Optimization
Search Engine OptimizationArun Kumar
 
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep WebSmart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep WebS Sai Karthik
 
Search Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut AslantaşSearch Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut AslantaşAykut Aslantaş
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyIOSR Journals
 
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...CloudTechnologies
 
Colloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web CrawlerColloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web CrawlerAkshay Pratap Singh
 
A synonym based approach of data mining in SEO
A synonym based approach of data mining in SEOA synonym based approach of data mining in SEO
A synonym based approach of data mining in SEOhussein khateb
 
Focused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domainsFocused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domainseSAT Publishing House
 

What's hot (17)

Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
 
Web crawler
Web crawlerWeb crawler
Web crawler
 
Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...
Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...
Smart Crawler Base Paper A two stage crawler for efficiently harvesting deep-...
 
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web CrawlerIRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
 
SEO Interview FAQ
SEO Interview FAQSEO Interview FAQ
SEO Interview FAQ
 
Smart crawler a two stage crawler
Smart crawler a two stage crawlerSmart crawler a two stage crawler
Smart crawler a two stage crawler
 
Search Engine Optimization
Search Engine OptimizationSearch Engine Optimization
Search Engine Optimization
 
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep WebSmart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
 
Search engine
Search engineSearch engine
Search engine
 
Search Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut AslantaşSearch Engine Optimization - Aykut Aslantaş
Search Engine Optimization - Aykut Aslantaş
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET Technology
 
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
 
Web crawling
Web crawlingWeb crawling
Web crawling
 
Colloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web CrawlerColloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web Crawler
 
A synonym based approach of data mining in SEO
A synonym based approach of data mining in SEOA synonym based approach of data mining in SEO
A synonym based approach of data mining in SEO
 
Twet
TwetTwet
Twet
 
Focused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domainsFocused web crawling using named entity recognition for narrow domains
Focused web crawling using named entity recognition for narrow domains
 

Viewers also liked

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
DSS eCurriculum Tour 2013 - Booklet
DSS eCurriculum Tour 2013 - BookletDSS eCurriculum Tour 2013 - Booklet
DSS eCurriculum Tour 2013 - BookletJeff Ng
 
The pronunciation of north american english
The pronunciation of north american englishThe pronunciation of north american english
The pronunciation of north american englishDiego M.
 
Kenya’s economy slideshow
Kenya’s economy slideshowKenya’s economy slideshow
Kenya’s economy slideshowblanchardschool
 
CA Transcontinental Learning
CA Transcontinental LearningCA Transcontinental Learning
CA Transcontinental Learningsandralemmon
 
Au 2012 presentation_3_d printing
Au 2012 presentation_3_d printingAu 2012 presentation_3_d printing
Au 2012 presentation_3_d printinghkstudio
 
Lopatcong 2011-2012 Budget
Lopatcong 2011-2012 BudgetLopatcong 2011-2012 Budget
Lopatcong 2011-2012 BudgetMatt Shea
 
09-10_Budget_Preso_032309updt2
09-10_Budget_Preso_032309updt209-10_Budget_Preso_032309updt2
09-10_Budget_Preso_032309updt2Corinne Carriero
 
DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011
DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011
DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011Andris Soroka
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
 
2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMC
2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMC2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMC
2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMCBob Hastings Buick GMC
 

Viewers also liked (20)

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
DSS eCurriculum Tour 2013 - Booklet
DSS eCurriculum Tour 2013 - BookletDSS eCurriculum Tour 2013 - Booklet
DSS eCurriculum Tour 2013 - Booklet
 
Mm3422322236
Mm3422322236Mm3422322236
Mm3422322236
 
Bab3
Bab3Bab3
Bab3
 
The pronunciation of north american english
The pronunciation of north american englishThe pronunciation of north american english
The pronunciation of north american english
 
Kenya’s economy slideshow
Kenya’s economy slideshowKenya’s economy slideshow
Kenya’s economy slideshow
 
CA Transcontinental Learning
CA Transcontinental LearningCA Transcontinental Learning
CA Transcontinental Learning
 
Varun Unit1
Varun Unit1Varun Unit1
Varun Unit1
 
Au 2012 presentation_3_d printing
Au 2012 presentation_3_d printingAu 2012 presentation_3_d printing
Au 2012 presentation_3_d printing
 
Lopatcong 2011-2012 Budget
Lopatcong 2011-2012 BudgetLopatcong 2011-2012 Budget
Lopatcong 2011-2012 Budget
 
590 599
590 599590 599
590 599
 
09-10_Budget_Preso_032309updt2
09-10_Budget_Preso_032309updt209-10_Budget_Preso_032309updt2
09-10_Budget_Preso_032309updt2
 
DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011
DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011
DSS - ITSEC conf - Arcot - Security for eCommerce - Riga Nov2011
 
Enterprise Europe Network Presentation - BS 10.12.10
Enterprise Europe Network Presentation - BS 10.12.10Enterprise Europe Network Presentation - BS 10.12.10
Enterprise Europe Network Presentation - BS 10.12.10
 
Prezi intro
Prezi introPrezi intro
Prezi intro
 
sukses bekerja
sukses bekerjasukses bekerja
sukses bekerja
 
2011 Buick Enclave Brochure Rochester
2011 Buick Enclave Brochure Rochester2011 Buick Enclave Brochure Rochester
2011 Buick Enclave Brochure Rochester
 
Ten Great Ideas Summer 2013
Ten Great Ideas Summer 2013Ten Great Ideas Summer 2013
Ten Great Ideas Summer 2013
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMC
2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMC2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMC
2012 GMC Yukon Brochure Rochester-Bob Hastings Buick GMC
 

Similar to IJCER (www.ijceronline.com) International Journal of computational Engineering research

[LvDuit//Lab] Crawling the web
[LvDuit//Lab] Crawling the web[LvDuit//Lab] Crawling the web
[LvDuit//Lab] Crawling the webVan-Duyet Le
 
Search Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanismSearch Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanismUmang MIshra
 
Design Issues for Search Engines and Web Crawlers: A Review
Design Issues for Search Engines and Web Crawlers: A ReviewDesign Issues for Search Engines and Web Crawlers: A Review
Design Issues for Search Engines and Web Crawlers: A ReviewIOSR Journals
 
Notes for
Notes forNotes for
Notes for9pallen
 
HIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPagesHIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPagesijdkp
 
Web Mining.pptx
Web Mining.pptxWeb Mining.pptx
Web Mining.pptxScrbifPt
 
Multi Agent Based Web Mining Model.pptx
Multi Agent Based Web Mining Model.pptxMulti Agent Based Web Mining Model.pptx
Multi Agent Based Web Mining Model.pptxAeshwaryaChauhan1
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...butest
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...butest
 
Working of search engines(rohit sahu cs 17) 5th sem
Working of search engines(rohit sahu cs 17) 5th semWorking of search engines(rohit sahu cs 17) 5th sem
Working of search engines(rohit sahu cs 17) 5th semROHIT SAHU
 
Search Engine Marketing | Top Search Engines | Search Engines List
Search Engine Marketing | Top Search Engines | Search Engines ListSearch Engine Marketing | Top Search Engines | Search Engines List
Search Engine Marketing | Top Search Engines | Search Engines Listpaulfrench999
 
TEXT ANALYZER
TEXT ANALYZER TEXT ANALYZER
TEXT ANALYZER ijcseit
 

Similar to IJCER (www.ijceronline.com) International Journal of computational Engineering research (20)

E3602042044
E3602042044E3602042044
E3602042044
 
G017254554
G017254554G017254554
G017254554
 
E017624043
E017624043E017624043
E017624043
 
[LvDuit//Lab] Crawling the web
[LvDuit//Lab] Crawling the web[LvDuit//Lab] Crawling the web
[LvDuit//Lab] Crawling the web
 
Search Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanismSearch Engine working, Crawlers working, Search Engine mechanism
Search Engine working, Crawlers working, Search Engine mechanism
 
Design Issues for Search Engines and Web Crawlers: A Review
Design Issues for Search Engines and Web Crawlers: A ReviewDesign Issues for Search Engines and Web Crawlers: A Review
Design Issues for Search Engines and Web Crawlers: A Review
 
Notes for
Notes forNotes for
Notes for
 
Seminar on crawler
Seminar on crawlerSeminar on crawler
Seminar on crawler
 
HIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPagesHIGWGET-A Model for Crawling Secure Hidden WebPages
HIGWGET-A Model for Crawling Secure Hidden WebPages
 
Web Mining.pptx
Web Mining.pptxWeb Mining.pptx
Web Mining.pptx
 
DWM-MODULE 6.pdf
DWM-MODULE 6.pdfDWM-MODULE 6.pdf
DWM-MODULE 6.pdf
 
Multi Agent Based Web Mining Model.pptx
Multi Agent Based Web Mining Model.pptxMulti Agent Based Web Mining Model.pptx
Multi Agent Based Web Mining Model.pptx
 
L017447590
L017447590L017447590
L017447590
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...
 
Web crawler
Web crawlerWeb crawler
Web crawler
 
Seo
SeoSeo
Seo
 
Working of search engines(rohit sahu cs 17) 5th sem
Working of search engines(rohit sahu cs 17) 5th semWorking of search engines(rohit sahu cs 17) 5th sem
Working of search engines(rohit sahu cs 17) 5th sem
 
Search Engine Marketing | Top Search Engines | Search Engines List
Search Engine Marketing | Top Search Engines | Search Engines ListSearch Engine Marketing | Top Search Engines | Search Engines List
Search Engine Marketing | Top Search Engines | Search Engines List
 
TEXT ANALYZER
TEXT ANALYZER TEXT ANALYZER
TEXT ANALYZER
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

IJCER (www.ijceronline.com) International Journal of computational Engineering research

  • 1. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 Algorithm for Merging Search Interfaces over Hidden Web Harish Saini1, Kirti Nagpal2 1 Assistant Professor N.C.College of Engineering Israna, Panipat 2 Research Scholar, N.C.College of Engineering Israna, Panipat Abstract: This is the world of information. The size of same domain. Hidden web crawler is the crawler which world wide web [4,5] is growing at an exponential rate day continuously crawls hidden web so that it could be indexed by by day. The information on the web is accessed through search engines. The major functions which are to be search engine. These search engines [8] uses web crawlers to performed by hidden web crawler are getting form filled, prepare the repository and update that index at an regular finding search query, submitting query and indexing the interval. These web crawlers [3, 6] are the heart of search results. The challenge of the hidden web crawlers are the huge engines. Web crawlers continuously keep on crawling the web and dynamic nature of the hidden web. The merging[46] or and find any new web pages that have been added to the web, integration of search interfaces over hidden web is required to pages that have been removed from the web and reflect all response queries so that it answers queries at its best. For the these changes in the repository of the search engine so that the integration of search interfaces over hidden web two steps are search engines produce the most up to date results. required Keywords: Hidden Web, Web Crawler, Ranking, Merging  Finding semantic mapping over the different search interfaces to be merged. I. INTRODUCTION  Merging search interfaces on the basis of semantic This is the world of information. The size of world wide web mappings found earlier. [4,5] is growing at an exponential rate day by day. The information on the web is accessed through search engine. A Ranking the Web Pages by Search Engine These search engines [8] uses web crawlers to prepare the Search for anything using our favorite crawler-based search repository and update that index at an regular interval. Web engine will sort through the millions of pages it knows about crawlers continuously keep on crawling the web and find any and present you with ones that match your topic. The matches new web pages that have been added to the web, pages that will even be ranked, so that the most relevant ones come first. have been removed from the web and reflect all these changes Of course, the search engines don't always get it right. So, in the repository of the search engine so that the search how do crawler-based search engines go about determining engines produce the most up to date results. There is some relevancy, when confronted with hundreds of millions of web data on the web that is hidden behind some query interfaces pages to sort through? They follow a set of rules, known as an and that data can’t be accessed by these search engines. These algorithm. Exactly how a particular search engine's algorithm kind of data is called hidden web [41] or deep web. These works is a closely kept trade secret. days the contents on the hidden web are of higher interests to the users because the contents of hidden web are of higher quality and greater relevance. The contents on hidden web can One of the main rules in a ranking algorithm involves the be accessed by filling some search forms which are also location and frequency of keywords on a web page. Call it the called sometimes search interfaces. Search interfaces [37] are location/frequency method. Pages with the search terms the entry points to the hidden web and make it possible to appearing in the HTML title tag are often assumed to be more access the contents on the hidden web. These search relevant than others to the topic. Search engines will also interfaces are simply like filling any forms like forms for check to see if the search keywords appear near the top of a creating e-mail ids etc. Each search interface has number of web page, such as in the headline or in the first few input controls like text boxes, selection lists, radio buttons etc.. paragraphs of text. They assume that any page relevant to the To access hidden web one needs to fill these search interfaces topic will mention those words right from the beginning. for which a crawler is required which finds for forms on the Frequency is the other major factor in how search engines web known as form crawler. There can be multiple search determine relevancy. A search engine will analyze how often interfaces for the same information domain on the web. In that keywords appear in relation to other words in a web page. case all those search interfaces are required to be merged or Those with a higher frequency are often deemed more integrated so that crawler[3] finds all data relevant to the user relevant than other web pages. input despite of existence of multiple search interfaces for the Issn 2250-3005(online) August| 2012 Page 1137
  • 2. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 B. Web Crawler automatically crawls hidden web so that it can be indexed by A crawler [14,20] is a program that are typically programmed search engines. Hidden web crawler is able to allow an to visit sites that have been submitted by their owners as new average use to explore the amount of information, which is or updated. The contents of the Entire sites or specific pages mostly hidden behind search interfaces. The other motive for is selectively visited and indexed. The key reason of using hidden web crawler is to make hidden web pages searchable web crawlers is to visit web pages and add them to the at a central location so that the significant amount of time and repository so that a database can be prepared which in turn effort wasted in searching the hidden web can be reduced. serves applications like search engines. One more motive for hidden web crawlers is that due to heavy reliance of web users on search engines for locating Crawlers use graph structure of the web to move from pages information, search engine influence how the users perceive to pages. The simplest architecture of web crawler is to start the web.. from a seed web page and traverse all hyperlinks encountered in this web pages then all encountered hyperlinks are added to There are two core challenges while implementing an the queue which in turn are traversed. The process is repeated effective hidden web crawler until a sufficient number of pages are identified...  The crawler has to be able to understand and model a The main goal of the web crawler is to keep the coverage and query interface. freshness of the search engine index as high as possible which  The crawler has to come up with meaningful queries is not informed by user interaction. For this task the crawler to issue to the query interface. and other parts of the search engine have no communication between them. Because web is a purely dynamic collection of The first of the two challenges was addressed by a method for web pages there is a need for crawling cycles frequently to learning search interfaces. For the latter challenge if the refresh the web repository so that all new pages that have search interface is able to list all possible values for a query been added to the web are included in the repository similarly with help of some selection list or drop down list in that case the web pages that have been removed from web are deleted solution is straight forward. possible so all of the possible from web repository queries can’t be exhaustively listed. C Hidden Web Crawler D Search Interfaces Current day crawlers crawls only publicly indexable web While accessing hidden web [49] search query is submitted to (PIW) i.e set of pages which are accessible by following the search interface of the hidden web crawler. The selection hyperlinks ignoring search pages and forms which require of web pages and the relevance of results produced from authorization or prior registration. In reality they may ignore hidden web is determined by the effectiveness of search huge amount of high quality data which is hidden behind interfaces [42,43]. The different actions performed by search search forms. Pages in hidden web are dynamically generated interfaces are getting input from user, selection of query, in response to the queries submitted via search forms. submitting query to the hidden web and merging of results Crawling the hidden web is highly challenging task because produced by hidden web crawlers. of scale and the need for crawlers to handle search interfaces designed primarily for human beings. The other challenges of The most important activity of search interface is selection of hidden web data are: query which may be implemented in variety of ways. One  Ordinary web crawlers can’t be used for hidden web method is random method in which random keywords are  The data in hidden web can be accessed only through selected from the English dictionary and are submitted to the a search interface database. Random approach generates a reasonable number of  Usually the underlying structure of the database is relevant pages. The other technique is using generic frequency. unknown. In this method we find a document and obtain general distribution frequency of each word in the document. Based The size of hidden web is continuously increasing as more on this frequency distribution we find the most frequent and more organizations are putting their high quality data keyword, issue it to the hidden web database and retrieve the online hidden behind search forms. Because there are no static result. The same process is repeated with the second most links to hidden web pages therefore search engines can’t frequent word and so on till the downloaded resources are discover and index these web pages. exhausted. After that it analyses the returned web pages that whether they contain results or not. Another method is Because the only entry point to hidden web is search interface adaptive in this method the documents returned from the the main challenge is how to generate meaningful queries to previous queries issued to the Hidden-Web database is issue to the site. Hidden web crawler is the one which analysed and it is estimated which keyword is most likely to Issn 2250-3005(online) August| 2012 Page 1138
  • 3. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 return the most documents. Based on this analysis, the process multiple integration techniques which integrates these search is repeated with the most promising queries. forms over the hidden web. Most of those are semiautomatic. E Problems in Integration of Hidden Web F. Need For Merging Search Interfaces Recently keen interest has been developed in the retrieval and It seems that there will always be more then one search integration of hidden web with the intention of having high interfaces even for the same domain. Therefore, quality data for the databases. As the size of information in coordination(i.e. mapping, alignment, merging) of search hidden web is growing, search forms or search interfaces are interfaces is a major challenge for bridging the gaps between needed to be located which serves as an entry point to hidden agents with different conceptualizations. web database. But there exists crawlers which focus search on specific database domains and retrieve invariable diversified Two approaches are possible: set of forms like login form, quote request forms and searchable forms from multiple domains. The process of (1) merging the search interfaces to create a single coherent grouping these search interfaces or search forms over the search interface same database domain is called integration of search interfaces. (2) aligning the search interfaces by establishing links between them and allowing them to reuse information from These search form are input to algorithm which finds one another. correspondences among attributes of different forms. This task is not so easy because of dynamic nature of the web the new information is added to web and new information is Here we propose a new method for search interface merging. being modified or even removed. Therefore a scalable solution suitable for large scale integration is required for the Whenever multiple search forms are to be integrated the task purpose of large scale integration. In addition in a well can be divided into two major steps defined domain there may be variation in both structure and vocabulary of search forms. So in order to obtain information  Finding semantic mapping among search forms or that covers the whole domain it is required to find interfaces correspondences among attributes of forms and for that a  Merging search interfaces on the basis of mapping broad search is needed to be performed. The first step in merging [46] of search forms or interfaces For this either full crawl is needed to be performed but this is are finding semantic mapping and correspondences among really inefficient approach as it can take weeks for the attributes of those forms so that on the basis of that mapping exhaustive crawling process. Another alternative can be focus search interfaces or forms could be merged. This semantic crawling in which only pages relevant to a specific topic are mapping serves as the domain specific knowledge base for the crawled. This has a better quality index as compared to process of merging [47] of search interfaces. The proposed exhaustive crawling. In this case focused crawler that focus algorithm is semi automatic. It finds similar terms from a solely on the contents of retrieved web pages may not be a look-up table which stores all similar terms in the different very good alternative also because forms are sparsely search interfaces to be merged along with the relationship distributed and thus the number of forms retrieved per total between those similar terms so that it may be decided which number of pages retrieved may be very low. For tackling this term will be appearing in the merged search interface. The problem a focused crawler has been developed which used algorithm makes use of search interfaces in form of a reinforcement learning to build a focus crawler that is taxonomy tree to cluster the concepts in the search interfaces effective for sparse concepts. on the basis of level The kind of web crawler which crawl web pages which contain search forms are called form crawlers. Since II Merging of Search Interfaces form crawler find thousand of forms for the same domain. More and more applications are in need to utilize multiple Those different forms even for the same domain may be heterogeneous search interfaces across various domains. To different in the structure and vocabulary they use. For that facilitate such applications it is urgent to reuse and semantic mappings are required to be found among all interoperate heterogeneous search interfaces. Both merge and attributes of those forms and after finding semantic mappings integration produces a static search interface based on the those are integrated so that queries involving that domain can existing search interfaces. The resulting search interface is be answered suppressing the existence of multiple search relatively hard to evolve with the change of existing search forms using different structures and vocabulary. There exist interfaces. Issn 2250-3005(online) August| 2012 Page 1139
  • 4. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 The search interface merge module takes source A Merging Observations and Results search interfaces as input and a table that lists all similar terms Let us consider two source search interfaces and apply the coming from different search interfaces along with the above stated algorithm to merge those search interfaces as a relationship between those similar terms. The similar terms case study. The source search interfaces under consideration can have relationship like meronym, hyper name and are given along with the domain specific knowledge base. synonym. The search interface merge process is divided into several steps. Step 1. Find set of similar terms Step 2. Children of similar terms are merged and point to the same parent. Table 1.1 Domain specific knowledge base Concept Concept Relation Optical_drive OpticalDisk Synonym HardDrive HardDisk Synonym DVD DVD-Rom Synonym CDRW CD-RW Synonym CDRom CD-ROM Synonym Computer Personal Computer Synonym The taxonomy trees of search interfaces A and B respectively are as following Fig1.1TaxonomytreeforSearchInterfaceS1 The codes assigned to the terms in the source search interface Hard Disk A02 are as follows. Let us consider search interface a first and see Personal computer A03 the codes assigned Memory A04 Architecture A05 Optical disk A06 Table 1.2 Codes of terms in search interface A Operating System A07 Term Code IBM A0101 Recurso A Intel A0102 CPU A01 Issn 2250-3005(online) August| 2012 Page 1140
  • 5. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 AMD A0103 PIV A010202 X86 A0501 PIII A010203 MAC A0502 PM A010204 Sunsparc A0503 Xeon A010205 DVD-Rom A0601 Pentium_D A010206 CD-RW A0602 Itanium A010207 CD-Rom A0603 Opetron A010301 Macos A0701 Duron A010302 Solaris A0702 Athlen 64 A010303 Windows A0703 Athlen 32 A010304 Unix A0704 Semaron A010305 PowerPC A010101 Linux A070401 Celeron A010201 Fig 1.2 Taxonomy tree for Search Interface S2 Similarly the second search interface O2 under consideration Combo-drive B010203 becomes as follows CDRW B010204 Table 1.3 Codes of terms in search interface B CDROM B010205 Term Code Intel B010301 Recurso B Amd B010302 Computer part B01 RAM B010401 Computer B02 Rpm5400 B010501 Modem B0101 Rpm7200 B010502 Optical_drive B0102 CRT B010601 CPU B0103 LCD B010602 Memory B0104 Laser B010701 Hard Drive B0105 Inkjet B010702 Monitor B0106 Dotmatrix B010703 Printer B0107 Thermal B010704 Cable Modem B010101 Pentium B01030101 Internal 56K B010102 Celeron B01030102 DVD B010201 Sdram B01040101 DVD RW B010202 DDRam B01040102 Issn 2250-3005(online) August| 2012 Page 1141
  • 6. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 Now depending upon the length of codes assigned to each checked for the similarity. We take example of term Intel in term search interface can be divided into clusters so that all search interface o1 and the same term Intel in search interface terms having codes of same length reside in the same cluster. O2. Then we try to combine the children of both terms. This The methodology used for clustering has been defined earlier. term has been selected for the purpose of clarity and The clusters becomes as follows for each search interface. simplicity only so that it doesn’t become too complex. The algorithm suggested above makes use of recursive calls so that all children of the terms under consideration are also B01 B B02 A01, A02, A A03, A04, Level 0 Level 1 A05, A06, A07 Level 0 Level 1 B010101,B0101 B0101,B0102, 02,B010201,B0 B0103,B0104, 10202,B010203, B0105,B0106, B010204,B0102 B0107 A0101,A0102 05,B010301,B0 Level 0 A010101,A0102 ,A0103,A050 10302,B010404, 01,A010202,A0 1,A0502,A05 B010505,B0105 10203,A010204, 03,A0601,A0 02,B010601,B0 A010205,A0102 Level 2 602,A0603,A 3 10602,B010701, Level 06,A010207,A0 0701,A0702, B010702,B0107 10301,A010302, 03,B010704, A0703,A0704 A010303,A0103 04,A010305,A0 B01030101,B01030102 70401 B01040101,B01040102 Level 2 Level 3 Level 4 Fig 1.4Clusters for Search Interface S2 Fig 1.3 Clusters for Search Interface S1 To see how the process of merging of search interfaces will codes A01 and B0103 will be passed to function merge. take place we consider example of term CPU which appears Function merge will make two queues from these two codes in both search interfaces. The process of merging will be one for each code having all its descendants upto any level. followed as per the algorithm suggested above.First of all So that those queues can be passed to function combine and both search interfaces O1 and O2 will be passed to function queues can be merged. In function merge q1 and q2 will be Integrate. The two tables table 1 and table 2 are already created having code A01 and B0103 in q1 and q2 respectively. available with us as given above. While scanning through The color will be assigned to each term in the queue just to these tables as given in the algorithm it will be detected that check whether immediate children of the term have been cpu appearing in table 1 is same as cpu appearing in table 2 added to the queue or not. Initially term will be of white color and whenever two same or similar terms are encountered the and the color will be changed to black when its children has codes of both terms are fetched. From tables the fetched codes been added to the queue. The process will be repeated until all of both terms in different search interfaces are A01 and terms in the queue are black which represents that children of B0103 in search interface 1 and 2 respectively. These two each term has been added to the queue. Issn 2250-3005(online) August| 2012 Page 1142
  • 7. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 Fig 1.5 Taxonomy tree for Search Interface after merging S1 and S2 This way merging process proceeds in the algorithm. The same process will be repeated for any term in the search interface. And after merging two search interfaces result will be stored in form of a queue only. The result obtained after merging the two search interfaces above the resulting search interface becomes as follows. The case study has shown that how the proposed algorithm is implemented for merging search interfaces. It is seen from the Fig 3.6 that all similar terms appear only once in the resulting search interface and none of the concept is left. The same algorithm can be implemented for merging more than two search interfaces. REFERENCES [5] Douglas E. Comer, “The Internet Book”, Prentice Hall [1] Brian Pinkerton, “Finding what people want: of India, New Delhi, 2001. Experiences with the web crawler.” Proceedings Of [6] A. K. Sharma, J. P. Gupta, D. P. Agarwal, “A novel WWW conf., 1994. approach towards efficient Volatile Information [2] Alexandros Ntoulas, Junghoo Cho, Christopher Olston Management”, Proc. Of National Conference on "What's New on the Web? The Evolution of the Web Quantum Computing, 5-6 Oct.’ 2002, Gwalior. from a Search Engine Perspective." In Proceedings of [7] AltaVista, “AltaVista Search Engine,” WWW, the World-Wide Web Conference (WWW), May 2004. http://www.altavista.com. [3] Vladislav Shkapenyuk and Torsten Suel, “Design and [8] Shaw Green, Leon Hurst , Brenda Nangle , Dr. Pádraig Implementation of a High performance Distributed Web Cunningham, Fergal Somers, Dr. Richard Evans, Crawler”, Technical Report, Department of Computer “Sofware Agents : A Review”, May 1997. and Information Science, Polytechnic University, [9] Cho, J., Garcia-Molina, H., Page, L., “Efficient Brooklyn, July 2001. Crawling Through URL Ordering,” Computer Science [4] Junghoo Cho, Narayanan Shivakumar, Hector Garcia- Department, Stanford University, Stanford, CA, USA, Molina "Finding replicated Web collections." In 1997. Proceedings of 2000 ACM International Conference on [10] O. Kaljuvee, O. Buyukkokten, H. Garcia-Molina, and A. Management of Data (SIGMOD), May 2000. Paepcke. “Efficient web form entry on pdas”. Proc. of the 10th Intl. WWW Conf., May 2001. Issn 2250-3005(online) August| 2012 Page 1143
  • 8. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 4 [11] Ching-yao wang, ying-chieh lei, pei-chi chang and [32] Natalya Fridman Noy and Mark A. Musen, “PROMPT: shian- shyong tsen National chiao-tung university, “A Algorithm and tool for automated ontology merging and level wise clustering algorithm on structured alignment” documents” [33] Dinesh Sharma, Komal Kumar Bhatia, A.K Sharma, [12] Benjamin Chin Ming Fung, “Hierarchical document “Crawling the Hidden web resources”, NCIT-07 , Delhi clustering using frequent itemsets” [34] Dinesh Sharma, “Crawling the hidden web resources [13] Francis Crimmins, “Web Crawler Review”. using agent” ETCC-07. NIT- Hamirpur [14] A.K.Sharma, J.P.Gupta, D.P.Agarwal, “An Alternative Scheme for Generating Fingerprints of Static Documents”, accepted for publication in Journal of CSI.. [15] Allan Heydon, Marc Najork, 1999,“ Mercator: “A Scalable, Extensible Web Crawler”, Proceedings of the Eighth International World Wide Web Conference, 219-229, Toronto, Canada, 1999. [16] F.C. Cheong, “Internet Agents: Spiders, Wanderers, Brokers and Bots ”, New Riders Publishing, Indianapolis, Indiana, USA, 1996. [17] Frank M. Shipman III, Haowei Hsieh, J. Michael Moore, Anna Zacchi , ”Supporting Personal Collections across Digital Libraries in Spatial Hypertext” [18] Stephen Davies, Serdar Badem, Michael D. Williams, Roger King, “Google by Reformulation”: Modeling Search as Successive Refinement [19] Gautam Pant, Padmini Srinivasan1, and Filippo Menczer, “Crawling the Web” [20] Monica Peshave ,”How Search engines work and a web crawler application” [21] Ben Shneiderman ,” A Framework for Search Interfaces” [22] Danny C.C. POO Teck Kang TOH, “ Search Interface for Z39.50 Compliant Online Catalogs Over The Internet” [23] David Hawking CSIRO ICT Centre, “Web Search Engines: Part 1” [24] M.D. Lee, G.M. Roberts, F. Sollich, and C.J. Woodru, “Towards Intelligent Search Interfaces” [25] Sriram Raghavan Hector Garcia-Molina, “Crawling the Hidden Web” [26] Jared Cope, “Automated Discovery of Search Interfaces on the Web” [27] IVataru Sunayama' Yukio Osan-a? and Iviasahiko Yachida', “Search Interface for Query Restructuring with Discovering User Interest” [28] Stephen W. Liddle, Sai Ho Yau, and David W. Embley, “On the automatic Extraction of Data from the Hidden Web” [29] Gualglei song, Yu Qian, Ying Liu and Kang Zhang, “Oasis: a Mapping and Integrated framework for Biomedical ontologies” [30] Miyoung cho, Hanil Kim and Pankoo Kim, “ A new method for ontology merging based on the concept using wordnet” [31] Pascal Hitzler, Markus Krotzsch, Marc Ehrig and York Sure, “ What is ontology merging” Issn 2250-3005(online) August| 2012 Page 1144