Short Paper
                                                            ACEEE Int. J. on Information Technology, Vol. 3, No. 1, March 2013



         An Efficient Cloud based Approach for Service
                           Crawling
Chandan Banerjee 1,2, Anirban Kundu 2,3, Sumon Sadhukhan 1, Rana Dattagupta 4
1 Netaji Subhash Engineering College, Kolkata 700152, India
{chandanbanerjee1, sumon.sadhukhan8}@gmail.com
2 Innovation Research Lab (IRL), Howrah, West Bengal 711103, India
anik76in@gmail.com
3 Kuang-Chi Institute of Advanced Technology, Shenzhen 518057, P.R. China
anirban.kundu@kuang-chi.org
4 Jadavpur University, Kolkata 700032, India
rdattagupta@cse.jdvu.ac.in

Abstract— In this paper, we have designed a crawler that searches for services provided by different clouds connected in a network. The proposed method provides details of the freshness and age of cloud clusters. The crawler checks each router available in a network providing services. On the basis of the search criteria, our design generates output that guides users to the requested cloud services in an efficient manner. We plan to store the result in an m-way tree and to use a traversal technique to extract specific data from the crawling result. We have compared the result with other typical search techniques.

Index Terms—cloud crawler, service crawling, cloud search, Freshness, Age

I. INTRODUCTION

    In modern life, the usage of cloud is growing rapidly. A cloud user typically relies on specific services. Web search engines [1] crawl the web and update information world-wide. Nowadays, Internet users are switching from single services to cloud services, which requires greater availability of cloud services. Web crawlers [2] fetch web pages and cache them in their databases. Every crawler stores the crawled result in its database, and the result is searched when it is needed. Search engines [3] are often compared with one another in terms of time complexity and space complexity. The freshness and age of the crawled result are also important. The cloud crawler [4] works with the Internet Protocol (IP) addresses of a cache stored in a tree structure. Hosts are visited using specific threads for specific networks.
    Frequently, one needs to maintain local copies of remote data sources for better performance or availability. For example, a Web search engine copies a significant subset of the Web and maintains copies or indexes of the pages to help users access relevant information. In this situation, a part of the local copy may become out of date because changes at the sources are not immediately propagated to the local copy. Therefore, it becomes important to design a good refresh policy that maximizes the "freshness" of the local copy. As cloud services grow larger, it becomes more important to refresh the data effectively. One critical challenge in the surfacing approach is how a crawler can automatically generate promising queries so that it can carry out efficient surfacing. This challenge has been studied by several researchers, such as [5], [6], [7], [8], [9]. In these methods, candidate query keywords are generated from the obtained records. Section II shows our proposed framework and the corresponding approach. Experimental analyses are presented in Section III. Section IV concludes the paper.

II. FRAMEWORK

    We consider several nodes that are connected to each other in a network. Clusters are formed from several nodes providing distinct services. The head node is also connected to the network. A cluster may contain private networks recursively. The crawler reaches the end points, takes information from them, and sends it to the head node. Node A stores the whole result. Boxes indicate networks. A network may have a sub-network. In the second section, we use an M-Way tree traversal technique so that we can reach the destination with minimum path length. In the last section, we show how the technique is efficient in comparison with other search algorithms. To appreciate the efficiency of the algorithm, we need to understand the freshness and age of a crawler. Every crawler has to update its database quickly and produce efficient results. The terms freshness and age concern the database.

A. Freshness and Age
    A cloud service database is called 'fresher' when its information is more up to date than that of other crawlers. For instance, if a crawler crawls more nodes than other crawlers, then it is fresher. If a crawler shows a result obtained 5 minutes ago, then the age of that result is 5 minutes.
1. Freshness
    Let $S = \{n_1, n_2, n_3, \dots, n_N\}$ be the set of nodes in the network, where $n_1, n_2, \dots$ are nodes and $N$ is the number of elements. $D_1, D_2, \dots, D_N$ are the services stored on the respective nodes. The total freshness of the crawler is

$$\mathrm{Freshness}(t) = \frac{1}{N}\sum_{i=1}^{N} F(n_i, t),$$

where

$$F(n_i, t) = \begin{cases} 1 & \text{if } n_i \text{ is updated at time } t \\ 0 & \text{if not updated} \end{cases}$$

2. Age
Let $\{T_1, T_2, \dots, T_n\}$ be the set of times at which the information about

© 2013 ACEEE                                                       61
DOI: 01.IJIT.3.1. 1114
the specific node is taken into account. The current time is $T$. Then, the age of the node is $T - T_n$.
At time $t$, if the age of an element is $A_i$, then

$$A_i = \begin{cases} 0 & \text{if it is updated at } t \\ T_i - T_{i-1} & \text{if it is not updated at } t \end{cases}$$

The total age is

$$A(S, t) = \frac{1}{N}\sum_{i=1}^{N} A_i$$
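
The freshness and age definitions above can be illustrated with a small numerical sketch. It is not part of the paper's implementation: the `NodeRecord` type, its field names, and the sample values are hypothetical, and $T_{i-1}$ and $T_i$ are assumed to be the previous and latest crawl times of node $n_i$, in minutes.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """Hypothetical per-node crawl record (not defined in the paper)."""
    name: str
    updated: bool   # F(n_i, t): True if the stored copy was updated at time t
    t_prev: float   # T_{i-1}: assumed to be the previous crawl time, in minutes
    t_curr: float   # T_i: assumed to be the latest crawl time, in minutes


def freshness(records):
    """Freshness(t) = (1/N) * sum of F(n_i, t) over all N nodes."""
    if not records:
        raise ValueError("at least one node is required")
    return sum(1 for r in records if r.updated) / len(records)


def node_age(record):
    """A_i = 0 if the node was updated at t, otherwise T_i - T_{i-1}."""
    return 0.0 if record.updated else record.t_curr - record.t_prev


def total_age(records):
    """A(S, t) = (1/N) * sum of A_i over all N nodes."""
    if not records:
        raise ValueError("at least one node is required")
    return sum(node_age(r) for r in records) / len(records)


# Four made-up nodes: two are up to date, two are stale.
records = [
    NodeRecord("n1", True, 0.0, 10.0),
    NodeRecord("n2", False, 4.0, 10.0),
    NodeRecord("n3", False, 7.0, 10.0),
    NodeRecord("n4", True, 9.0, 10.0),
]
fresh = freshness(records)   # 2 of 4 nodes updated -> 0.5
age = total_age(records)     # (0 + 6 + 3 + 0) / 4 -> 2.25
print(fresh, age)
```

With these sample values, half of the nodes are fresh and the average age is 2.25 minutes; a crawler that refreshes stale nodes more often would push the first number up and the second down.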
    A cloud crawler is used to fetch services in order to create a framework for a cloud service crawler engine using proper indexing methodologies. A crawler for a specific service is a program that extracts outward Web links (URLs) and adds them to a list after processing. Thus, a cloud service crawler is a program which fetches as many relevant services as possible for specific users. It uses the Web link structure, in which the order of the list is important because only high quality Web pages are considered relevant. Fig. 1 shows the proposed service based cloud crawler. Here, an element insertion means that the element is inserted at the pointer location within the m-way tree. A special traversal technique is used to visit all the nodes within each network or sub-network. Each node is selected twice; the second time, it is actually popped from the stack. An advantage of our algorithm is that data need not be stored in the client node. The result is sent directly to the crawler server after scanning a single node.

Fig. 1. Flowchart of Service based Cloud Crawler

B. Sample Procedure of a Sample Network
    Fig. 2 shows an arbitrary cloud cluster. There are four network clusters in total within the cloud. Circular boxes indicate the clusters and rectangular boxes indicate the resources of each cluster network. Table I shows the result, which is based on our proposed approach as shown in our previous work [1].

Fig. 2. Arbitrary Cloud Cluster Scenario

    At crawling run time, a hash table is built from the mapping between the Node and the Number (IP address) of resources in a cloud network, as shown in Table II. Our proposed search approach is presented in subsection E. A sample network is crawled using the proposed method, as shown in Table I.

TABLE I. PROPOSED APPROACH BASED ON FIG. 2
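
The stack-based traversal described for the crawler above, in which each node is selected once when it is scanned and a second time when it is popped, might look roughly like the following sketch. The flowchart of Fig. 1 is not reproduced here, so this is only one plausible reading: the `crawl` function, the `send_to_server` callback, and the sample `network` are all hypothetical.

```python
def crawl(network, root, send_to_server):
    """Visit every node reachable from root using an explicit stack.

    Each node is selected twice: first when it is scanned (its result is
    sent straight to the crawler server, so nothing is kept on the client
    node), and again when it is popped from the stack.
    Returns the order in which nodes were popped.
    """
    stack = [root]
    seen = {root}
    send_to_server(root)              # first selection of the root
    popped = []
    while stack:
        node = stack[-1]
        # Look for a child of the current node that has not been scanned yet.
        nxt = next((c for c in network.get(node, []) if c not in seen), None)
        if nxt is None:
            popped.append(stack.pop())    # second selection: pop from stack
        else:
            seen.add(nxt)
            send_to_server(nxt)           # first selection: scan and report
            stack.append(nxt)
    return popped


# Made-up network: A is the head node; B contains a sub-network with D.
network = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
scanned = []
popped = crawl(network, "A", scanned.append)
print(scanned)  # order of first selection
print(popped)   # order of second selection
```

Here `scanned.append` stands in for sending a result to the crawler server; in this sketch the client side only keeps the stack and the set of already scanned nodes.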
C. Hash Table
    The hash table is generated based on the mapping between the Node and the Number (IP address) of resources in a cloud network. Table II is created using real-time crawling.

TABLE II. HASH TABLE BASED ON TABLE I

D. Indexing Result
    Once the crawler finishes searching the cloud, it stores the result in an M-Way tree (Fig. 3) using Table II.
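
As a rough illustration of the hash table and indexing steps, the sketch below assigns a unique number to every crawled resource and links the numbers into an m-way tree. Tables I and II are not reproduced here, so the input format, the `index_crawl` function, the `TreeNode` type, and the sample addresses are assumptions made for the example only.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """Node of an m-way tree: any number of children is allowed."""
    number: int
    children: list = field(default_factory=list)


def index_crawl(crawl_result):
    """Build the hash table and the m-way tree from a crawl result.

    crawl_result is a list of (network, ip, parent) tuples in crawl order,
    where parent is the (network, ip) key of the enclosing node or None for
    the head node. Parents are assumed to appear before their children.
    """
    table = {}    # (network, ip) -> unique number
    nodes = {}    # unique number -> TreeNode
    root = None
    for network, ip, parent in crawl_result:
        number = len(table) + 1           # next unused number
        table[(network, ip)] = number
        nodes[number] = TreeNode(number)
        if parent is None:
            root = nodes[number]
        else:
            nodes[table[parent]].children.append(nodes[number])
    return table, root


# Made-up crawl: the same IP address appears in two different networks.
crawl_result = [
    ("net0", "10.0.0.1", None),
    ("net1", "192.168.0.1", ("net0", "10.0.0.1")),
    ("net2", "192.168.0.1", ("net0", "10.0.0.1")),
    ("net1", "192.168.0.2", ("net1", "192.168.0.1")),
]
table, root = index_crawl(crawl_result)
print(table)
print([child.number for child in root.children])
```

Keying the table on the (network, IP address) pair is one simple way to keep machines apart when two private networks reuse the same address, since each entry still receives its own number.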
Fig. 3. M-Way tree

E. Search Approach
    The algorithm described in Fig. 4 is used to reach any node using the crawling result. Suppose Node 13 is to be visited at a particular time instance. Table III shows the different steps to search for Node 13.

TABLE III. PROPOSED SEARCH APPROACH

Fig. 4. Flow chart to reach any node using Fig. 2

The shortest path to reach Node 13 is {1 → 9 → 11 → 13}.
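
A minimal sketch of how such a path could be recovered from the indexed result is given below, together with plain BFS and DFS visit counts for comparison. Since Fig. 3 and Fig. 4 are not reproduced here, the tree layout is hypothetical (chosen only so that the path to Node 13 is 1, 9, 11, 13), and following parent links from the target back to the root is an assumed reading of the search approach rather than the algorithm of Fig. 4 itself.

```python
from collections import deque

# Hypothetical m-way tree: parent number -> list of child numbers.
tree = {
    1: [2, 5, 9],
    2: [3, 4],
    5: [6, 7, 8],
    9: [10, 11],
    11: [12, 13],
}


def parent_map(tree):
    """Invert the child lists so every node knows its parent."""
    return {child: parent for parent, kids in tree.items() for child in kids}


def path_to(target, parents, root=1):
    """Follow parent links from target up to root, then reverse the path.

    Raises KeyError if target is not in the tree.
    """
    path = [target]
    while path[-1] != root:
        path.append(parents[path[-1]])
    return path[::-1]


def bfs_visits(tree, root, target):
    """Number of nodes a breadth-first search visits before reaching target."""
    queue, visited = deque([root]), 0
    while queue:
        node = queue.popleft()
        visited += 1
        if node == target:
            return visited
        queue.extend(tree.get(node, []))
    return visited


def dfs_visits(tree, root, target):
    """Number of nodes a depth-first search visits before reaching target."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        if node == target:
            return visited
        stack.extend(reversed(tree.get(node, [])))  # keep left-to-right order
    return visited


path = path_to(13, parent_map(tree))
print(path)                      # [1, 9, 11, 13]
print(bfs_visits(tree, 1, 13))   # 13
print(dfs_visits(tree, 1, 13))   # 13
```

In this made-up tree the target is the last node that either BFS or DFS reaches, so both touch all 13 nodes, while the parent-link lookup touches only the 4 nodes on the path.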

III. EXPERIMENTAL ANALYSIS
   The time complexities [10], [11] of DFS and BFS are O(|V|+|E|), where V is the set of vertices of the graph and E is the set of edges of the graph.
A. Best Case Scenario
1) Breadth First Search (BFS)
Total Number of Nodes Visited = MN, where M = average number of machines present in every network and N = level of the tree.
2) Depth First Search (DFS)
Total Number of Nodes Visited = N, where N = level of the tree.
3) Based on our Proposed Algorithm
Total Number of Nodes Visited = N, where N = level of the tree.
The best case analysis is shown in Fig. 5. Our algorithm has been compared with typical DFS and BFS methods. From the comparative study, we conclude that the number of visited nodes increases as the level of the m-way tree increases. With our proposed search method, we can find the shortest path to reach every node.

Fig. 5. Best Case Complexity Comparison

B. Worst Case Scenario
1) Breadth First Search (BFS)
Total Number of Nodes Visited = M^(N+1)
2) Depth First Search (DFS)
Total Number of Nodes Visited = M^(N+1)
3) Based on our Proposed Algorithm
Total Number of Nodes Visited = N
    In the worst case analysis, our proposed algorithm achieves the minimum time complexity for reaching any destination node. Fig. 6 shows the worst case complexity comparison.

Fig. 6. Worst Case Complexity Comparison

    Four clusters have been used for experimental purposes with tree traversal, as shown in Fig. 7, using the cloud crawler based on IP addresses available in the cache. Threads have been used to visit distinct hosts concurrently. There is no need to store data in the client node, as the result is sent directly to the crawler server after scanning each node. The cloud crawler works with IP addresses of a cache following an m-way tree structure.

Fig. 7. Crawling Results

CONCLUSIONS
    In our methodology, a hash table is generated in which each resource is assigned a particular number. The hash table is helpful for identifying each node. It is also useful for finding the shortest path to any node (resource) within the table. The freshness and age of a result can be calculated with the help of the hash table by comparing the past and present results of particular nodes. Machines in different networks may have the same IP address; they can still be distinguished by the hash table because it allocates a unique number to each machine. A minimal number of nodes is visited in the proposed method compared to DFS or BFS.

REFERENCES
[1] Brin, S., Page, L., "The anatomy of a large-scale hypertextual Web search engine," Computer Network ISDN Syst. 30, 1998, pp. 107-117
[2] Lu, J., Wang, Y., Liang, J., Chen, J., Liu, J., "An Approach to Deep Web Crawling by Sampling," Web Intelligence 2008, pp. 718-724
[3] Yang, Kai-Hsiang, Pan, Chi-Chien, Lee, Tzao-Lin, "Approximate search engine optimization for directory service," Parallel and Distributed Processing Symposium, 2003, Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
[4] Banerjee, C., Kundu, A., Sadhukhan, S., Bose, S., Dattagupta, R., "Service Crawling in Cloud Computing," 2nd International Conference on Advances in Information Technology and Mobile Communication, CCIS 296, pp. 243-246, Springer-Verlag Berlin Heidelberg Publication
[5] Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., Halevy, A., "Google's Deep-Web Crawl," in Proceedings of VLDB2008, Auckland, New Zealand, pp. 1241-1252 (2008)
[6] Ntoulas, A., Zerfos, P., Cho, J., "Downloading Textual Hidden Web Content through Keyword Queries," in Proceedings of JCDL2005, Denver, USA, pp. 100-109 (2005)
[7] Barbosa, L., Freire, J., "Siphoning Hidden-Web Data through Keyword-Based Interfaces," in Proceedings of SBBD2004, Brasilia, Brazil, pp. 309-321 (2004)
[8] Liu, J., Wu, ZH., Jiang, L., Zheng, QH., Liu, X., "Crawling Deep Web Content Through Query Forms," in Proceedings of WEBIST2009, Lisbon, Portugal, pp. 634-642 (2009)
[9] Lu, J., Wang, Y., Liang, J., Chen, J., Liu, J., "An Approach to Deep Web Crawling by Sampling," in Proceedings of IEEE/WIC/ACM Web Intelligence, Sydney, Australia, pp. 718-724 (2008)
[10] Ajtai, M., "On the complexity of the pigeonhole principle," Proc. of the 29th FOCS, pp. 346-355, 1988
[11] Cormen, Thomas H., Stein, Clifford, Rivest, Ronald L., Leiserson, Charles E., Introduction to Algorithms, The MIT Press, 3rd edition, 2009


Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...AM Publications
 

Short Paper
ACEEE Int. J. on Information Technology, Vol. 3, No. 1, March 2013
© 2013 ACEEE, DOI: 01.IJIT.3.1.1114

An Efficient Cloud based Approach for Service Crawling

Chandan Banerjee 1, 2, Anirban Kundu 2, 3, Sumon Sadhukhan 1, Rana Dattagupta 4
1 Netaji Subhash Engineering College, Kolkata 700152, India
{chandanbanerjee1, sumon.sadhukhan8}@gmail.com
2 Innovation Research Lab (IRL), Howrah, West Bengal 711103, India
anik76in@gmail.com
3 Kuang-Chi Institute of Advanced Technology, Shenzhen 518057, P.R.China
anirban.kundu@kuang-chi.org
4 Jadavpur University, Kolkata 700032, India
rdattagupta@cse.jdvu.ac.in

Abstract: In this paper, we design a crawler that searches for services provided by different clouds connected in a network. The proposed method reports the freshness and age of cloud clusters. The crawler checks each router in the network that provides services. Based on the search criteria, our design generates output that guides users to the requested cloud services efficiently. We plan to store the result in an m-way tree and to use a traversal technique to extract specific data from the crawling result. We compare the result with other typical search techniques.

Index Terms: cloud crawler, service crawling, cloud search, Freshness, Age

I. INTRODUCTION

In modern life, the usage of cloud is growing rapidly. A cloud user typically relies on specific services. Web search engines [1] crawl the web and update information world-wide. Nowadays, Internet users are switching from single services to cloud services, which requires higher availability of cloud services. Web crawlers [2] fetch web pages and cache them in their database. Every crawler stores the crawled result in its database, and the result is searched when it is needed. Search engines [3] are often compared with one another in terms of time complexity and space complexity. Freshness and Age of the crawled result are also important. A cloud crawler [4] works with Internet Protocol (IP) addresses of a cache stored in a tree structure. Hosts are visited using specific threads for specific networks.

Frequently, one needs to maintain local copies of remote data sources for better performance or availability. For example, a Web search engine copies a significant subset of the Web and maintains copies or indexes of the pages to help users access relevant information. In this situation, a part of the local copy may get out of date because changes at the sources are not immediately propagated to the local copy. Therefore, it becomes important to design a good refresh policy that maximizes the "freshness" of the local copy. As cloud services grow larger, it becomes more important to refresh the data effectively.

One critical challenge in the surfacing approach is how a crawler can automatically generate promising queries so that it can carry out efficient surfacing. The challenge has been studied by several researchers, such as [5], [6], [7], [8], [9]. In these methods, candidate query keywords are generated from the obtained records.

Section II shows our proposed framework and the corresponding approach. Experimental analyses are presented in Section III. Section IV concludes the paper.

II. FRAMEWORK

We consider several nodes that are connected to each other in a network. Clusters are formed from several nodes providing distinct services. The head node is also connected to the network. A cluster may contain private networks recursively. The crawler reaches the end points, takes information from them, and sends it to the head node. Node A stores the whole result. Boxes indicate networks. A network may have a sub-network.

In the second section, we use an M-Way tree traversal technique so that we can reach the destination with minimum path length. In the last section, we show how the technique is efficient in comparison with other searching algorithms. To assess the efficiency of the algorithm, we need to understand the Freshness and Age of a crawler. Every crawler has to update its database quickly and produce efficient results. The terms freshness and age refer to the database.

A. Freshness and Age

A cloud service database is called 'fresher' when its information is more up to date than that of other crawlers. For instance, if a crawler crawls more nodes than other crawlers, then it is fresher. If a crawler shows a result from 5 minutes ago, then 5 minutes is its age.

1. Freshness

Let S = {n_1, n_2, n_3, …, n_N} be the set of nodes in the network, where n_1, n_2, … are nodes and N is the number of elements. D_1, D_2, …, D_N are the services stored on the particular nodes. The total freshness of the crawler is

$$\mathrm{Freshness}(t_n) = \frac{1}{N} \sum_{i=1}^{N} F(n_i, t)$$

where

$$F(n_i, t) = \begin{cases} 1 & \text{if updated at time } t \\ 0 & \text{if not updated} \end{cases}$$

2. Age

Let {T_1, T_2, …, T_n} be the set of times at which the information about the specific node is taken into account. The current time is T. Then the age of the node is T - T_n. At time t, if the age of an element is A_i, then

$$A_i = \begin{cases} 0 & \text{if it is updated at } t \\ T_i - T_{i-1} & \text{if it is not updated at } t \end{cases}$$

The total age is

$$A(S, t) = \frac{1}{N} \sum_{i=1}^{N} A_i$$

A cloud crawler is used to fetch the services for creating a framework of a cloud service crawler engine using proper indexing methodologies. A crawler for a specific service is a program that extracts outward Web links (URLs) and adds them to a list after processing. Thus, a cloud service crawler is a program that fetches as many relevant services as possible for specific users. It uses the Web link structure, in which the order of the list is important because only high quality Web pages are considered relevant. Fig. 1 shows the proposed service based cloud crawler. Here, an element insertion means that the element is inserted at the pointer location within the m-way tree. A special traversal technique is used to visit all the nodes within each network or sub-network. Each node is selected twice; the second time, it is actually popped from the stack. An advantage of our algorithm is that data need not be stored in the client node. The result is sent directly to the crawler server after scanning a single node.

Fig. 1. Flowchart of Service based Cloud Crawler

B. Sample Procedure of a Sample Network

Fig. 2 shows an arbitrary cloud cluster. There are four network clusters in total within a cloud. Circular boxes indicate the clusters and rectangular boxes indicate the resources of each cluster network. Table I shows the result, which is based on our proposed approach as shown in our previous work [1].

Fig. 2. Arbitrary Cloud Cluster Scenario

At crawling run time, a hash table is built that maps the Node to the Number (IP address) of resources in a cloud network, as shown in Table II. Our proposed search approach is shown in subsection E. The sample network is crawled using the proposed method, as shown in Table I.

TABLE I. PROPOSED APPROACH BASED ON FIG. 2
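The freshness and age measures of Section II.A can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and sample values are hypothetical, and the age of a node is taken as the current time minus its last update time (the "T - T_n" reading in the text), which is an assumption about how the definition is meant to be applied.

```python
from typing import Sequence


def freshness(updated: Sequence[bool]) -> float:
    """Freshness = (1/N) * sum of F(n_i, t), with F = 1 if node i is up to date at t."""
    if not updated:
        raise ValueError("need at least one node")
    return sum(1 for flag in updated if flag) / len(updated)


def average_age(last_update_times: Sequence[float], now: float) -> float:
    """A = (1/N) * sum of A_i, assuming A_i = now - T_i (0 when updated at `now`)."""
    if not last_update_times:
        raise ValueError("need at least one node")
    return sum(now - t for t in last_update_times) / len(last_update_times)


# Four hypothetical nodes; times are minutes on an arbitrary clock.
flags = [True, False, True, True]
times = [100.0, 95.0, 100.0, 90.0]
print(freshness(flags))           # 0.75
print(average_age(times, 100.0))  # 3.75
```

With three of four nodes up to date the freshness is 0.75, and the two stale nodes (5 and 10 minutes old) give an average age of 3.75 minutes.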
C. Hash Table

The hash table is generated from the mapping between the Node and the Number (IP address) of resources in a cloud network. Table II is created using real-time crawling.

TABLE II. HASH TABLE BASED ON TABLE I

D. Indexing Result

The crawler finishes searching the cloud and then stores the result in an M-Way tree using Table II, as shown in Fig. 3.

Fig. 3. M-Way tree

E. Search Approach

The algorithm described in Fig. 4 is used to reach any node using the crawling result. Suppose Node 13 is to be visited at a particular time instance. Table III shows the steps taken to search for Node 13.

TABLE III. PROPOSED SEARCH APPROACH

Fig. 4. Flow chart to reach any node using Fig. 2

The shortest path to reach Node 13 is {1 9 11 13}.

III. EXPERIMENTAL ANALYSIS

The time complexities [10] [11] of DFS and BFS are O(|V|+|E|), where V is the set of vertices of the graph and E is the set of edges of the graph.

A. Best Case Scenario

1) Breadth First Search (BFS)
Total Number of Nodes Visited = MN, where M = average number of machines present in every network and N = level of the tree.

2) Depth First Search (DFS)
Total Number of Nodes Visited = N, where N = level of the tree.

3) Based on our Proposed Algorithm
Total Number of Nodes Visited = N, where N = level of the tree.

The best case analysis is shown in Fig. 5. Our algorithm has been compared with typical DFS and BFS methods. From the comparative study, we conclude that the number of visited nodes increases with the level of the m-way tree. With our proposed searching method, we can find the shortest path to reach every node.

Fig. 5. Best Case Complexity Comparison

B. Worst Case Scenario

1) Breadth First Search (BFS)
Total Number of Nodes Visited = M^(N+1)

2) Depth First Search (DFS)
Total Number of Nodes Visited = M^(N+1)

3) Based on our Proposed Algorithm
Total Number of Nodes Visited = N

In the worst case analysis, our proposed algorithm achieves the minimum time complexity to reach any destination node. Fig. 6 shows the worst case complexity comparison.

Fig. 6. Worst Case Complexity Comparison

Four clusters have been used for experimental purposes with tree traversal, as shown in Fig. 7, using a cloud crawler based on IP addresses available in the cache. Threads have been used to visit distinct hosts concurrently. There is no need to store data in the client node, as the result is sent directly to the crawler server after scanning each node. The cloud crawler works with IP addresses of a cache following an m-way tree structure.

Fig. 7. Crawling Results

CONCLUSIONS

In our methodology, a hash table is generated in which each resource is assigned a particular number. The hash table helps to identify each node. It is also useful for finding the shortest path to reach any node (resource) within the table. Freshness and age of a result can be calculated with the help of the hash table by comparing the past and present results of the particular nodes. Machines in different networks may have the same IP address; they can still be told apart using the hash table because it allocates a unique number to each machine. Fewer nodes are visited in the proposed method than in DFS or BFS.

REFERENCES

[1] Brin, S., Page, L., "The anatomy of a large-scale hyper textual Web search engine," Computer Network ISDN Syst. 30, 1998, pp. 107-117
[2] Lu, J., Wang, Y., Liang, J., Chen, J., Liu, J., "An Approach to Deep Web Crawling by Sampling," Web Intelligence 2008, pp. 718-724
[3] Yang, Kai-Hsiang, Pan, Chi-Chien, Lee, Tzao-Lin, "Approximate search engine optimization for directory service," Parallel and Distributed Processing Symposium, 2003, Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
[4] C. Banerjee, A. Kundu, S. Sadhukhan, S. Bose, R. Dattagupta, "Service Crawling in Cloud Computing," 2nd International Conference on Advances in Information Technology and Mobile Communication, CCIS 296, pp. 243-246, Springer-Verlag Berlin Heidelberg Publication
[5] Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., Halevy, A.: Google's Deep-Web Crawl. In Proceedings of VLDB2008, Auckland, New Zealand, pp. 1241-1252 (2008)
[6] Ntoulas, A., Zerfos, P., Cho, J.: Downloading Textual Hidden Web Content through Keyword Queries. In Proceedings of JCDL2005, Denver, USA, pp. 100-109 (2005)
[7] Barbosa, L., Freire, J.: Siphoning Hidden-Web Data through Keyword-Based Interfaces. In Proceedings of SBBD2004, Brasilia, Brazil, pp. 309-321 (2004)
[8] Liu, J., Wu, ZH., Jiang, L., Zheng, QH., Liu, X.: Crawling Deep Web Content Through Query Forms. In Proceedings of WEBIST2009, Lisbon, Portugal, pp. 634-642 (2009)
[9] Lu, J., Wang, Y., Liang, J., Chen, J., Liu, J.: An Approach to Deep Web Crawling by Sampling. In Proceedings of IEEE/WIC/ACM Web Intelligence, Sydney, Australia, pp. 718-724 (2008)
[10] M. Ajtai, On the complexity of the pigeonhole principle, Proc. of the 29th FOCS, pp. 346-355, 1988
[11] Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. The MIT Press, 3rd edition, 2009
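As an illustration of the search approach in Section II.E and the node-count comparison in Section III, the following Python sketch shows one way a path such as {1 9 11 13} could be read off a hash table of parent links, with a plain BFS for comparison. It is a sketch under stated assumptions, not the procedure of Fig. 4: the tree below is hypothetical (Fig. 3 and Tables I to III are not reproduced in the text), only the path 1, 9, 11, 13 comes from the paper, and all function names are invented for this example.

```python
from collections import deque

# Hypothetical m-way tree: the child lists are invented for illustration.
# Only the path 1 -> 9 -> 11 -> 13 is taken from the paper.
CHILDREN = {
    1: [2, 5, 9],
    2: [3, 4],
    5: [6, 7, 8],
    9: [10, 11],
    11: [12, 13],
}


def build_parent_table(children):
    """Build a hash table mapping each node number to its parent."""
    return {child: parent for parent, kids in children.items() for child in kids}


def path_from_root(parents, target):
    """Follow parent links from the target up to the root, then reverse."""
    path = [target]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    path.reverse()
    return path


def bfs_visit_count(children, root, target):
    """Count nodes dequeued by a plain BFS up to and including the target."""
    queue = deque([root])
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        if node == target:
            return visited
        queue.extend(children.get(node, []))
    raise KeyError(f"node {target} not reachable from {root}")


PARENTS = build_parent_table(CHILDREN)
print(path_from_root(PARENTS, 13))       # [1, 9, 11, 13]
print(bfs_visit_count(CHILDREN, 1, 13))  # 13
```

On this hypothetical tree the parent-link lookup touches 4 nodes, one per level on the path, while BFS dequeues all 13 nodes before reaching Node 13. This matches the qualitative claim that the number of visited nodes for the proposed search grows with the tree level rather than with the total number of machines.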