International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976-6480 (Print), ISSN 0976-6499 (Online), Volume 5, Issue 4, April (2014), pp. 198-204 © IAEME

NETWORK TRAFFIC OPTIMIZATION FOR PERFORMANCE IMPROVEMENT IN WEB SERVICE INFRASTRUCTURES BY CATEGORIZATION OF WEB CONTENTS WITH A SIZE REDUCTION APPROACH

Dr. Suryakant B Patil (1), Ms. Sonal S Deshmukh (2), Ms. Anuja D Bharate (3), Dr. Preeti Patil (4)
(1) Professor, JSPM's Imperial College of Engineering & Research, Wagholi, Pune
(2, 3) PG Research Scholar, JSPM's ICOER, Wagholi, Pune
(4) Dean (SA), HOD & Professor, KIT's COE, Kolhapur

ABSTRACT

Network traffic has increased tremendously in recent years. This raises bandwidth requirements and lengthens the time needed to access data from servers, i.e., it increases response time. In this paper we propose a mechanism for reducing latency time that uses the MD5 algorithm: hash keys computed over content, together with a size-reduction formula, improve the efficiency and performance of the system. The core problem in proxy server cache memory is content aliasing, i.e., the same content occurring in the cache multiple times. With a proxy server, users can fetch data directly from the cache rather than going to the web server, so the workload of the web server is reduced. Our main goals are to measure response time at different downlink rates according to the institute schedule we surveyed, to minimize bandwidth utilization, and to improve the performance of the proxy server, measured by the removal of content aliasing. For this experimentation we surveyed JSPM's Wagholi campus, which comprises five institutes.

Categories and Subject Descriptors: C.2.3 [Network Operation]: Network Management; C.4 [Performance of Systems]: Design Studies

GENERAL TERMS: Performance, Reliability, Experimentation, Algorithms.

Keywords: Content aliasing, MD5, Latency Time, Proxy Server.
© IAEME: www.iaeme.com/ijaret.asp. Journal Impact Factor (2014): 7.8273 (calculated by GISI, www.jifactor.com).
I. INTRODUCTION

In the field of web server management, researchers have long focused on aliasing in proxy server caches [6]. Web caching stores frequently referenced objects on a caching server instead of the origin server, so that web servers make better use of network bandwidth, the workload on servers is reduced, and the response time for users improves [12, 14]. Aliasing means giving multiple names to the same thing. Aliasing in proxy server caches occurs when the same content is stored in the cache multiple times [5]. A proxy server acts as a mediator between the origin server and the clients. On the World Wide Web, aliasing commonly occurs when a client makes two requests that have the same payload. Currently, browsers perform cache lookups using Uniform Resource Locators (URLs) as identifiers. Aliasing therefore causes repetitive data transfers even when the requested content has already been cached under a different URL [8, 11]. Websites that carry the same content are called mirrors. Mirrors are a redundancy mechanism built into the web to serve pages faster, but they cost cache space [9, 13]. As the amount of web traffic increases, the efficient utilization of network bandwidth becomes increasingly important. To optimize the use of network bandwidth, reduce network latency, and improve response time for users, one needs to analyze web traffic and understand its characteristics [1]. Usage of the World Wide Web has grown tremendously; in any organization or institute, the internet is used by most students, faculty, and administrative staff for different purposes [4, 7].
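The idea of keying stored payloads by a content hash rather than by URL, so that two URLs with identical bodies share one cached copy, can be sketched as follows. This is a minimal illustration using MD5 (as in the paper), not the authors' actual proxy implementation; the class and method names are ours.

```python
import hashlib

class DedupCache:
    """Toy proxy cache that stores each distinct payload only once.

    URLs map to MD5 digests of their payloads; digests map to the stored
    bytes. Two URLs with identical payloads (aliases) share one copy.
    """

    def __init__(self):
        self.url_to_digest = {}   # URL -> hex digest
        self.store = {}           # hex digest -> payload bytes

    def put(self, url, payload):
        digest = hashlib.md5(payload).hexdigest()
        self.url_to_digest[url] = digest
        # setdefault: an aliased payload is not stored a second time
        self.store.setdefault(digest, payload)

    def get(self, url):
        digest = self.url_to_digest.get(url)
        return self.store.get(digest) if digest else None

cache = DedupCache()
body = b"<html>mirrored page</html>"
cache.put("http://mirror-a.example/page", body)
cache.put("http://mirror-b.example/page", body)  # alias: same payload, new URL
print(len(cache.store))  # 1: one stored copy despite two URLs
```

Lookup by URL still works for both aliases, but the cache holds only one copy of the body, which is exactly the space saving that removing content aliasing targets.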
The web pages accessed by users are stored in what we call "proxy cache memory", which is available at the client side. Several users may request the same URLs, so pages can be stored multiple times in the cache. This duplication consumes extra space and degrades performance, so eviction of aliased content is necessary, and we use various methods for it. After content aliasing is removed, the result is better performance and less space consumed. The cache may still hold a lot of other content that occupies space, so it must be filtered using replacement policies such as FIFO, LRU, or LFU.

FIFO: the first-in-first-out policy, managed by a queue. When the cache is full and space must be freed, this scheme removes the pages that were inserted first.
LRU: the Least Recently Used policy, which evicts the page that has gone unused for the longest time, on the assumption that it is least likely to be needed in the near future.
LFU: the Least Frequently Used policy, which evicts the pages that are accessed least often.

From our survey, we observed that static data is requested more than dynamic data. Keeping static content in the cache is therefore beneficial, since dynamic content is updated frequently. For repeated user requests, the proxy does not have to contact the web server, so the web server's workload is reduced and the latency of the cache improves.

II. LITERATURE SURVEY

Shivakumar and Garcia-Molina investigated mirroring in a large crawler data set and reported that in the WebTV client trace far more aliasing happens than expected; in fact, 36% of reply bodies are accessible through more than one URL [7]. Similarly, the authors of [3] surveyed techniques for identifying mirrors on the Internet and investigated mirroring in a large crawler data set.
They reported that roughly 10% of popular hosts are mirrored to some extent [3]. They also considered approximate mirroring, or "syntactic similarity": although they introduce sophisticated measures of document similarity, they report that most "clusters" of similar documents in a large crawler data set contain only identical documents [3].

Duplication has both positive and negative aspects. On one hand, the redundancy makes retrieval easier: if a search engine has missed one copy, it may have the other; or if one page has become unavailable, a replica may still be retrievable. On the other hand, from a search engine's point of view, storing duplicate content is a waste of resources, and from the user's point of view, getting duplicate answers in response to a query is a nuisance. The principal reason for duplication on the Web is the systematic replication of content across distinct hosts, a phenomenon known as "mirroring". It is estimated that at least 10% of the hosts on the WWW are mirrored. Each document on the WWW has a unique name called the Uniform Resource Locator (URL). A URL consists of three disjoint parts: the access method, a hostname, and a path.

III. EXPERIMENTATION AND RESULTS

Besides its obvious goals, a Web caching system should have a number of properties: fast access, robustness, transparency, scalability, efficiency, adaptivity, stability, load balancing, the ability to deal with heterogeneity, and simplicity. These are discussed below.

• Fast access: From the user's point of view, access latency is an important measure of the quality of Web service. A desirable caching system should aim at reducing Web access latency.
In particular, it should provide users a lower latency on average than a system without caching.

• Robustness: From the user's perspective, robustness means availability, which is another important measure of the quality of Web service. Users want the Web service to be available whenever they need it. Robustness has three aspects. First, a few proxy crashes should not tear the entire system down; the caching system should eliminate single points of failure as much as possible. Second, the caching system should fall back gracefully in case of failures. Third, the caching system should be designed so that it is easy to recover from a failure.

• Transparency: A Web caching system should be transparent to the user; the only effects the user should notice are faster responses and higher availability.

• Scalability: We have seen explosive growth in network size and density over the last decades, and even more rapid growth is expected in the near future. The key to success in such an environment is scalability: a caching scheme should scale well with the increasing size and density of the network. This requires all protocols employed in the caching system to be as lightweight as possible.

• Efficiency: There are two aspects to efficiency. First, how much overhead does the Web caching system impose on the network? A caching system should impose a minimal additional burden on the network, including both control packets and extra data packets incurred by its use. Second, the caching system should not adopt any scheme that leads to under-utilization of critical resources in the network.
• Load balancing: It is desirable that the caching scheme distribute the load evenly across the entire network. A single proxy or server should not become a bottleneck (or hotspot), thereby degrading the performance of a portion of the network or even slowing down the entire service system.

• Ability to deal with heterogeneity: As networks grow in scale and coverage, they span a range of hardware and software architectures. The Web caching scheme needs to adapt to this range of network architectures.

Fig. 1: Content classification for different content categories

• Simplicity: Simplicity is always an asset. Simpler schemes are easier to implement and more likely to be accepted as international standards. An ideal Web caching mechanism should be simple to deploy.

• Adaptivity: The caching system should adapt to dynamic changes in user demand and in the network environment. Adaptivity involves several aspects: cache management, cache routing, proxy placement, etc. This is essential for achieving optimal performance.

• Stability: The schemes used in a Web caching system should not introduce instabilities into the network.

The campus includes five institutes in total, all of which we observed. Figure 1 shows the classification of all the static data for the whole campus, and thereby shows directly which categories users are most interested in. As we surveyed at our institute campus, most students request the same pages, so duplication may occur; the cache then requires more space, which leads to poor performance. The blue (upper) line in the figure shows the content with duplication (with CA), and the red (lower) line shows the content without duplication (without CA).
[Chart for Fig. 1: JSPM's Wagholi Campus classification; x-axis: content categories (GIF, PNG, JPG, HTML); y-axis: size (KB, 0-6000); series: With CA, Without CA]
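The with-CA versus without-CA comparison plotted above can be summarized as a percentage size reduction. The paper does not state this formula explicitly; the function and the example sizes below are our illustration of the comparison.

```python
def size_reduction_percent(size_with_ca, size_without_ca):
    """Percentage size reduction achieved by removing content aliasing.

    size_with_ca:    cached size in KB including aliased (duplicate) copies.
    size_without_ca: cached size in KB after duplicates are removed.
    """
    return 100.0 * (size_with_ca - size_without_ca) / size_with_ca

# Hypothetical category sizes in KB (illustrative, not the surveyed values):
print(round(size_reduction_percent(5000, 4000), 1))  # 20.0
```

A reduction of this kind translates directly into cache space saved and, for cache misses avoided, into bandwidth saved on the external link.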
Fig. 2: Total data classification for the campus with and without content aliasing

[Chart for Fig. 2: JSPM's Wagholi Campus classification, static vs. dynamic; x-axis: category (Static, Dynamic, Total); y-axis: size (KB, 0-25000); series: With CA, Without CA]

Figure 2 shows the classification of static and dynamic data for the whole campus. The grouped bars show the duplicated and de-duplicated data; the figure makes clear that the data size is reduced once duplicates are removed.

Fig. 3: Size reduction in different institutes at JSPM's Wagholi Campus

[Chart for Fig. 3: Size reductions in all institutes of JSPM's Wagholi; x-axis: institutes (ICOER, BSIOTR, CHARAK, ENIAC, KAUTILYA, Total); y-axis: size (KB, 0-9000); series: Before CA, After CA]

Figure 3 shows the data for the different institutes at JSPM's Wagholi Campus. HTTP defines several headers that were specifically designed to support caching. Although the HTTP specification prescribes certain behaviors for web caches, it does not specify how to keep cached objects up to date. From our survey at JSPM's Wagholi Campus, we obtained measures of the various content categories and their sizes.
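One common way a cache decides whether a stored object is still usable is a freshness check based on the Cache-Control and Age response headers. The header names below are standard HTTP/1.1 caching headers, but the function itself is a simplified sketch of our own (it ignores Expires, heuristics, and revalidation), not the paper's implementation.

```python
def is_fresh(headers, resident_time):
    """Return True if a cached response is still fresh.

    headers:       dict of response headers as received from upstream.
    resident_time: seconds the object has spent in this cache.
    Simplified rule: fresh while current_age < max-age, where current_age
    is the upstream Age header (if any) plus time spent in this cache.
    """
    cache_control = headers.get("Cache-Control", "")
    max_age = None
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            max_age = int(directive.split("=", 1)[1])
    if max_age is None:
        return False  # no explicit lifetime: treat as stale (conservative)
    current_age = int(headers.get("Age", 0)) + resident_time
    return current_age < max_age

# A response with a 60 s lifetime, received with Age 10 and cached 30 s ago:
print(is_fresh({"Cache-Control": "max-age=60", "Age": "10"}, 30))  # True
print(is_fresh({"Cache-Control": "max-age=60", "Age": "10"}, 55))  # False
```

When the check fails, a real cache would revalidate with the origin server (e.g., a conditional GET) rather than discard the object outright.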
The HTTP GET message is used to retrieve a web object given its URL. However, GET alone does not guarantee that it will return a fresh object. HTTP headers that may affect caching fall into two categories: the first includes headers appended to the request for a web object, for cache control; the second includes headers appended when a web object is returned.

IV. CONCLUSION

In an environment where saving bandwidth on the shared external network is of utmost importance, the proxy cache should use a replacement policy that achieves high byte hit rates. A proxy cache could also utilize multiple replacement policies. This work can be further optimized by a daemon process, designed to run periodically and check the consistency of the cached data against the data at the web server. It can be scheduled during slack times with less traffic, so it adds no additional toll on the bandwidth. When caching is implemented, frequently accessed content is stored close to the users, eliminating duplicated effort: a request from a user's browser is first sent to the network's caching server, and if the requested content is found in the web cache and the information is fresh, the content is sent directly back to the requester, skipping an upstream journey to the target website. We have demonstrated this with our experiments at JSPM's Wagholi Campus, with a measurable reduction in size and, consequently, in bandwidth.

REFERENCES

[1] Kartik Bommepally, Glisa T. K., Jeena J. Prakash, Sanasam Ranbir Singh and Hema A Murthy, "Internet Activity Analysis through Proxy Log", IEEE, 2010.
[2] Jun Wu and K. Ravindran, "Optimization Algorithms for Proxy Server Placement in Content Distribution Networks", Integrated Network Management-Workshops, 2009.
[3] S. Ngamsuriyaroj, P. Rattidham, I. Rassameeroj, P. Wongbuchasin, N. Aramkul and S. Rungmano, "Performance Evaluation of Load Balanced Web Proxies", IEEE, 2011.
[4] W. Chen, P. Martin and H. S. Hassanein, "Caching Dynamic Content on the Web", Canadian Conference on Electrical and Computer Engineering, 2003, vol. 2, pp. 947-950, 4-7 May 2003.
[5] Sadhna Ahuja, Tao Wu and Sudhir Dixit, "On the Effects of Content Compression on Web Cache Performance", Proceedings of the International Conference on Information Technology: Computers and Communications, 2003.
[6] A. Mahanti, C. Williamson and D. Eager, "Traffic Analysis of a Web Proxy Caching Hierarchy", IEEE Network Magazine, May 2000.
[7] N. Shivakumar and H. Garcia-Molina, "Finding Near Replicas of Documents on the Web", Proc. Workshop on Web Databases, Mar. 1998.
[8] Jeffrey C. Mogul, "A Trace-Based Analysis of Duplicate Suppression in HTTP", Compaq Computer Corporation Western Research Laboratory, Nov. 1999.
[9] S. B. Patil, Sachin Chavan and Preeti Patil, "High Quality Design and Methodology Aspects to Enhance Large Scale Web Services", International Journal of Advances in Engineering & Technology (IJAET), ISSN: 2231-1963, March 2012, Volume 3, Issue 1, pp. 175-185.
[10] S. B. Patil, Sachin Chavan, Preeti Patil and Sunita R. Patil, "High Quality Design to Enhance and Improve Performance of Large Scale Web Applications", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 198-205, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[11] S. B. Patil and D. B. Kulkarni, "Improving Web Performance through Hierarchical Caching & Content Aliasing", The 7th International Conference on Information Integration and Web-based Applications & Services, 19-21 September 2005, Kuala Lumpur, Malaysia.
[12] Srikantha Rao, Preeti Patil, S. B. Patil and Sunita Patil, "Customized Approach for Efficient Data Storing and Retrieving from University Database Using Repetitive Frequency Indexing", IEEE International Conference RAIT 2012, ISM Dhanbad, Jharkhand, March 15-17, 2012 (available on IEEE Xplore), Print ISBN: 978-1-4577-0694-3, DOI: 10.1109/RAIT.2012.6194612, pp. 511-514.
[13] Srikantha Rao, Preeti Patil and S. B. Patil, "Enhanced Software Development Strategy Implying High Quality Design for Large Scale Database Projects", International Conference and Workshop on Emerging Trends in Technology (ICWET 2012), ISBN: 978-0-615-58717-2, TCET Mumbai, February 22-25, 2012, pp. 508-513.
[14] Srikantha Rao, Preeti Patil and S. B. Patil, "Object-Oriented Software Engineering Paradigm: A Seamless Interface in Software Development Life Cycle", ACM Asia-Pacific International Conference on Advances in Computing (ICAC-2008), Anuradha Engineering College, Chikhali, Feb 2008.
[15] S. Saira Thabasum, "Need for Design Patterns and Frameworks for Quality Software Development", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 54-58, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[16] K. Prasadh and R. Senthilkumar, "Nonhomogeneous Network Traffic Control System Using Queueing Theory", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 394-405, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[17] Sachin Chavan and Nitin Chavan, "Improving Access Latency of Web Browser by Using Content Aliasing in Proxy Cache Server", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 356-365, ISSN Print: 0976-6367, ISSN Online: 0976-6375.