Improving access latency of web browser by using content aliasing in


IMPROVING ACCESS LATENCY OF WEB BROWSER BY USING CONTENT ALIASING IN PROXY CACHE SERVER

Sachin Chavan (1), Nitin Chavan (2)
(1) Department of Computer Engineering, MPSTME, NMIMS, Shirpur
(2) Department of Information Technology, MPSTME, NMIMS, Shirpur

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online), Volume 4, Issue 2, March – April (2013), pp. 356-365. © IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

ABSTRACT

The web community is growing so quickly that the number of clients accessing web servers is increasing rapidly. This rapid increase has affected several characteristics of the web: available network bandwidth has fallen, latency has increased, and response times are higher for users who require large-scale web services. This paper considers different types of proxy actions and proposes a novel design and methodology to address these issues. It studies how these actions influence browser display time, and also discusses acceptable loading times and the scope of cacheable objects. The methodology works by analysing the content in the proxy cache, identifying content aliasing, suppressing duplicates, and creating the respective soft links. The present solution makes intelligent use of the proxy cache server to overcome these problems. In this study, proxies were set up to enable network administrators to control Internet access from within the intranet. When a proxy cache is used, however, the problem of aliasing arises. Aliasing in proxy server caches occurs when the same content is stored in the cache several times. The present methodology improves access latency and browser response time, and at the same time avoids storing the same content in the cache multiple times, which wastes storage space.

KEYWORDS: Access Latency, Cache, Web Proxy, Mirroring, Duplicate Suppression, Content Aliasing.
1. INTRODUCTION

In the field of web server management, researchers have focused on aliasing in proxy server caches for a long time. Web caching consists of storing frequently referenced objects on a caching server instead of fetching them from the original server each time, so that better use is made of network bandwidth, the workload on web servers is reduced, and the response time for users improves. Aliasing means giving multiple names to the same thing.

The proxy cache also stores all of the images and sub-files of the visited pages. If the user jumps to a new page within the same site that uses, for example, the same images, the proxy cache already has them stored and can load them into the user's browser more quickly than by retrieving them from the remote web site's servers. Aliasing in proxy server caches occurs when the same content is stored in the cache multiple times. On the World Wide Web, aliasing commonly occurs when a client makes two requests for different URLs and both responses carry the same payload. Currently, browsers perform cache lookups using Uniform Resource Locators (URLs) as identifiers.

Websites that contain the same content are called mirrors. Mirrors are redundancy mechanisms built into the web space to serve web pages faster, but they carry a cost in terms of cache space. As the amount of web traffic increases, the efficient utilization of network bandwidth becomes increasingly important. A caching technique needs to analyse web traffic to understand its characteristics; this understanding helps to optimize the use of network bandwidth, reduce network latency, and improve response time for users [8].

A proxy cache is a shared network device that can undertake web transactions on behalf of a client and, like the browser, stores the content. Subsequent requests for this content, by this or any other client of the cache, trigger the cache to deliver the locally stored copy, avoiding a repeat of the download from the original content source [4].

Figure 1. Concept of Caching (Proxy Cache): bandwidth saving and traffic reduction through a proxy cache server
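To make the idea of aliasing concrete, the following is a minimal sketch of a cache that keeps one copy of each distinct payload and treats URLs as aliases (soft links) to it. It is an illustration only and not the implementation used in this paper: the class name `AliasingCache`, its methods, and the example URLs are all hypothetical, and MD5 is used as the content identifier simply because it is the hash discussed in this paper.

```python
import hashlib


class AliasingCache:
    """Toy proxy cache that stores each distinct payload once.

    URLs act as aliases (soft links) that point at a content digest.
    All names here are hypothetical and chosen for illustration.
    """

    def __init__(self):
        self.url_to_digest = {}   # alias table: URL -> content digest
        self.digest_to_body = {}  # content store: digest -> payload bytes

    def put(self, url, body):
        digest = hashlib.md5(body).hexdigest()
        # Store the payload only if this content has not been seen before.
        self.digest_to_body.setdefault(digest, body)
        self.url_to_digest[url] = digest

    def get(self, url):
        digest = self.url_to_digest.get(url)
        if digest is None:
            return None  # cache miss: a real proxy would fetch from the origin
        return self.digest_to_body[digest]


cache = AliasingCache()
page = b"<html>same payload</html>"
cache.put("http://example.org/index.html", page)
cache.put("http://mirror.example.org/index.html", page)  # aliased content

print(len(cache.url_to_digest))   # number of URLs known to the cache
print(len(cache.digest_to_body))  # number of payloads actually stored
```

With a purely URL-keyed cache, the two requests above would store the payload twice; here the second URL only adds an entry to the alias table.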
1.1 Advantages of Caching

1. Web caching reduces the workload of the remote web server.
2. A client can obtain a cached copy from the proxy even if the remote server is not available.
3. It provides a chance to analyse an organization's usage patterns.

1.2 Disadvantages of Caching

1. A client might be looking at stale data if the proxy is not updated properly.
2. The access latency may increase in the case of a cache miss because of the extra proxy processing.
3. A single proxy cache is always a bottleneck.
4. A single proxy is a single point of failure.

2. RELATED WORK

2.1 The Access Latency

Latency is defined as the delay between a request for a web page and receiving that page in its entirety. The latency problem occurs when users judge the download to be too long. Unacceptable latency does not only adversely affect user satisfaction: web pages that load faster are also judged to be significantly more interesting than their slower counterparts [12].

Studies on human cognition have revealed that a response time shorter than 0.1 second is unnoticeable and that a delay of 1 second matches the pace of interactive dialog. The following table shows the transfer rates of different connection types.

Table 1. Transfer Rates for Different Connection Types

Connection Type                        Slow      Normal   Maximum
Modem 33k6                             <2.734    ≈3       ≈3.65
Modem 56k                              <4.199    ≈5       ≈6.08
ISDN 64k                               <5.469    ≈6       ≈6.94
Cable                                  <9.766    ≈17      by provider
ADSL                                   <12.21    ≈24      ≈732
Ethernet 10Base-T (10 Megabits/sec)    <73       ≈195     ≈977

Table 1 shows the parameters that affect the access time of the browser: the type of connection used and the condition of that connection. The time of day at which the Internet is used also affects access latency because of bandwidth sharing.

2.2 Web Traffic

Web traffic is the amount of data sent and received by visitors to a website. It is analysed to gauge the popularity of web sites and of individual pages or sections within a site. Web traffic can be analysed by viewing the traffic statistics found in the web server log file, an automatically generated list of all the pages served.

Traffic analysis is conducted using access logs from the web proxy server. Each entry in the access logs records the URL of the document requested, the date and time of the request, the name of the client host making the request, the number of bytes returned to the requesting client, and information that describes how the client's request was treated by the proxy [1]. Processing these log entries can produce useful summary statistics about workload volume, document types and sizes, document popularity, and proxy cache performance [5].
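As an illustration of how such summary statistics can be derived, the sketch below processes a few log lines. The log layout (timestamp, client host, cache result, bytes, URL separated by spaces), the sample entries, and the function name `summarize` are assumptions made for this example; real proxy logs use their own formats, and the entries are not data from this study.

```python
from collections import Counter

# Hypothetical log lines: timestamp, client host, cache result, bytes, URL.
SAMPLE_LOG = """\
2013-03-01T10:00:01 host-a MISS 5120 http://example.org/index.html
2013-03-01T10:00:05 host-b HIT 5120 http://example.org/index.html
2013-03-01T10:00:09 host-a MISS 20480 http://example.org/logo.png
2013-03-01T10:00:12 host-c HIT 20480 http://example.org/logo.png
2013-03-01T10:00:20 host-b HIT 5120 http://example.org/index.html
"""


def summarize(log_text):
    """Return request count, bytes served, hit ratio, and URL popularity."""
    requests = 0
    total_bytes = 0
    hits = 0
    popularity = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, _host, result, size, url = line.split()
        requests += 1
        total_bytes += int(size)
        hits += result == "HIT"  # True counts as 1
        popularity[url] += 1
    hit_ratio = hits / requests if requests else 0.0
    return {
        "requests": requests,
        "bytes": total_bytes,
        "hit_ratio": hit_ratio,
        "popularity": popularity,
    }


stats = summarize(SAMPLE_LOG)
print(stats["requests"], stats["bytes"], stats["hit_ratio"])
print(stats["popularity"].most_common(1))
```

The hit ratio and the popularity counter correspond to the "proxy cache performance" and "popularity of document" statistics mentioned above.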
2.3 Static Caching

Static caching is an approach to web caching that uses yesterday's log to predict today's user requests. The static caching algorithm defines a fixed set of URLs by analysing the logs of previous periods. It calculates a value for each unique URL, arranges the URLs in descending order of value, and selects those with the highest values. This set of URLs is known as the working set. When a user requests a document that is present in the working set, the request is fulfilled from the cache. Otherwise, the request is fulfilled from the origin server [6].

2.4 Dynamic Caching

Dynamic caching is more complex than static caching and requires detailed knowledge of the application. Candidates for dynamic caching must be considered carefully since, by its very nature, dynamically generated content can differ depending on the state of the application. It is therefore important to consider under what conditions dynamically generated content can be cached while still returning the correct response. This requires knowledge of the application, its possible states, and other data, such as the parameters that ensure the dynamic data is generated in a deterministic manner [3].

2.5 MD5 Algorithm

MD5, designed by Ron Rivest and published in 1992, is a cryptographic hash algorithm that succeeded the MD4 algorithm. MD5 takes an input of any length and generates a digest of fixed length (128 bits, or 32 hexadecimal characters). Because MD5 is deterministic, a particular data string always generates the same MD5 hash.

The authors of [11] list several advantages of the MD5 cryptographic hash over its predecessors (such as MD4) and its competitors (such as SHA and SHA-1). MD5 is a one-way cryptographic hash. It accepts inputs of any length but still generates a fixed-length output. MD5 is fast, and it is highly unlikely that two different input strings accidentally hash to the same digest. Furthermore, MD5 is reliable in the sense that the same input string always yields the same output digest [11].

3. EXPERIMENTAL SETUP

3.1 Changing of proxy server

In most organizations and institutions the main server does not support a proxy cache, so it is difficult to use the main server as the cache server. The browser's proxy setting therefore has to be changed from the main server to another server [2].

The following steps switch a machine to the other proxy:

1. Open the browser, for example Internet Explorer.
2. In Internet Explorer, pull down the Tools menu and click Internet Options...
3. Click the Connections tab.
4. Click the LAN Settings... button.
5. In the Address: box, change "proxy1 Address" to "proxy2 Address" or vice versa and click OK.
6. Click OK on the Internet Options dialogue box to get back to the browser screen. External sites are now reachable through the selected proxy.
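As a quick check of the MD5 properties listed in Section 2.5, the following sketch uses Python's standard `hashlib` module. The input strings are arbitrary examples chosen for illustration, not data from the experiments.

```python
import hashlib

data = b"proxy cache"

digest = hashlib.md5(data).hexdigest()

# The digest is 128 bits, shown as 32 hexadecimal characters.
print(len(digest))

# Hashing the same input again gives the same digest.
print(digest == hashlib.md5(data).hexdigest())

# Changing a single character of the input changes the digest.
print(digest == hashlib.md5(b"proxy cachE").hexdigest())

# The input length does not change the digest length.
print(len(hashlib.md5(data * 10_000).hexdigest()))
```

A fixed-length digest is what makes it practical to compare cached objects by content rather than by URL: two payloads of any size can be compared through their 32-character digests.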
3.2 Duplication of Data

Duplication of data means storing multiple copies of the same data object. When an object or web page is cached, it is stored in cache memory. When different users request the same page, multiple copies of that object or web page may be stored in cache memory, which wastes storage space. Because maintaining a cache is an expensive task, such wastage is not affordable. To avoid duplicating data objects or web pages, a duplicate suppression mechanism is to be used [7]. If duplicate copies of data are saved in the proxy cache, they occupy additional storage; the analysis given in [4] shows the effect of duplication on cache space.

3.3 Duplicate Suppression

Storage space requirements can be reduced by avoiding duplicate copies of the same data. Content Engine provides the option to suppress storage of duplicate content elements. Duplicate suppression applies to any kind of content. Incoming content is not added to the storage area if identical content already exists there; only unique content is added [14].

Because the web is so large, most pages will not be referenced multiple times by any one cache. Page references follow a distribution similar to Zipf's law: the probability that the K-th most popular page is referenced is proportional to 1/K [9].

3.5 Experimental Results

The experiments were carried out in the laboratory of our institute. Some popular websites were considered for the experiment and were used to analyse the access latency of the browser under different conditions. Keyword-based searches were also used to measure latency for two types of content: text search and image search.

Table 2. Response time of search engine for Text and Image Search

              Text Search                  Image Search
Keyword       Web Server   Cache Server    Web Server   Cache Server
SVKM          250          140             230          200
NMIMS         140          130             300          100
RCPIT         250          120             350          150
CANNON        240          130             250          100
SAMSUNG       210          140             640          200
NOKIA         250          190             240          120
MATLAB        240          160             280          160
OPERA         250          150             120          120
SIEMENS       230          160             310          100
MICROMAX      160          140             190          110
MPSC          170          140             180          100
UPSC          210          150             150          140
IRCTC         160          140             330          90
RRB           260          120             310          70
Table 2 shows the reduction in browser response time when a page is fetched from the cache server instead of the web server. The first column lists the keywords used for the analysis; the same keywords are used for both text search and image search. The table gives the browser response time for text search as well as image search. From the table we can say that there is a considerable reduction in access latency when the page is fetched from the proxy cache.

Figure 2 compares, for text search on some keywords, the response time when the page is fetched from the main server with the response time when it is fetched from the proxy cache. The first bar shows the response time when the page is fetched from the web server, and the second bar shows the response time when it is fetched from the local proxy cache server on which the content aliasing algorithm is implemented. From Figure 2 we can say that there is a considerable reduction in response time.

Figure 2. Response time of search engine for Text Search

From Figure 2 it is clear that in text search we get a reduction in response time of 40 percent or more. For some keywords, such as Samsung, IRCTC, RRB and Siemens, the response time is reduced by more than 70 percent, whereas for Opera, SVKM and UPSC the reduction is negligible or at most 10 percent. This is due to the dynamic content returned by the search.

Figure 3 compares, for image search on the given keywords, the response time when the page is fetched from the main server with the response time when it is fetched from the proxy cache. The first bar shows the response time when the page is fetched from the web server, and the second bar shows the response time when it is fetched from the local proxy cache server on which the content aliasing algorithm is implemented. From Figure 3 we can say that there is a difference between the response times.
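The percentage reductions can be recomputed directly from Table 2. The sketch below does this for every keyword; the reduction formula (server time minus cache time, divided by server time) and the function name `reduction_percent` are assumptions of this example, since the paper does not state how its percentages were calculated.

```python
# Response times copied from Table 2.
# keyword -> (text: web server, text: cache, image: web server, image: cache)
TABLE_2 = {
    "SVKM": (250, 140, 230, 200),
    "NMIMS": (140, 130, 300, 100),
    "RCPIT": (250, 120, 350, 150),
    "CANNON": (240, 130, 250, 100),
    "SAMSUNG": (210, 140, 640, 200),
    "NOKIA": (250, 190, 240, 120),
    "MATLAB": (240, 160, 280, 160),
    "OPERA": (250, 150, 120, 120),
    "SIEMENS": (230, 160, 310, 100),
    "MICROMAX": (160, 140, 190, 110),
    "MPSC": (170, 140, 180, 100),
    "UPSC": (210, 150, 150, 140),
    "IRCTC": (160, 140, 330, 90),
    "RRB": (260, 120, 310, 70),
}


def reduction_percent(from_server, from_cache):
    """Relative reduction in response time, as a percentage (assumed formula)."""
    return 100.0 * (from_server - from_cache) / from_server


for keyword, (text_srv, text_cache, img_srv, img_cache) in TABLE_2.items():
    text_cut = reduction_percent(text_srv, text_cache)
    image_cut = reduction_percent(img_srv, img_cache)
    print(f"{keyword:10s} text {text_cut:5.1f}%  image {image_cut:5.1f}%")
```

On the table as transcribed here, the reductions close to 70 percent for SAMSUNG, SIEMENS, IRCTC and RRB, and the near-zero values for OPERA and UPSC, come out of the image-search columns, so it is worth checking the column labels against Figures 2 and 3 when reading the percentages quoted for each figure.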
Figure 3. Response time of search engine for Image Search

From Figure 3 it is clear that in image search we get very little reduction in response time, because images are more dynamic than text.

Table 3. Connection Time and Response Time of browser for some Websites

                            From Web Server           From Cache Server
Website                     Connection   Response     Connection   Response
www.nmims.edu               7000         44000        3000         14000
www.rcpit.ac.in             6120         26140        3920         10310
www.mpsc.gov.in             5800         25700        3200         6390
www.upsc.gov.in             1890         4760         320          690
www.unipune.ac.in           2480         8600         1130         1580
www.wipro.com               2300         24750        900          3780
www.infosys.com             1710         18180        770          1980
www.techmahindra.com        990          18000        1260         7250
www.jaihindcollege.com      1210         13230        500          1170
www.jaihindcollege.ac.in    1800         15930        540          1040
www.msbte.com               1800         10170        810          1130
www.msbshse.ac.in           1530         4550         540          1040
www.cbse.nic.in             1130         5580         630          900
www.irctc.com               1710         12960        1670         3240

Table 3 shows the connection time and response time of the browser for various sites. It compares the connection time and response time when a page is fetched from the cache server instead of the web server. The first column lists the websites used for the analysis. From the table we can say that there is a considerable reduction in access latency when the page is fetched from the proxy cache instead of the main server.
Figure 4. Connection time for different Websites

Figure 4 shows the effect of content aliasing on the access time of the web browser in terms of connection time. In most cases we get a reduction in connection time of more than 50 percent. In some cases the reduction is 30-50 percent. For the IRCTC website the reduction in connection time is negligible, and for the TECHMAHINDRA website the connection time increased. This is attributed to the larger share of dynamic content on those websites.

Figure 5. Response time for different Websites

Figure 5 shows a comparative graph of the browser response time for the different websites. When the web page is fetched from the cache server, the response time is lower. From the graph we can say that the reduction in response time is roughly 60 percent or more in each case, and in some cases it is more than 90 percent. So by using content aliasing in the proxy cache server we get a significant time saving in terms of response time as well as connection time.
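As a worked example of how such percentages follow from Table 3, the relative reduction can be written as below. The formula and the symbols R, t_server and t_cache are assumptions of this illustration (the paper does not define them); the numbers are taken from Table 3.

```latex
\[
R = \frac{t_{\text{server}} - t_{\text{cache}}}{t_{\text{server}}} \times 100\%
\]
% Connection time, www.upsc.gov.in
\[
R_{\text{conn}} = \frac{1890 - 320}{1890} \times 100\% \approx 83.1\%
\]
% Connection time, www.techmahindra.com (negative: the time increased)
\[
R_{\text{conn}} = \frac{990 - 1260}{990} \times 100\% \approx -27.3\%
\]
% Response time, www.techmahindra.com
\[
R_{\text{resp}} = \frac{18000 - 7250}{18000} \times 100\% \approx 59.7\%
\]
```

A negative value of R corresponds to the case noted above in which the cached fetch took longer to connect than the direct fetch.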
It is clear that a considerable amount of user time is saved by using the concept of content aliasing. We have achieved the reduction in access latency while also considering other parameters such as cache size and stale data.

4. CONCLUSION

The analysis of the experimental results demonstrates the need for a methodology that improves web access performance in order to enhance bandwidth utilization and connectivity speed. The suggested design improves web performance in terms of reduced latency, improved user response time, and optimal use of the existing bandwidth by using web caching. Content aliasing was successfully detected using a web-based application, database queries, and file system calls. A considerable amount of duplicate storage can be avoided through the suggested methodology. It is, therefore, a very useful mechanism for web proxy caches. Moreover, the solution is able to keep cached pages synchronized with the pages on the web server, checking for new pages when needed. This work can be further optimized with a daemon process, designed to run periodically, that checks the consistency of the cached data against the data on the web server. It can be scheduled during slack time, when traffic is low, so that it adds no additional toll on the bandwidth, and it can also update the time-to-live (TTL) period of the cached data.

REFERENCES

[1] Kartik Bommepally, Glisa T. K., Jeena J. Prakash, Sanasam Ranbir Singh and Hema A. Murthy, "Internet Activity Analysis through Proxy Log," IEEE, 2010.
[2] E-Services Team, "Changing Proxy Server," The Robert Gordon University, Schoolhill, Aberdeen, Scotland, 2006.
[3] Chen, W.; Martin, P.; Hassanein, H.S., "Caching dynamic content on the Web," Canadian Conference on Electrical and Computer Engineering, 2003, vol. 2, pp. 947-950, 4-7 May 2003.
[4] Sadhna Ahuja, Tao Wu and Sudhir Dixit, "On the Effects of Content Compression on Web Cache Performance," Proceedings of the International Conference on Information Technology: Computers and Communications, 2003.
[5] Mark S. Squillante, David D. Yao and Li Zhang, "Web Traffic Modeling and Web Server Performance Analysis," Proceedings of the 38th Conference on Decision & Control, Phoenix, Arizona, USA, December 1999.
[6] C. E. Wills and M. Mikhailov, "Studying the Impact of More Complete Server Information on Web Caching," Computer Communications, vol. 24, no. 2, pp. 184-190, May 2000.
[7] J. Wang, "A Survey of Web Caching Schemes for the Internet," Cornell Network Research Group (C/NRG), Department of Computer Science, Cornell University, 1999.
[8] N. Shivakumar and H. Garcia-Molina, "Finding near Replicas of Documents on the Web," Proc. Workshop on Web Databases, Mar. 1998.
[9] L. Breslau, P. Cao, L. Fan, G. Phillips and S. Shenker, "Web caching and Zipf-like Distributions: Evidence and Implications," Proc. Infocom '99, New York, NY, March 1999.
[10] Guerrero, C.; Juiz, C.; Puigjaner, R., "Web Performance and Behavior Ontology," International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2008), pp. 219-225, 4-7 March 2008.
[11] Kimmo Jarvinen, Matti Tommiska and Jorma Skytta, "Hardware Implementation Analysis of the MD5 Hash Algorithm," IEEE Computer Society, 2005.
[12] Andrzej Sieminski, "The impact of Proxy caches on Browser Latency," International Journal of Computer Science & Applications, 2005, Vol. II, No. II, pp. 5-21.
[13] S. B. Patil, Sachin Chavan and Preeti Patil, "High quality design to enhance and improve performance of large scale web applications," International Journal of Computer Engineering and Technology (IJCET), Volume 3, Issue 1, January-June 2012, pp. 198-205, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[14] S. Vikram Phaneendra, "Minimizing Client-Server Traffic Based on AJAX," International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 10-16, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[15] A. Suganthy, G. S. Sumithra, J. Hindusha, A. Gayathri and S. Girija, "Semantic Web Services and its Challenges," International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 2, 2010, pp. 26-37, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
