Towards A Novel Architecture For Wide Area Data Caching And Replication Jaesoo


Published in: Business

  1. 1. Towards a Novel Architecture for Wide-Area Data Caching and Replication presented by JaeSoo Jang Computer Communication Lab. Soongsil University Tel : 02-816-0689, Cellular : 0505-605-5858 [email_address]
  2. 2. Contents <ul><li>Introduction </li></ul><ul><li>Previous Work </li></ul><ul><li>Approach </li></ul><ul><li>Architecture </li></ul><ul><li>Example </li></ul><ul><li>Experiments </li></ul><ul><li>Conclusion and Future Work </li></ul><ul><li>Critiques </li></ul><ul><li>Question & Answer </li></ul>
  3. 3. Introduction <ul><li>To ensure fast and highly available access to Internet services </li></ul><ul><ul><li>Optimally locating data objects and service provision points in the network is critical. </li></ul></ul><ul><ul><li>Currently, caching and replication are widely used. </li></ul></ul><ul><li>However, current caching and replication techniques should be reappraised for the following reasons: </li></ul><ul><ul><li>Rapid growth of the Internet </li></ul></ul><ul><ul><li>Increasing variety of clients demanding Internet services </li></ul></ul><ul><ul><li>Phenomenon of hot-spots </li></ul></ul><ul><ul><li>Transient increases in user access patterns </li></ul></ul>
  4. 4. Previous Work <ul><li>Caching </li></ul><ul><ul><li>Client-pull </li></ul></ul><ul><ul><ul><li>Browser-level caching </li></ul></ul></ul><ul><ul><ul><li>Per-site proxy servers </li></ul></ul></ul><ul><ul><ul><li>Caching hierarchies </li></ul></ul></ul><ul><ul><ul><ul><li>These increase client latency for popular documents because requests have to percolate through a large number of levels. </li></ul></ul></ul></ul><ul><ul><ul><li>Some papers [WWW95, SDNE95] document the limitations of client-pull approaches. </li></ul></ul></ul><ul><ul><li>Server-push (Data Dissemination) </li></ul></ul><ul><ul><ul><li>This approach is presented in [SPDP95], but the authors are not aware of any such system in widespread use. </li></ul></ul></ul>
  5. 5. Previous Work (Cont’d) <ul><li>Cache hierarchy </li></ul><ul><ul><li>Hierarchical Web cache relation architecture </li></ul></ul><ul><ul><li>Each cache forms peering relationships with neighboring caches </li></ul></ul><ul><ul><li>Kinds of peering relationship </li></ul></ul><ul><ul><ul><li>Parent-child relationship, sibling relationship </li></ul></ul></ul><ul><li>Neighboring caches use ICP (Internet Cache Protocol) to cooperate and exchange messages </li></ul>
  6. 6. Previous Work (Cont’d) <ul><li>Replication among servers at a single location </li></ul><ul><ul><li>DNS-based request distribution </li></ul></ul><ul><ul><li>Sites are forced to provision enough capacity to handle peak demand. </li></ul></ul><ul><ul><ul><li>Many resources are wasted during periods of average demand. </li></ul></ul></ul><ul><li>Mirroring </li></ul><ul><ul><li>Requires manual effort to set up and to maintain the consistency of data. </li></ul></ul><ul><ul><li>The user has to select the mirror site to access. </li></ul></ul><ul><li>Rent-A-Server (WebOS project [berkeley98]) </li></ul><ul><ul><li>Dynamically spawns server clones and replicates the server data in response to changes in server load. </li></ul></ul>
  7. 7. Approach <ul><li>Motivation </li></ul><ul><ul><li>To minimize access time and conserve bandwidth, the dynamically replicated copies of the server must be located close to hot-spots in client access patterns. </li></ul></ul><ul><ul><li>However, current approaches do not consider client access patterns when determining where to spawn server replicas. </li></ul></ul><ul><li>We present a dynamic caching and replication architecture that considers the temporal and geographical spikes in user demand. </li></ul><ul><ul><li>Distributed computation of client access patterns </li></ul></ul><ul><ul><li>Using the access statistics for replication/migration </li></ul></ul>
  8. 8. Approach (Cont’d) <ul><li>Approach </li></ul><ul><ul><li>Let’s use the network nodes </li></ul></ul><ul><ul><ul><li>To compute the access patterns in a distributed fashion and send them to the server periodically </li></ul></ul></ul><ul><ul><ul><ul><li>The network nodes can actually see the flow of client requests. </li></ul></ul></ul></ul><ul><ul><li>Let’s endow objects with intelligence </li></ul></ul><ul><ul><ul><li>To make their own migration/replication decisions based on the access statistics </li></ul></ul></ul>
  9. 9. Architecture <ul><li>Design Goals </li></ul><ul><ul><li>Design of a caching/replication system that obtains the user access patterns from the network to locate the data objects </li></ul></ul><ul><ul><ul><li>The network utilization and access latency are optimized. </li></ul></ul></ul><ul><ul><li>Making caching transparent </li></ul></ul><ul><ul><ul><li>Clients do not have to choose proxies. </li></ul></ul></ul><ul><ul><ul><li>Caches do not have to be configured manually in a hierarchy. </li></ul></ul></ul>
  10. 10. Architecture (Cont’d) <ul><li>Using Active Networks </li></ul><ul><ul><li>In the case of wide-area caching and replication, network nodes are in an ideal position to determine the location of hot-spots in the access patterns of objects retrieved over the network. </li></ul></ul>
  11. 11. Architecture (Cont’d) <ul><li>Caching/Replication Model </li></ul>
  12. 12. Architecture (Cont’d) <ul><li>Active Data Object (ADO) </li></ul><ul><ul><li>A set of related files that may be transferred as a group </li></ul></ul><ul><ul><li>Contains intelligence to make its own migration/replication decisions based on the access statistics obtained from the network. </li></ul></ul><ul><li>Active Node </li></ul><ul><ul><li>Maintains state information about the accesses to various data objects. </li></ul></ul><ul><ul><li>Has intelligence to periodically update the server with access statistics for individual objects. </li></ul></ul>
  13. 13. Architecture (Cont’d) <ul><li>Process of caching/replication model </li></ul><ul><ul><li>1. The arrival of an update triggers the ADO’s migration routine. </li></ul></ul><ul><ul><li>2. The ADO’s migration routine analyzes the traffic information and decides whether to migrate to the hot-spot region. </li></ul></ul><ul><ul><li>3. The server transfers the ADO to the corresponding cache in the hot-spot region. </li></ul></ul><ul><ul><li>4. The caching server announces the presence of the ADO to its associated active node. </li></ul></ul><ul><ul><li>5. Subsequent requests for the object are routed to the closer server by the associated active node. </li></ul></ul>
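The migration decision in step 2 can be illustrated with a small sketch. The slides do not state the actual decision rule, so the rule below (pick the region with the most accesses, and migrate only if its count reaches a threshold) is an assumption, as are the names `choose_migration_target`, `region_counts`, and `threshold`.

```python
from typing import Dict, Optional


def choose_migration_target(region_counts: Dict[str, int], threshold: int) -> Optional[str]:
    """Return the busiest region if its access count reaches the threshold.

    Hypothetical rule for illustration only; the slides do not specify
    how the ADO's migration routine actually weighs the statistics.
    """
    if not region_counts:
        return None
    # Find the region that generated the most requests in this update.
    region, count = max(region_counts.items(), key=lambda item: item[1])
    return region if count >= threshold else None
```

For example, with counts of 12 for one region and 3 for another and a threshold of 10, this sketch would select the first region; with no region at or above the threshold it returns `None` and the object stays where it is.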
  14. 14. Architecture (Cont’d) <ul><li>Argument on additional overhead for active networking </li></ul><ul><ul><li>Does the additional overhead of active networking outweigh the performance gains made by better replication? No. </li></ul></ul><ul><ul><li>Requiring active processing only for requests for data objects and updates of access statistics keeps the network performance penalty small. </li></ul></ul><ul><ul><li>The actual transfer of the data object content between client and server is performed by application-specific, non-active protocols (e.g., HTTP, FTP) </li></ul></ul>
  15. 15. Example <ul><li>“ADO-based Web Caching and Replication” </li></ul><ul><li>Implementation </li></ul><ul><ul><li>Using Java applets </li></ul></ul><ul><ul><ul><li>For identifying hot-spots in the client access patterns </li></ul></ul></ul><ul><ul><ul><li>For making the migration/replication decisions </li></ul></ul></ul><ul><ul><li>ANTS (Active Node Transfer System) </li></ul></ul><ul><ul><ul><li>Capsule-based active networking toolkit written in Java </li></ul></ul></ul><ul><ul><ul><li>Capsules carry data and references to the code to be executed at active nodes </li></ul></ul></ul>
  16. 16. Example (Cont’d) <ul><li>ANTS Capsule Types </li></ul><ul><ul><li>Request Capsule </li></ul></ul><ul><ul><ul><li>Sent from clients to the server during connection initiation. </li></ul></ul></ul><ul><ul><ul><li>Queries the active nodes it passes through. </li></ul></ul></ul><ul><ul><li>Response Capsule </li></ul></ul><ul><ul><ul><li>Sent to the client from the active node when a hit occurs. </li></ul></ul></ul><ul><ul><ul><li>Conveys the IP address of the ADO server holding the requested object. </li></ul></ul></ul><ul><ul><li>Information Capsule </li></ul></ul><ul><ul><ul><li>Sent from the active node to the ADO server </li></ul></ul></ul><ul><ul><ul><ul><li>when the demand for a data object held by the ADO server exceeds the threshold of a server-specified popularity modulus. </li></ul></ul></ul></ul><ul><ul><li>Register Capsule </li></ul></ul><ul><ul><ul><li>Sent from ADO servers to active nodes </li></ul></ul></ul><ul><ul><ul><ul><li>to register an object held in the cache. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>to refresh the object entry periodically. </li></ul></ul></ul></ul>
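The four capsule types can be summarized as a small data model. This is only a sketch in Python and not the ANTS API (ANTS itself is a Java toolkit); the class names, field names, and the `ROUTES` table are hypothetical, while the sender and receiver of each capsule follow the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class CapsuleType(Enum):
    REQUEST = "request"
    RESPONSE = "response"
    INFORMATION = "information"
    REGISTER = "register"


@dataclass(frozen=True)
class Capsule:
    """Illustrative capsule payload; the real ANTS capsule layout is not shown in the slides."""
    kind: CapsuleType
    url: str
    server_ip: str = ""    # used by RESPONSE and REGISTER capsules
    access_count: int = 0  # used by INFORMATION capsules


# (sender, receiver) for each capsule type, as listed on the slide.
ROUTES: Dict[CapsuleType, Tuple[str, str]] = {
    CapsuleType.REQUEST: ("client", "server"),
    CapsuleType.RESPONSE: ("active node", "client"),
    CapsuleType.INFORMATION: ("active node", "ADO server"),
    CapsuleType.REGISTER: ("ADO server", "active node"),
}
```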
  17. 17. Example (Cont’d) <ul><li>Capsule Processing in an Active Node </li></ul>
  18. 18. Example (Cont’d) <ul><li>Capsule Processing (when an object is requested) </li></ul><ul><ul><li>Common processing </li></ul></ul><ul><ul><ul><li>1. When the user inputs a URL, the client resolves the server name. </li></ul></ul></ul><ul><ul><ul><li>2. The client sends the IP address and the URL to a local active network daemon. </li></ul></ul></ul><ul><ul><ul><li>3. The local active network daemon creates a request capsule and forwards it towards the home server. </li></ul></ul></ul><ul><ul><ul><li>4. When a request capsule arrives at a node, its forwarding routine queries the activity cache for the requested URL. </li></ul></ul></ul><ul><ul><li>If a match is not found, </li></ul></ul><ul><ul><ul><li>5. The request capsule sets up an entry for the requested URL in the activity cache and increments the access count. </li></ul></ul></ul><ul><ul><ul><li>6. The request capsule is forwarded towards the home server. </li></ul></ul></ul><ul><ul><li>If a match is found, </li></ul></ul><ul><ul><ul><li>5. A response capsule is sent back to the client with the IP address of the server holding the requested URL. </li></ul></ul></ul><ul><ul><ul><li>6. The client proceeds to transfer the data. </li></ul></ul></ul>
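The per-node handling of a request capsule (steps 4 to 6) might look roughly like the sketch below. It makes one interpretive assumption that the slides leave open: a "match" means the activity cache entry names a server that has registered a copy of the object, whereas an entry that only carries an access count is still treated as a miss. `ActiveNode`, `Entry`, and the method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    access_count: int = 0
    holder_ip: Optional[str] = None  # set once an ADO server registers a copy


class ActiveNode:
    def __init__(self) -> None:
        # Activity cache: URL -> access statistics and (optionally) a nearby holder.
        self.activity_cache: Dict[str, Entry] = {}

    def register(self, url: str, holder_ip: str) -> None:
        """Effect of a register capsule: record which server holds the object."""
        self.activity_cache.setdefault(url, Entry()).holder_ip = holder_ip

    def handle_request(self, url: str, home_ip: str) -> Tuple[str, str]:
        """Return ("respond", holder_ip) on a hit, else ("forward", home_ip)."""
        entry = self.activity_cache.get(url)
        if entry is not None and entry.holder_ip is not None:
            # Hit: a response capsule carries the holder's address back to the client.
            return ("respond", entry.holder_ip)
        if entry is None:
            # Miss: set up an entry for the URL before counting the access.
            entry = self.activity_cache[url] = Entry()
        entry.access_count += 1
        return ("forward", home_ip)
```

In this sketch the first request for a URL is forwarded towards the home server and counted; once a caching server has registered the URL, later requests are answered at the node with that server's address.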
  19. 19. Example (Cont’d) <ul><li>Capsule Processing (which is performed periodically) </li></ul><ul><ul><li>1. The active node sends information capsules to the associated ADO servers with the access statistics. </li></ul></ul><ul><ul><li>2. The arrival of the information capsule triggers the ADO control routine. </li></ul></ul><ul><ul><ul><li>Analyzes the access statistics. </li></ul></ul></ul><ul><ul><ul><li>Decides whether to migrate to a region of high demand. </li></ul></ul></ul><ul><ul><li>3. The server transfers the ADO to the caching server. </li></ul></ul><ul><ul><li>4. The server sends a register capsule to the associated active node to create an entry for the URL in the active node’s activity cache. </li></ul></ul>
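One reporting period can be sketched as two small functions: the node selects the URLs whose demand exceeds the threshold (the condition under which information capsules are sent), and the resulting migration ends with the URL being registered at the node. The function names, the plain-dictionary representation, and the simplification that every reported object is migrated are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def build_information_reports(access_counts: Dict[str, int], threshold: int) -> List[Tuple[str, int]]:
    """Select (url, count) pairs whose demand exceeds the threshold, sorted by URL."""
    return sorted((url, n) for url, n in access_counts.items() if n > threshold)


def periodic_cycle(access_counts: Dict[str, int], registered: Dict[str, str],
                   cache_ip: str, threshold: int) -> List[str]:
    """Run one period and return the URLs newly registered at the node.

    Simplification: the ADO control routine here always decides to migrate
    a reported object that is not yet registered.
    """
    migrated = []
    for url, _count in build_information_reports(access_counts, threshold):
        if url not in registered:
            # Steps 3 and 4: the ADO is transferred, then a register capsule
            # creates the entry that lets the node redirect later requests.
            registered[url] = cache_ip
            migrated.append(url)
    return migrated
```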
  20. 20. Experiments <ul><li>Measurement </li></ul><ul><ul><li>Client latency </li></ul></ul><ul><ul><li>Overheads associated with active networking </li></ul></ul><ul><li>Experimental Tools </li></ul><ul><ul><li>Modified version of Web Polygraph </li></ul></ul><ul><ul><ul><li>polyclt </li></ul></ul></ul><ul><ul><ul><ul><li>Measures the client latency and the overheads. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Requests a single object repeatedly at a rate of 1 request/second. </li></ul></ul></ul></ul><ul><ul><ul><li>polysrv </li></ul></ul></ul><ul><ul><ul><ul><li>Responds with a document whose size is drawn from an exponential distribution with a mean of 13KB. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Simulates a wide-area link by delaying each response by an amount drawn from a normal distribution with a mean of 10 seconds and a standard deviation of 3 seconds. </li></ul></ul></ul></ul>
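The server-side workload parameters can be reproduced with Python's standard `random` module. This is a sketch of the two distributions named on the slide, not Web Polygraph code; treating 13KB as 13 × 1024 bytes and clamping negative delays to zero are assumptions, as are all names below.

```python
import random
from typing import Tuple

MEAN_SIZE_BYTES = 13 * 1024  # assumes 1KB = 1024 bytes
DELAY_MEAN_S = 10.0
DELAY_STDDEV_S = 3.0


def sample_response(rng: random.Random) -> Tuple[float, float]:
    """Draw one (document size in bytes, response delay in seconds) pair."""
    # expovariate takes the rate, i.e. 1 / mean.
    size = rng.expovariate(1.0 / MEAN_SIZE_BYTES)
    # A normal draw can be negative; clamp so the delay stays physical.
    delay = max(0.0, rng.gauss(DELAY_MEAN_S, DELAY_STDDEV_S))
    return size, delay
```

Passing a seeded `random.Random` instance keeps runs repeatable, which is convenient when comparing configurations such as different numbers of clients.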
  21. 21. Experiments (Cont’d) <ul><li>Network Topology </li></ul><ul><li>Experiments </li></ul><ul><ul><li>Caching threshold = 10 </li></ul></ul><ul><ul><li>Number of clients = 1, 2, 5 </li></ul></ul>
  22. 22. Experiments (Cont’d) <ul><li>Result </li></ul>
  23. 23. Conclusion and Future Work <ul><li>Conclusion </li></ul><ul><ul><li>The paper proposes an Active Networks-based architecture that improves caching/replication by obtaining statistics from the network nodes to identify hot-spots in client access patterns. </li></ul></ul><ul><li>Future Work </li></ul><ul><ul><li>Address the replication of dynamically generated content. </li></ul></ul>
  24. 24. Critiques <ul><li>Good Points </li></ul><ul><ul><li>Idea of using network nodes to identify hot-spots in client access patterns </li></ul></ul><ul><li>Weak Points </li></ul><ul><ul><li>This architecture cannot achieve better performance than classical caching does. </li></ul></ul><ul><ul><li>The referenced previous work is too old. </li></ul></ul><ul><ul><li>Cache consistency is not considered. </li></ul></ul><ul><ul><li>The choice of 10 as the caching threshold is not explained. </li></ul></ul><ul><li>Alternatives </li></ul><ul><ul><li>Content Delivery Network (CDN) </li></ul></ul>
  25. 25. Question & Answer <ul><li>Q&A </li></ul>