Towards A Novel Architecture For Wide Area Data Caching And Replication Jaesoo


  • 1. Towards a Novel Architecture for Wide-Area Data Caching and Replication presented by JaeSoo Jang Computer Communication Lab. Soongsil University Tel : 02-816-0689, Cellular : 0505-605-5858 [email_address]
  • 2. Contents
    • Introduction
    • Previous Work
    • Approach
    • Architecture
    • Example
    • Experiments
    • Conclusion and Future Work
    • Critiques
    • Question & Answer
  • 3. Introduction
    • To ensure fast and highly available access to internet services
      • Optimally locating data objects and service provision points in the network is critical.
      • Currently, caching and replication are widely used.
    • However, current caching and replication techniques should be reappraised for the following reasons:
      • Rapid growth of the internet
      • Increasing variety of clients demanding internet services
      • Phenomenon of hot-spots
      • Transient increases in user access patterns
  • 4. Previous Work
    • Caching
      • Client-pull
        • Browser-level caching
        • per-site proxy servers
        • caching hierarchies
          • increase the client latency for popular documents because the requests have to percolate through a large number of levels.
        • Some papers [WWW95, SDNE95] document the limitation of client-pull approaches.
      • Server-push (Data Dissemination)
        • This has been demonstrated in [SPDP95], but the authors are not aware of any such system in widespread use.
  • 5. Previous Work (Con’t)
    • Cache hierarchy
      • Hierarchical Web cache relation architecture
      • Each cache forms peering relationships with neighboring caches
      • Kinds of peering relationship
        • parent-child relationship, sibling relationship
    • Neighboring caches use ICP (Internet Cache Protocol) to cooperate and exchange messages
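The parent-child and sibling relationships above can be illustrated with a minimal sketch. It is not an ICP implementation: the `Cache` class, its `has` method (standing in for an ICP query/hit exchange) and the `origin` dictionary are hypothetical names introduced only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Cache:
    """One node in a hypothetical cache hierarchy."""
    name: str
    parent: Cache | None = None
    siblings: list[Cache] = field(default_factory=list)
    store: dict[str, bytes] = field(default_factory=dict)

    def has(self, url: str) -> bool:
        # Stands in for an ICP query answered with a hit or a miss.
        return url in self.store

    def fetch(self, url: str, origin: dict[str, bytes]):
        """Return (body, path), where path lists every place consulted."""
        if url in self.store:
            return self.store[url], [self.name]
        # Ask siblings first: they only answer for objects they already hold.
        for sib in self.siblings:
            if sib.has(url):
                self.store[url] = sib.store[url]
                return self.store[url], [self.name, sib.name]
        # Otherwise the request percolates up to the parent, or to the origin.
        if self.parent is not None:
            body, path = self.parent.fetch(url, origin)
        else:
            body, path = origin[url], ["origin"]
        self.store[url] = body
        return body, [self.name] + path
```

The returned path grows with the depth of the hierarchy, which is the latency cost of client-pull hierarchies noted on the previous slide.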
  • 6. Previous Work (Con’t)
    • Replication among servers at a single location
      • DNS-based request distribution
      • Sites are forced to provision for peak demand.
        • Many resources are wasted during periods of average demand.
    • Mirroring
      • Requires manual effort to set up and to maintain the consistency of data.
      • The user has to select the mirror site to access.
    • Rent-A-Server (WebOS project [berkeley98])
      • Dynamically spawns server clones and replicates the server data in response to changes in server load.
  • 7. Approach
    • Motivation
      • To minimize access time and conserve bandwidth, the dynamically replicated copies of the server must be located close to hot-spots in client access patterns.
      • However, current approaches do not consider the client access patterns in determining the locations to spawn the server replicas.
    • We present a dynamic caching and replication architecture that considers the temporal and geographical spikes in user demand.
      • Distributed computation of client access patterns
      • Using the access statistics for replication/migration
  • 8. Approach (Con’t)
    • Approach
      • Let’s use the network nodes
        • To compute the access patterns in a distributed fashion and send them to the server periodically
          • The network nodes can actually see the flow of client requests.
      • Let’s endow objects with intelligence
        • So that each object makes its own migration/replication decisions based on the access statistics
  • 9. Architecture
    • Design Goals
      • Design of a caching/replication system that obtains the user access patterns from the network to locate the data objects
        • The network utilization and access latency are optimized.
      • Making caching transparent
        • Clients do not have to choose proxies.
        • Caches do not have to be configured manually in a hierarchy.
  • 10. Architecture (Con’t)
    • Using Active Networks
      • In the case of wide-area caching and replication, network nodes are in an ideal position to determine the location of hot-spots in the access patterns of objects retrieved over the network.
  • 11. Architecture (Con’t)
    • Caching/Replication Model
  • 12. Architecture (Con’t)
    • Active Data Object (ADO)
      • Set of related files that may be transferred as a group
      • Contains the intelligence to make its own migration/replication decisions based on the access statistics obtained from the network.
    • Active Node
      • Maintains state information about the accesses to various data objects.
      • Has the intelligence to periodically update the server with access statistics for individual objects.
  • 13. Architecture (Con’t)
    • Process of caching/replication model
      • 1. The arrival of an update triggers the ADO's migration routine.
      • 2. The ADO's migration routine analyzes the traffic information and decides whether to migrate to the hot-spot region.
      • 3. The server transfers the ADO to the corresponding cache in the hot-spot region.
      • 4. The caching server announces the presence of the ADO to its associated active node.
      • 5. Subsequent requests for the object are routed to the closer server by the associated active node.
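The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ActiveDataObject` and `ActiveNode` classes, the per-region request counts, and the rule "migrate to the busiest region once it reaches a threshold" are all hypothetical.

```python
from dataclasses import dataclass


class ActiveNode:
    """Hypothetical active node that remembers where registered ADOs live."""

    def __init__(self):
        self.locations = {}  # url -> caching server that announced the ADO

    def register(self, url, server):
        self.locations[url] = server  # step 4: the cache announces the ADO

    def route(self, url, home_server):
        # Step 5: prefer the closer server if one has been announced.
        return self.locations.get(url, home_server)


@dataclass
class ActiveDataObject:
    url: str
    home_server: str
    threshold: int = 10  # assumed decision rule, not taken from the slides

    def migration_routine(self, stats):
        """Step 2: pick the busiest region, or None if demand is too low."""
        if not stats:
            return None
        region, count = max(stats.items(), key=lambda kv: kv[1])
        return region if count >= self.threshold else None


def handle_update(ado, stats, caches, nodes):
    """Step 1: an arriving statistics update triggers the migration routine."""
    region = ado.migration_routine(stats)
    if region is None:
        return None
    cache = caches[region]                  # step 3: transfer to that cache
    nodes[region].register(ado.url, cache)  # step 4: announce to the node
    return cache
```

A short usage example: with `stats = {"east": 3, "west": 12}` the object migrates to the west cache, after which the west node routes requests there while the east node still routes to the home server.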
  • 14. Architecture (Con’t)
    • Argument on additional overhead for active networking
      • Does the additional overhead of active networking outweigh the performance gains made by better replication?  No.
      • Requiring active processing only for requests for data objects and updates of access statistics keeps the network performance penalty small.
      • The actual transfer of the data object content between client and server is performed by application-specific, non-active protocols (e.g., HTTP, FTP)
  • 15. Example
    • “ADO-based Web Caching and Replication”
    • Implementation
      • Using Java applets
        • For identifying hot-spots in the client access patterns
        • For making the migration/replication decisions
      • ANTS (Active Node Transfer System)
        • Capsule based active networking toolkit written in Java
        • Capsules carry data and references to the code to be executed at active nodes
  • 16. Example (Con’t)
    • ANTS Capsule Types
      • Request Capsule
        • sent from clients to the server during connection initiation.
        • queries the active nodes it passes through.
      • Response Capsule
        • sent to the client from the active node when a hit occurs.
        • conveys the IP address of the ADO server holding the requested object.
      • Information Capsule
        • sent from the active node to the ADO server
          • when the demand for a data object held by the ADO server exceeds the threshold of a server-specified popularity modulus.
      • Register Capsule
        • sent from ADO servers to active nodes
          • to register an object held in the cache.
          • to refresh object entry periodically.
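The four capsule types can be summarized as plain records. The sketch below is written in Python for brevity and does not use the ANTS Java API; all class and field names, including `lifetime_s` for the periodic refresh, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCapsule:
    url: str
    client_ip: str
    home_server_ip: str


@dataclass(frozen=True)
class ResponseCapsule:
    url: str
    ado_server_ip: str  # server holding the requested object


@dataclass(frozen=True)
class InformationCapsule:
    url: str
    access_count: int  # demand observed at the active node


@dataclass(frozen=True)
class RegisterCapsule:
    url: str
    ado_server_ip: str
    lifetime_s: float  # hypothetical: entry expires unless refreshed


# (sender, receiver) for each capsule type, as listed on the slide.
ROUTES = {
    RequestCapsule: ("client", "home server"),
    ResponseCapsule: ("active node", "client"),
    InformationCapsule: ("active node", "ADO server"),
    RegisterCapsule: ("ADO server", "active node"),
}


def direction(capsule):
    """Return the (sender, receiver) pair for a capsule instance."""
    return ROUTES[type(capsule)]
```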
  • 17. Example (Con’t)
    • Capsule Processing in an Active Node
  • 18. Example (Con’t)
    • Capsule Processing (when an object is requested)
      • Common processing
        • 1. When the user inputs a URL, the client resolves the server name.
        • 2. The client sends the IP address and the URL to a local active network daemon.
        • 3. The local active network daemon creates a request capsule and forwards it towards the home server.
        • 4. When a request capsule arrives at a node, its forwarding routine queries the activity cache for the requested URL.
      • If a match is not found,
        • 5. The request capsule sets up an entry for the requested URL in the activity cache and increments the access count.
        • 6. The request capsule is forwarded towards the home server.
      • If a match is found,
        • 5. A response capsule is sent back to the client with the IP address of the server holding the requested URL.
        • 6. The client proceeds to transfer the data.
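The request path above can be sketched as a walk over the active nodes between the client and the home server. This is an illustration under assumptions, not ANTS code: an activity-cache entry is modeled as a count plus an optional server address, and a "match" is taken to mean an entry whose server address is known. The names `Entry`, `ActiveNode`, and `send_request` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    count: int = 0
    server: Optional[str] = None  # set once an ADO server has registered


class ActiveNode:
    def __init__(self, name):
        self.name = name
        self.activity_cache = {}  # url -> Entry

    def process_request(self, url):
        """Return the ADO server IP on a hit, or None to keep forwarding."""
        entry = self.activity_cache.get(url)
        if entry is not None and entry.server is not None:
            return entry.server  # hit: a response capsule goes to the client
        if entry is None:
            entry = self.activity_cache[url] = Entry()  # set up the entry
        entry.count += 1  # record the access
        return None


def send_request(path, url, home_server):
    """Forward a request capsule along `path`; return the server to contact."""
    for node in path:
        server = node.process_request(url)
        if server is not None:
            return server
    return home_server  # no node knew a closer copy
```

The returned address is only the location of the object; the transfer itself would then use an ordinary protocol such as HTTP.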
  • 19. Example (Con’t)
    • Capsule Processing (which is performed periodically)
      • 1. The active node sends information capsules to the associated ADO servers with the access statistics.
      • 2. The arrival of the information capsule triggers the ADO control routine.
        • analyzes the access statistics.
        • decides whether to migrate to a region of high demand.
      • 3. The server transfers the ADO to the caching server.
      • 4. The server sends a register capsule to the associated active node to create an entry for the URL in the active node’s activity cache.
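The node-side half of this periodic cycle can be sketched as two small functions. The threshold of 10 comes from the experiments slide. Everything else is an assumption made for illustration: the function names, the `(url, count)` tuple standing in for an information capsule, and resetting a counter once it has been reported.

```python
CACHING_THRESHOLD = 10  # value reported on the experiments slide


def collect_information_capsules(counts, threshold=CACHING_THRESHOLD):
    """Return (url, count) pairs for objects whose demand exceeds the threshold.

    `counts` maps url -> access count and is updated in place.
    """
    capsules = [(url, n) for url, n in sorted(counts.items()) if n > threshold]
    for url, _ in capsules:
        counts[url] = 0  # assumption: start a fresh measurement period
    return capsules


def apply_register_capsule(locations, url, cache_ip):
    """Create or refresh the activity-cache entry for a migrated object."""
    locations[url] = cache_ip
```

For example, with `counts = {"/a": 12, "/b": 3}` only `/a` is reported, since 3 does not exceed the threshold.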
  • 20. Experiments
    • Measurement
      • Client latency
      • Overheads associated with the active networking
    • Instrument of Experiments
      • Modified version of Web Polygraph
        • polyclt
          • measures the client latency and the overheads.
          • requests a single object repeatedly at a rate of 1 request/second.
        • polysrv
          • responds with a document whose size is drawn from an exponential distribution with a mean of 13 KB.
          • simulates a wide-area link by delaying the response by a normally distributed time with mean 10 seconds and standard deviation 3 seconds.
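The server-side workload described above can be reproduced with Python's standard `random` module. This is a sketch of the sampling only, not of Web Polygraph itself. Treating 13 KB as 13 × 1024 bytes and clamping negative delays to zero are assumptions, and `sample_response` is a hypothetical name.

```python
import random

MEAN_SIZE_BYTES = 13 * 1024  # assumption: 1 KB = 1024 bytes
DELAY_MEAN_S = 10.0
DELAY_STDDEV_S = 3.0


def sample_response(rng):
    """Return (document size in bytes, simulated wide-area delay in seconds)."""
    # expovariate takes the rate, i.e. 1 / mean.
    size = rng.expovariate(1.0 / MEAN_SIZE_BYTES)
    # A normal variate can be negative; clamp so the delay stays physical.
    delay = max(0.0, rng.gauss(DELAY_MEAN_S, DELAY_STDDEV_S))
    return size, delay
```

Passing a seeded `random.Random` instance keeps a run repeatable, which helps when comparing latency with and without caching.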
  • 21. Experiments (Con’t)
    • Network Topology
    • Experiments
      • Caching threshold = 10
      • Number of clients = 1, 2, 5
  • 22. Experiments (Con’t)
    • Result
  • 23. Conclusion and Future Work
    • Conclusion
      • Proposes an Active Networks-based architecture for improving caching/replication that operates by obtaining statistics from the network nodes to identify hot-spots in client access patterns.
    • Future Work
      • Address the replication of dynamically generated content.
  • 24. Critiques
    • Good Points
      • Idea of using network nodes to identify hot-spots in client access patterns
    • Weak Points
      • This architecture cannot achieve better performance than classical caching does.
      • The referenced previous work is too old.
      • Does not consider cache consistency.
      • Does not explain why the caching threshold is set to 10.
    • Alternatives
      • Content Delivery Network (CDN)
  • 25. Question & Answer