L07
 

Slide Notes

  • In this context, the challenge problem I am looking at is providing multi-source multicast functionality. This is a building block useful for a large set of higher-level applications: conferencing, whiteboard-type applications, resource monitoring and discovery, and data distribution. It is relatively easy to imagine how this building block can be used in the context of the large-scale scientific application effort mentioned before (for data distribution, file location, and resource discovery). The space I am looking at: multiple senders and receivers, medium scale, i.e., thousands to tens of thousands of nodes. Due in part to the limited deployment of multicast at the IP level, the commonly used way to provide multicast functionality is to build an application-layer overlay. The overlay uses point-to-point tunnels provided by the IP layer to connect end hosts, and adds message routing functionality at the end hosts to couple these tunnels and implement multicast. The challenge here is … [J: he calls this a coordination primitive? M: what makes the Grid setting special, and how do you address this in your solution?]
  • So, how do I evaluate the success of these overlays? A first set of metrics evaluates the overheads compared to an ‘ideal’ solution, which is IP-layer multicast. It is important to note that these metrics compare against an idealized performance tied to the physical topology …

L07 Presentation Transcript

  • Lecture 7
    • Virtualization
    • More on communication protocols
  • [Last time] How to handle incoming requests? (iteratively vs. concurrently)
    • Main Choices:
      • Iterative vs. concurrent
      • Processes vs. threads.
      • Blocking vs. non-blocking IO
    Model and characteristics:
      • Threads: parallelism, blocking system calls
      • Single-threaded process: no parallelism, blocking system calls (for I/O)
      • Finite-state machine: parallelism, non-blocking system calls (event-driven programming)
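
As a concrete, hedged illustration of the first two design choices above, here is a minimal Python sketch (not taken from the lecture) of an echo server that can run either iteratively or with one thread per connection. Both variants use blocking system calls; the host, port, and function names are arbitrary.

```python
# Minimal sketch: iterative vs. thread-per-connection echo server.
import socket
import threading

def handle(conn):
    """Serve one client with blocking I/O, then close the connection."""
    with conn:
        while True:
            data = conn.recv(4096)      # blocking system call
            if not data:
                break
            conn.sendall(data)

def serve(host="0.0.0.0", port=9000, concurrent=True):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, _addr = srv.accept()  # blocking accept
            if concurrent:
                # Concurrent: one thread per connection; blocking calls are
                # fine because only that thread waits on I/O.
                threading.Thread(target=handle, args=(conn,), daemon=True).start()
            else:
                # Iterative: clients are handled one at a time; a slow
                # client blocks everyone else.
                handle(conn)

if __name__ == "__main__":
    serve()
```

The finite-state-machine (event-driven) row would instead use non-blocking sockets with a readiness API such as Python's selectors module, so a single thread can multiplex many connections.
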
  • [Last time] Other issues in Server Design
    • How do clients find server’s location?
    • Stateless vs. stateful server design
    • Server clusters
    • Virtualization and distributed servers
  • Virtualization
    • Virtualization is becoming increasingly important:
      • Hardware changes faster than software
        • Need support for portability and code migration
      • Isolation of failing or attacked components
      • [Distributed] Application hosting (Amazon’s EC2)
  • Abstraction levels
    • Observation: Virtualization can take place at very different levels, strongly depending on the interfaces offered by the various system components:
  • Architecture of Virtual Machines (VM)
    • Note: Make a distinction between (a) process virtual machines and (b) virtual machine monitors
    • Process VM: A program is compiled to intermediate (portable) code, and then executed by a runtime system (Example: Java VM).
    • VMM: A separate software layer mimics the instruction set of the hardware, so that a complete operating system and its applications can be supported (Example: VMware on processors that do not support hardware virtualization).
  • Uses
    • Desktop side
      • Application testing
      • Software development / software evaluation
      • Security:
        • Control the amount of damage a user/app can do
      • Run OS even when I do not have the drivers
    • Server/infrastructure side
      • Server consolidation / migration / load balancing
        • Caveats
      • Honeypots
  • Distributed server example: PlanetLab
    • Setup: Different organizations contribute machines, which they subsequently share for various experiments.
    • Problem: Ensure that different distributed applications do not get into each other’s way.
    • Solution: virtualization
    • Vserver: an independent and protected environment with its own libraries, server versions, etc.
    • Distributed applications are assigned a collection of vservers distributed across multiple machines (a slice).
  • PlanetLab: Principals (Stakeholders)
    • Node Owners
      • host one or more nodes (retain ultimate control)
      • selects an MA and approves of one or more SAs
    • Service Providers (Developers)
      • implements and deploys network services
      • responsible for the service’s behavior
    • Management Authority (MA)
      • installs and maintains software on nodes
      • creates VMs and monitors their behavior
      • registers service providers
    • Slice Authority (SA)
      • creates slices and binds them to responsible provider
  • Trust Relationships
    • (A) Owner trusts MA to do responsible management (e.g., map network activity to responsible slice)
    • (B) MA trusts owner to keep nodes physically secure [MA provides the software to run nodes]
    • (C) Service provider trusts MA to provide working VMs and not falsely accuse it [Service provider registers itself with the MA]
    • (D) Service provider trusts SA to create VMs on its behalf [Contacts the SA to create VMs]
    • (E) SA trusts provider to deploy responsible services [Authenticates the service provider]
    • (F) Node owner trusts SA to map slices to responsible providers [Node owner delegates resource management to SA]
    • (G) MA delegates the ability to create slices and trusts SA to map slices to responsible providers
  • Node Boot/Install Process (interaction between the node, the boot manager, and the PLC boot server):
    1. Boots from BootCD (Linux loaded)
    2. Hardware initialized
    3. Read network config. from floppy
    4. Contact PLC (MA)
    5. Send boot manager
    6. Execute boot mgr
    7. Node key read into memory from USB memory
    8. Invoke Boot API
    9. Verify node key, send current node state
    10. State = “install”, run installer
    11. Update node state via Boot API
    12. Verify node key, change state to “boot”
    13. Chain-boot node (no restart)
    14. Node booted
  • User view
  • Plan for today
    • Wide-area distributed platforms
      • Virtualization
      • Example: PlanetLab
    • Communication
    • Distributed system definition:
    • collection of independent components that appears to its users as a single coherent system
    • Components need to communicate
      • Shared memory
      • Message exchange
    • Data distribution:
      • Multicast
      • Epidemic algorithms
    • Two categories of solutions:
      • Based on support from the network: IP-multicast
      • Without network support: application-layer multicast
    [Figure: Multicast Communication: IP multicast, where routers replicate the flow toward the end systems (Calgary, Chicago, MIT1, MIT2, UBC), vs. an overlay, where the same end systems are connected by point-to-point overlay tunnels.]
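
To make the overlay idea concrete: each end host knows only its overlay neighbors and forwards an incoming message over its other tunnels. The sketch below is illustrative Python, not from the lecture and not any particular system's code; the node names reuse the figure's labels and the topology is invented.

```python
# Minimal sketch: flooding a message over an application-layer overlay,
# with duplicate suppression by message id.
class OverlayNode:
    def __init__(self, name):
        self.name = name
        self.neighbors = []      # overlay tunnels (point-to-point links)
        self.seen = set()        # message ids already forwarded

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def deliver(self, msg_id, payload, came_from=None):
        if msg_id in self.seen:  # duplicate: drop
            return
        self.seen.add(msg_id)
        print(f"{self.name} received {payload!r}")
        # Forward on every tunnel except the one the message arrived on.
        for n in self.neighbors:
            if n is not came_from:
                n.deliver(msg_id, payload, came_from=self)

# Invented example topology (labels borrowed from the figure above).
ubc, calgary, chicago = OverlayNode("UBC"), OverlayNode("Calgary"), OverlayNode("Chicago")
mit1, mit2 = OverlayNode("MIT1"), OverlayNode("MIT2")
ubc.connect(calgary); calgary.connect(chicago)
chicago.connect(mit1); mit1.connect(mit2)
ubc.deliver("m1", "hello from UBC")   # multi-source: any node may originate a message
```
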
  • Discussion
    • Deployment of IP multicast is limited. Why?
    • What should be the success metrics?
    [Figure: Application-layer multicast vs. IP multicast (same topology as the previous figure).]
    • Overheads compared to IP multicast
      • Relative Delay Penalty (RDP): Overlay-delay vs. IP-delay
      • Stress : number of duplicate packets on each physical link
    [Figure: Application-level multicast success metrics: relative delay penalty and link stress. One panel shows the relative delay penalty distribution (e.g., the 90th-percentile RDP); the other shows the link stress distribution (e.g., the maximum link stress) for the IP-multicast and overlay trees over the same physical topology.]
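
Both metrics are easy to compute once you know, for each receiver, the direct unicast (IP) delay and the delay along the overlay path, and, for each overlay tunnel, the physical links it traverses. The numbers and link names below are invented for illustration; this is a sketch, not measurement code.

```python
# Minimal sketch of the two overlay metrics: RDP and physical link stress.
from collections import Counter

# Invented example data, source at UBC.
ip_delay      = {"Chicago": 20, "MIT1": 30, "MIT2": 32}   # ms, direct unicast
overlay_delay = {"Chicago": 20, "MIT1": 50, "MIT2": 52}   # ms, along the overlay tree
tunnel_links = {                                          # overlay hop -> physical links
    ("UBC", "Chicago"):  ["ubc-access", "west-east", "chi-access"],
    ("Chicago", "MIT1"): ["chi-access", "east-core", "mit1-access"],
    ("Chicago", "MIT2"): ["chi-access", "east-core", "mit2-access"],
}

# RDP: overlay delay divided by the direct IP delay, per receiver.
rdp = {r: overlay_delay[r] / ip_delay[r] for r in ip_delay}

# Stress: how many copies of the same packet cross each physical link.
stress = Counter()
for links in tunnel_links.values():
    stress.update(links)

print("RDP per receiver:", rdp)                    # e.g. MIT1 -> ~1.67
print("max link stress:", max(stress.values()))    # chi-access carries 3 copies
```
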
  • Today …
    • Socket programming:
      • Message based, synchronous, non-persistent
    • Client-server infrastructures
      • RPC, RMI
      • Message based, synchronous or not, generally non-persistent
    • Message oriented middleware
      • Asynchronous, persistent
    • Data distribution:
      • Multicast
      • Epidemic algorithms
  • Epidemic algorithms: Principle
    • Basic idea : Assume there are no write–write conflicts:
      • Update operations are initially performed at one node
      • A node passes its updated state to a limited number of neighbors; neighbors, in-turn, pass the update to their neighbors
      • Update propagation is lazy, i.e., not immediate
      • Eventually, each update should reach every node
    • Anti-entropy : Each node regularly chooses another node at random, and exchanges state differences, leading to identical states at both afterwards
    • [Variation] Gossiping : A replica which has just been updated (i.e., has been contaminated), tells a number of other replicas about its update (contaminating them as well).
    • The advantage : reliability, fast dissemination
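
A minimal sketch of both flavors, under the simplifying assumptions stated above: a node's state is a version-numbered key/value map and there are no write-write conflicts. All names and parameters here are invented for illustration; real systems add peer sampling, stop rules for rumors, and failure handling.

```python
# Minimal sketch of anti-entropy and gossip on key -> (value, version) maps.
import random

def merge(state_a, state_b):
    """Anti-entropy: exchange differences so both ends end up identical."""
    for key in set(state_a) | set(state_b):
        va = state_a.get(key, (None, -1))
        vb = state_b.get(key, (None, -1))
        newest = va if va[1] >= vb[1] else vb   # keep the higher version
        state_a[key] = newest
        state_b[key] = newest

def anti_entropy_round(nodes):
    """Each node picks a random partner and reconciles full state."""
    for node in nodes:
        partner = random.choice([n for n in nodes if n is not node])
        merge(node, partner)

def gossip_update(nodes, origin, key, value, version, fanout=3):
    """Gossiping: a freshly updated ('contaminated') node pushes just that
    update to a few random peers, which keep spreading it."""
    origin[key] = (value, version)
    infected = [origin]
    while infected:
        infected.pop()   # this node now pushes the rumor to random peers
        for peer in random.sample(nodes, k=min(fanout, len(nodes))):
            if peer.get(key, (None, -1))[1] < version:
                peer[key] = (value, version)
                infected.append(peer)

# Usage: nodes are plain dicts here.
nodes = [dict() for _ in range(10)]
gossip_update(nodes, nodes[0], "config", "v2", version=2)
for _ in range(5):           # repeated anti-entropy rounds eventually reach
    anti_entropy_round(nodes)  # any node the rumor missed
```
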
  • Amazon S3 incident on Sunday, July 20th, 2008
    • 8:40am PDT: error rates began to quickly climb
    • 10 min: error rates significantly elevated and very few requests complete successfully
    • 15 min: Multiple engineers investigating the issue. Alarms pointed at problems within the systems and across multiple data centers.
      • Trying to restore system health by reducing system load in several stages. No impact.
    • Amazon S3:
    • Provides a simple web services interface to store and retrieve any amount of data.
    • Intends to be a highly scalable, reliable, fast, and inexpensive data storage infrastructure…
    • S3 serves a large number of customers. Amazon itself uses S3 to run its own global network of web sites.
    • 4 billion objects stored in Q4’06 → 40 billion in Q4’08
  • Amazon S3 incident on Sunday, July 20th, 2008
    • 1h01min: engineers detect that servers within Amazon S3 have problems communicating with each other
      • Amazon S3 uses a gossip protocol to spread servers’ state info in order to quickly route around failed or unreachable servers
      • Later, engineers determined that a large number of servers were spending almost all of their time gossiping
    • 1h52min: unable to determine and solve the problem, they decide to shut down all components, clear the system's state, and then reactivate the request processing components.
    • Restart the system!
  • Amazon S3 incident on Sunday, July 20th, 2008
    • 2h29min: the system's state cleared
    • 5h49min: internal communication restored; engineers began reactivating request processing components in the US and EU.
    • 7h37min: the EU location was OK and the US location began to process requests successfully.
    • 8h33min: Request rates and error rates had returned to normal in US.
  • Post-event investigation
    • Message corruption was the cause of the server-to-server communication problems
    • Many messages on Sunday morning had a single bit corrupted
    • MD5 checksums are used in the system, but Amazon did not apply them to detect errors in this particular internal state
    • The corruption spread wrong states throughout the system and increased the system load
  • Preventing the problem
    • Change the gossip algorithm to control/reduce the number of messages. Add rate limiters.
    • Add additional monitoring and alarming for gossip rates and failures
    • Add checksums to detect corruption of system state messages
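
A hedged sketch of what the last two fixes could look like; this is illustrative Python, not Amazon's implementation, and every name in it is invented. The checksum guards state messages against bit flips, and a token bucket caps the outgoing gossip rate.

```python
# Minimal sketch: MD5-checksummed state messages plus a token-bucket rate limiter.
import hashlib
import json
import time

def wrap(state: dict) -> bytes:
    """Serialize a state message and prepend its MD5 digest."""
    body = json.dumps(state, sort_keys=True).encode()
    return hashlib.md5(body).hexdigest().encode() + b"\n" + body

def unwrap(message: bytes) -> dict:
    """Verify the digest before trusting the state; drop corrupted messages."""
    digest, body = message.split(b"\n", 1)
    if hashlib.md5(body).hexdigest().encode() != digest:
        raise ValueError("corrupted gossip message dropped")
    return json.loads(body)

class RateLimiter:
    """Token bucket: at most `rate` gossip messages per second, on average."""
    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Usage: only send gossip that passes the limiter; verification on receipt
# rejects any message with a flipped bit.
limiter = RateLimiter(rate=10, burst=20)
msg = wrap({"server": "s42", "status": "alive", "version": 7})
if limiter.allow():
    received = unwrap(msg)
```
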
  • Lessons learned
    • You get a big hammer … use it wisely!
    • Verify message and state correctness: all kinds of corruption errors may occur
    • An emergency procedure to restore a clean state in your system may be the solution of last resort. Make it work quickly!
    • Amazon’s report on the incident: http://status.aws.amazon.com/s3-20080720.html
    • Current status of Amazon services: http://status.aws.amazon.com/
  • Summary
    • Socket programming:
      • Message based, synchronous, non-persistent
    • Client-server infrastructures
      • RPC, RMI
      • Message based, synchronous or not, generally non-persistent
    • Data distribution:
      • Multicast
      • Epidemic algorithms