    • 1. SnowFlock: Rapid Virtual Machine Cloning for Cloud Computing 03/17/10 Distributed System Lab H. Andrés Lagar-Cavilla, Joseph A. Whitney, Adin Scannell, Philip Patchin, Stephen M. Rumble, Eyal de Lara, Michael Brudno, M. Satyanarayanan University of Toronto and Carnegie Mellon University http://sysweb.cs.toronto.edu/snowflock 游清權
    • 2. Outline
      • Introduction
      • VM Fork
      • Design Rationale
      • SnowFlock Implementation
      • Application Evaluation
      • Conclusion and Future Directions
    • 3. Introduction
      • VM technology is widely adopted as an enabler of cloud computing
      • Benefits
        • Security
        • Performance isolation
        • Ease of management
        • Flexibility (user-customized environments)
        • Use a variable number of physical machines and VM instances depending on the needs of the problem
        • e.g., a task may need only a single CPU during some phases of execution
    • 4. Introduction
      • Introduce VM fork
        • Simplifies development and deployment of cloud applications
        • Allows for the rapid (< 1 second) instantiation of stateful computing elements in a cloud environment
      • VM fork is similar to the process fork
        • Child VMs receive a copy of all of the state generated by the parent VM prior to forking
        • But it differs in three fundamental ways
    • 5. Introduction
      • VM fork primitive allows for the forked copies to be instantiated on a set of different physical machines
        • Enabling the task to take advantage of large compute clusters.
        • Previous work [Vrable 2005] is limited to cloning VMs within the same host
      • We have made the primitive parallel, enabling the creation of multiple child VMs with a single call
      • VM fork replicates all of the processes and threads of the originating VM
        • (Enables effective replication of multiple cooperating processes)
      • e.g., a customized LAMP (Linux/Apache/MySQL/PHP) stack
    • 6. Introduction
      • Enables the trivial implementation of several useful and well-known patterns that are based on stateful replication
      • Pseudocode for four of these is illustrated in Figure 1
      • Sandboxing of untrusted code
      • Instantiating new worker nodes to handle increased load (e.g. due to flash crowds)
      • Enabling parallel computation
      • Opportunistically utilizing unused cycles with short tasks
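      • A minimal, purely illustrative C sketch of the first pattern above (sandboxing untrusted code). The sf_* names come from the API slide later in this deck, but the signatures, the ticket type, and every helper below are assumptions made for readability, not SnowFlock's actual headers:

        #include <stdio.h>

        /* --- Stub layer: NOT the real SnowFlock API, just enough to compile --- */
        typedef struct { int allotted; } sf_ticket;
        static sf_ticket sf_request_ticket(int n) { sf_ticket t = { n }; return t; }
        static int  sf_clone(sf_ticket t)  { (void)t; return 1; /* pretend we are the child */ }
        static void sf_exit(void)          { /* real call discards the clone's state */ }
        static void sf_join(sf_ticket t)   { (void)t; }
        static void run_untrusted(void)    { printf("running untrusted code in a clone\n"); }
        /* ----------------------------------------------------------------------- */

        /* Sandboxing pattern: fork one transient clone, run the untrusted code
         * inside it, and let the clone's memory and disk be thrown away. */
        int main(void) {
            sf_ticket t = sf_request_ticket(1);   /* reserve one clone */
            if (sf_clone(t) > 0) {                /* child VM path */
                run_untrusted();                  /* any damage stays in the clone */
                sf_exit();                        /* memory and virtual disk discarded */
            }
            sf_join(t);                           /* parent resumes, its state untouched */
            return 0;
        }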
    • 7. Introduction
      • SnowFlock
        • Provides swift parallel stateful VM cloning with little runtime overhead and frugal consumption of cloud I/O resources
      • Takes advantage of several key techniques .
        • First
        • SnowFlock utilizes lazy state replication to minimize the amount of state propagated to the child VMs.
        • Extremely fast instantiation of clones by initially copying the minimal necessary VM data , and transmitting only the fraction of the parent’s state that clones actually need .
    • 8. Introduction
      • Takes advantage of several key techniques .
        • Second
        • A set of avoidance heuristics eliminates a substantial amount of superfluous memory transfer for the common case of clones allocating new private state
        • Finally
        • Child VMs execute
          • Very similar code paths
          • Access common data structures
        • → Use a multicast distribution technique for VM state that provides scalability and prefetching
    • 9. Introduction
      • Evaluated SnowFlock by focusing on a demanding instance of Figure 1 (b)
      • (Interactive parallel computation)
      • Conducted experiments with applications from
        • Bioinformatics
        • Quantitative finance
        • Rendering
        • Parallel compilation
      • Applications that can be deployed as Internet services
      • Running on 128 processors
      • SnowFlock achieves speedups coming within 7% or better of optimal execution
    • 10. VM Fork
      • Advantages
        • Execute independently on different physical hosts
        • Isolation
        • Ease of software development associated with VMs
        • Greatly reduces the performance overhead of creating a collection of identical VMs on a number of physical machines
      • Each forked VM proceeds with an identical view of the system
        • Save for a unique identifier (vmid)
        • (used to distinguish the parent from each child)
      • Each forked VM has its own independent copy (of the OS and virtual disk)
        • State updates are not propagated between VMs
    • 11. VM Fork
      • Forked VMs are transient entities
        • Memory image and virtual disk are discarded once they exit
      • Any application-specific state or values they generate must be explicitly communicated to the parent VM
        • (by message passing or via a distributed file system)
      • Conflicts may arise
        • (if multiple processes within the same VM simultaneously invoke VM forking)
      • VM fork will typically be used in VMs that have been carefully customized to run a single application (like serving a web page).
    • 12. VM Fork
      • The semantics of VM fork
        • Integration with a dedicated, isolated virtual network connecting child VMs with their parent.
        • Each child is configured with a new IP address based on its vmid , and it is placed on the same virtual subnet .
        • Child VMs cannot communicate with hosts outside this virtual network
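      • The address assignment above is described only as being "based on its vmid"; the sketch below shows one plausible, purely hypothetical mapping onto a private subnet so the semantics are concrete. It is not SnowFlock's actual scheme:

        #include <stdio.h>

        /* Hypothetical vmid-to-address mapping: children get consecutive host
         * numbers on a private 10.0.0.0/16 subnet, the parent (vmid 0) keeps 10.0.0.1. */
        static void vmid_to_ip(int vmid, char *buf, size_t len) {
            snprintf(buf, len, "10.0.%d.%d", (vmid + 1) / 256, (vmid + 1) % 256);
        }

        int main(void) {
            char ip[32];
            for (int vmid = 0; vmid <= 3; vmid++) {   /* 0 = parent, 1..n = children */
                vmid_to_ip(vmid, ip, sizeof ip);
                printf("vmid %d -> %s\n", vmid, ip);
            }
            return 0;
        }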
    • 13. VM Fork
      • Users must be conscious of the IP reconfiguration semantics:
      • 1. Network shares must be (re)mounted after cloning
      • 2. Provide a NAT layer to allow the clones to connect to certain external IP addresses
        • The NAT performs firewalling and throttling
        • Only allows external inbound connections to the parent VM
        • Useful to implement a web-based frontend
        • Or to allow access to a dataset provided by another party
    • 14. Design Rationale
      • Plotting the cost of suspending and resuming a 1GB VM to an increasing number of hosts over NFS (see Section 5 for details on the testbed)
      • Direct relationship between I/O involved and fork latency, with latency growing to the order of hundreds of seconds
    • 15. Design Rationale
      • Method 1 :
      • Implement VM fork using existing VM suspend/resume
        • The wholesale copying of a VM to multiple hosts is far too taxing
        • Decreases overall system scalability
          • (by clogging the network with gigabytes of data)
      • Contention caused by the simultaneous requests of all children turns the source host into a hot spot
        • Live migration [Clark 2005, VMotion]
        • A popular mechanism for consolidating VMs in clouds [Steinder 2007, Wood 2007]; it uses the same algorithm plus extra rounds of copying, so it takes even longer to replicate VMs
    • 16. Design Rationale
      • Method 2
      • A second approach to the VM fork latency problem uses our multicast library
        • Multicast delivers state simultaneously to all hosts
        • (and substantially reduces the total amount of VM state pushed over the network)
        • But the overhead is still in the range of minutes
      • Fast VM fork implementation is based on
        • Start executing child VM on a remote site by initially replicating only minimal state
        • Children will typically access only a fraction of the original memory image (parent)
        • It’s common for children to allocate memory after forking
        • Children often execute similar code and access common data structures.
    • 17. Design Rationale
      • VM Descriptors
        • Lightweight mechanism that instantiates a new forked VM with only the critical metadata needed to start execution on a remote site.
      • Memory-On-Demand
        • Mechanism whereby clones lazily fetch portions of VM state over the network as it is accessed
      • Experience
        • Possible to start a child VM by shipping only 0.1% of the parent's state
        • Children require only a fraction of the parent's original memory image
        • They mostly read portions of a remote dataset or allocate local storage
        • Optimization can reduce communication: for application footprints of 1 GB, roughly 40 MB (about 4%) is transferred
    • 18. Design Rationale
      • Memory on-demand : non-intrusive approach (Reduces state transfer without altering the behavior of the guest OS).
      • Another non-intrusive approach :
      • Copy-on-write, by Potemkin [Vrable 2005] (same host)
      • Potemkin does NOT provide runtime stateful cloning, since all new VMs are copies of a frozen template
      • Multicast replies to memory page requests
        • High correlation across memory accesses of the children (insight iv)
        • Prevent the parent from becoming a hot-spot
      • Multicast provides Scalability and Prefetching.
      • Children operate independently and individually
      • A Child waiting for a page does not prevent others from making progress.
    • 19. SnowFlock Implementation
      • SnowFlock is an open-source project (on the Xen 3.0.3 VMM)
      • Xen
        • Hypervisor running at the highest processor privilege level, controlling the execution of domains (VMs); the domain kernels are paravirtualized
      • SnowFlock
        • Modifications to the Xen VMM and daemons (in domain0)
        • Daemons form a distributed system that controls the life-cycle of VMs (cloning and deallocation)
        • Policy decisions:
          • Resource accounting
          • Allocation of VMs to physical hosts
          • (deferred to suitable cluster management software via a plug-in architecture)
        • Lazy state replication (avoidance heuristics to minimize state transfer)
    • 20. SnowFlock Implementation
      • Four mechanisms to fork a VM .
        • The parent VM is temporarily suspended to produce a VM descriptor
          • A small file (VM metadata and guest kernel memory management data)
          • Distributed to other physical hosts to spawn new VMs
          • In subsecond time
        • Memory-on-demand mechanism
          • Lazily fetches additional VM memory state
        • The avoidance heuristics
          • Reduce the amount of memory that needs to be fetched on demand
        • Multicast distribution system mcdist
          • Delivers VM state simultaneously and efficiently
          • Providing implicit prefetching
    • 21. Implementation- 1.API
      • VM fork in SnowFlock consists of two stages
      • 1. sf_request_ticket (reservation for the desired number of clones)
        • To optimize for the common case of SMP hardware: cloned VMs span multiple hosts, and the processes within each VM span the underlying physical cores
        • Due to user quotas, current load, and other policies, the cluster management system may allocate fewer VMs than requested
      • 2. Fork the VM across the hosts with the sf_clone call
        • A child VM finishes its part of the computation, then calls sf_exit
        • A parent VM can wait for its children to terminate with sf_join
        • Or force their termination with sf_kill
    • 22. Implementation- 1.API
      • The API is simple and flexible, requiring only small modifications of existing code bases (a hedged usage sketch follows this slide)
      • Hooked into the widely used Message Passing Interface (MPI) library
        • Allows unmodified parallel applications to use SnowFlock’s capabilities
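      • A hedged usage sketch of the two-stage API above, applied to the Figure 1(b) parallel-computation pattern. The stubs exist only so the fragment compiles; the real sf_* calls return tickets and clone IDs as the deck describes, and the work-partitioning arithmetic is an assumption of this sketch:

        #include <stdio.h>

        /* --- Stubs so the sketch compiles; the real sf_* calls live in SnowFlock --- */
        static int  sf_request_ticket(int n) { return n; }       /* allotted clones */
        static int  sf_clone(int ticket)     { (void)ticket; return 1; /* pretend: clone #1 */ }
        static void sf_exit(void)            { }
        static void sf_join(int ticket)      { (void)ticket; }
        /* --------------------------------------------------------------------------- */

        #define WORK_ITEMS 1024

        int main(void) {
            int allotted = sf_request_ticket(32);   /* may be fewer than requested */
            int vmid = sf_clone(allotted);          /* 0 = parent, 1..allotted = children */
            if (vmid > 0) {
                int chunk = WORK_ITEMS / allotted;  /* each clone picks its slice by vmid */
                int start = (vmid - 1) * chunk;
                printf("clone %d processes items [%d, %d)\n", vmid, start, start + chunk);
                /* results would go back over the virtual network (e.g. MPI or NFS) */
                sf_exit();                          /* the real call would not return */
            }
            sf_join(allotted);                      /* parent blocks until children finish */
            return 0;
        }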
    • 23. Implementation-2.VM Descriptors
      • Condensed VM image
        • Enables swift VM replication to a separate physical host
      • Starts by spawning a thread in the VM kernel that quiesces its I/O devices
        • Deactivates all but one of the virtual processors (VCPUs)
        • Issues a hypercall suspending the VM’s execution
      • When the hypercall succeeds
        • The suspended VM's memory is mapped to populate the descriptor
      • Descriptor contains:
        • Metadata describing the VM and its virtual devices
        • Few memory pages shared between the VM and the Xen hypervisor.
        • Registers of the main VCPU,
        • Global Descriptor Tables (GDT) used by the x86 segmentation hardware for memory protection
        • Page tables of the VM.
    • 24. Implementation-2.VM Descriptors
      • The page tables make up the bulk of a VM descriptor.
        • Each process in the VM needs a small number of additional page tables .
        • The cumulative size of a VM descriptor is thus loosely dependent on the number of processes the VM is executing.
      • Entries in a page table are “canonicalized” before saving .
      • Translated from references to host-specific pages to frame numbers within the VM’s private contiguous physical space
        • (“machine” and “physical” addresses in Xen parlance ,respectively).
      • A few other values are included in the descriptor, e.g. the cr3 register of the saved VCPU (also canonicalized); a hedged sketch of this translation follows this slide
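      • A minimal sketch of the canonicalization step described above. The tiny m2p/p2m tables and the PTE layout are simplified stand-ins for Xen's machine/physical mappings, not the actual Xen 3.0.3 structures:

        #include <stdint.h>
        #include <stdio.h>

        #define PTE_FLAGS_MASK   0xfffULL   /* low bits: present, rw, accessed, ... */
        #define PTE_FRAME_SHIFT  12

        static uint64_t m2p[16];            /* machine frame -> guest physical frame */
        static uint64_t p2m[16];            /* guest physical frame -> machine frame */

        /* Save path: rewrite a PTE to reference a frame in the VM's private,
         * contiguous physical space instead of a host-specific machine page. */
        static uint64_t canonicalize_pte(uint64_t pte) {
            uint64_t mfn = pte >> PTE_FRAME_SHIFT;
            return (m2p[mfn] << PTE_FRAME_SHIFT) | (pte & PTE_FLAGS_MASK);
        }

        /* Restore path: translate back using the *new* host's physical-to-machine map. */
        static uint64_t uncanonicalize_pte(uint64_t pte) {
            uint64_t pfn = pte >> PTE_FRAME_SHIFT;
            return (p2m[pfn] << PTE_FRAME_SHIFT) | (pte & PTE_FLAGS_MASK);
        }

        int main(void) {
            m2p[7] = 3;                     /* machine frame 7 holds the VM's physical frame 3 */
            p2m[3] = 9;                     /* on the new host, physical frame 3 lives at machine frame 9 */
            uint64_t pte = (7ULL << PTE_FRAME_SHIFT) | 0x3;   /* present + writable */
            uint64_t canon = canonicalize_pte(pte);
            printf("saved PTE frame: %llu, restored PTE frame: %llu\n",
                   (unsigned long long)(canon >> PTE_FRAME_SHIFT),
                   (unsigned long long)(uncanonicalize_pte(canon) >> PTE_FRAME_SHIFT));
            return 0;
        }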
    • 25. Implementation-2.VM Descriptors
      • Descriptor is multicast to multiple physical hosts (mcdist) Section 4.5
      • Metadata is used to allocate a VM with the appropriate virtual devices and memory footprint .
      • All state saved in the descriptor is loaded:
        • Pages shared with Xen
        • Segment descriptors,
        • Page tables
        • VCPU registers.
      • Physical addresses in page table entries are translated to use the new mapping between VM specific physical addresses and host machine addresses .
      • The VM replica resumes execution , enables the extra VCPUs , and reconnects its virtual I/O devices to the new frontends.
    • 26. Implementation-2.VM Descriptors
      • Evaluation
      • Time spent replicating a single-processor VM with 1 GB of RAM to n clones in n physical hosts
    • 27. Implementation-2.VM Descriptors
      • VM descriptor for experiments was 1051 ± 7 KB.
      • The time to create a descriptor =
        • “Save Time” (our code) + “Xend Save” (recycled, unmodified Xen code)
      • “Starting Clones”: time spent distributing the order to spawn a clone to each host
      • Clone creation in each host is composed of “Fetch Descriptor” (waiting for the descriptor to arrive), ...
      • “Restore Time” (our code)
      • “Xend Restore” (recycled Xen code)
      • Overall, VM replication is a fast operation (600 to 800 milliseconds)
      • Replication time is largely independent of the number of clones created
    • 28. Implementation- 3.Memory-On-Demand
      • SnowFlock’s memory-on-demand subsystem - memtap
          • After being instantiated from a descriptor
          • A clone finds that it is missing state needed to proceed
          • memtap handles this by lazily populating the clone VM’s memory with state fetched from the parent (an immutable copy of the parent VM’s memory)
      • memtap = hypervisor logic + a userspace domain0 process (associated with the clone VM)
        • 1. A VCPU touches a missing page
        • 2. The hypervisor pauses that VCPU
        • 3. And notifies the memtap process
        • 4. Memtap fetches the page’s contents from the parent
        • 5. And notifies the hypervisor, so the VCPU may be unpaused (a hedged sketch of this sequence follows this slide)
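      • A minimal sketch of the five-step memory-on-demand path above. Everything here (the bitmap, fetch_from_parent, the pause/unpause stubs) is an assumption made for the sketch; the real memtap splits this work between Xen hypervisor logic and a domain0 userspace process:

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE  4096
        #define NUM_PAGES  16

        static uint8_t parent_memory[NUM_PAGES][PAGE_SIZE];  /* immutable parent copy */
        static uint8_t clone_memory[NUM_PAGES][PAGE_SIZE];   /* lazily populated */
        static uint8_t present[NUM_PAGES];                   /* memtap presence bitmap */

        static void pause_vcpu(void)   { /* real code: hypervisor pauses the VCPU  */ }
        static void unpause_vcpu(void) { /* real code: hypervisor resumes the VCPU */ }

        /* Stand-in for the network fetch that memtap performs (unicast or mcdist). */
        static void fetch_from_parent(int pfn) {
            memcpy(clone_memory[pfn], parent_memory[pfn], PAGE_SIZE);
        }

        /* Called on the first access to a page that has not been fetched yet. */
        static uint8_t *access_page(int pfn) {
            if (!present[pfn]) {          /* 1. missing page detected            */
                pause_vcpu();             /* 2. pause the faulting VCPU          */
                fetch_from_parent(pfn);   /* 3+4. notify memtap, fetch contents  */
                present[pfn] = 1;
                unpause_vcpu();           /* 5. notify hypervisor, unpause VCPU  */
            }
            return clone_memory[pfn];
        }

        int main(void) {
            parent_memory[5][0] = 42;
            printf("first byte of page 5 in the clone: %d\n", access_page(5)[0]);
            return 0;
        }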
    • 29. Implementation- 3.Memory-On-Demand
      • To allow the hypervisor to trap accesses to pages that have not yet been fetched
      • Use Xen shadow page tables
        • The x86 cr3 register is replaced with a pointer to an initially empty shadow page table
        • The shadow page table is filled on demand from the real page table as faults on empty entries occur
      • On the first access to a page that has not yet been fetched
        • The hypervisor notifies memtap
        • Fetches are also triggered by
        • Accesses by domain0 to the VM’s memory for the purpose of virtual device DMA
    • 30. Implementation- 3.Memory-On-Demand
      • On parent VM
        • memtap implements copy-on-write
      • Uses shadow page tables in “log-dirty” mode
        • All parent VM memory write attempts are trapped by disabling the writable bit in the shadow page table
        • The hypervisor duplicates the page and patches the mapping of the memtap server process to point to the duplicate
        • The parent VM is then allowed to continue execution
    • 31. Implementation- 3.Memory-On-Demand
      • Evaluation
      • To understand the overhead involved, we use a microbenchmark
        • Multiple microbenchmark runs, each performing ten thousand page fetches (Figure 4(a))
        • We split a page fetch operation into six components
      • Six components:
      • “Page Fault”: hardware page fault overheads
          • (caused by using shadow page tables)
      • “Xen”: Xen hypervisor shadow page table logic
      • “HV Logic”: hypervisor logic
      • “Dom0 Switch”: context switch to domain0 (memtap)
      • “Memtap Logic”: memtap internals, mapping the faulting VM page
      • “Network”: software (libc and Linux kernel TCP stack) and hardware overheads
    • 32. Implementation- 4.Avoidance Heuristics
      • Fetching pages from the parent still incurs an overhead (May prove excessive for many workloads)
      • Augmented the VM kernel with two fetch-avoidance heuristics .
        • Bypass many unnecessary memory fetches while retaining correctness
      • First heuristic
        • Optimizes the general case in which a clone VM allocates new state
        • Intercepts pages selected by the kernel’s page allocator
        • The kernel page allocator is invoked when more memory is needed
        • The recipient of the selected pages does not care about the pages’ previous contents
        • (… page 6, right)
    • 33. Implementation- 4.Avoidance Heuristics
      • The second heuristic.
        • Addresses the case where a virtual I/O device writes to the guest memory .
        • Consider Block I/O:
        • Target page is typically a kernel buffer that is being recycled and whose previous contents do not need to be preserved.
        • Again, there is no need to fetch this page
      • Fetch-avoidance heuristics
        • Implemented by mapping the memtap bitmap into the guest kernel’s address space (a hedged sketch follows this slide)
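      • A minimal sketch of the two fetch-avoidance heuristics above. The shared bitmap really is mapped into the guest kernel's address space, but the helper names and the flat array below are assumptions made for readability, not SnowFlock's actual guest-kernel patch:

        #include <stdint.h>
        #include <stdio.h>

        #define NUM_PAGES 16
        static uint8_t memtap_present[NUM_PAGES];   /* stand-in for the shared bitmap */

        /* Heuristic 1: a page handed out by the kernel page allocator will be
         * overwritten by its new owner, so its old contents never need fetching. */
        static void on_page_allocated(int pfn) {
            memtap_present[pfn] = 1;                /* pretend it is already fetched */
        }

        /* Heuristic 2: a virtual I/O device is about to write a full page of data
         * (e.g. a recycled block I/O buffer), so the previous contents are dead. */
        static void on_full_page_io_write(int pfn) {
            memtap_present[pfn] = 1;
        }

        /* memtap only fetches pages that are still marked missing. */
        static int needs_fetch(int pfn) { return !memtap_present[pfn]; }

        int main(void) {
            on_page_allocated(3);
            on_full_page_io_write(7);
            printf("page 3 fetch? %d  page 7 fetch? %d  page 9 fetch? %d\n",
                   needs_fetch(3), needs_fetch(7), needs_fetch(9));
            return 0;
        }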
    • 34. Implementation- 4.Avoidance Heuristics
      • Evaluation
      • Result in substantial benefits.
        • Runtime and data transfer
      • With the heuristics
        • State transmissions to clones are reduced to 40 MB
        • A tiny fraction (3.5%) of the VM’s footprint
    • 35. Implementation- 5.Multicast Distribution
      • Mcdist
        • Multicast distribution system efficiently provides data to all cloned VMs simultaneously.
      • Two goals (Not served by point-to-point ).
      • First: Data needed by clones is often prefetched.
        • Single clone requests a page
        • Response also reaches all other clones.
      • Second: Load (network) is greatly reduced
        • Sending a piece of data to all VM clones (1 operation).
    • 36. Implementation- 5.Multicast Distribution
      • Mcdist server design is minimalistic
        • Only switch programming and flow control logic.
      • Ensuring Reliability - Timeout mechanism .
      • IP-multicast : Send data to multiple hosts simultaneously.
        • Supported by most off-the-shelf commercial Ethernet hardware .
      • IP-multicast hardware
        • Capable of scaling to thousands of hosts and multicast groups
        • Automatically relaying multicast frames across multiple hops.
    • 37. Implementation- 5.Multicast Distribution
      • Mcdist clients are memtap processes
        • Receive pages asynchronously and unpredictably in response to requests by fellow VM clones.
        • Memtap clients batch received pages until
          • A threshold is hit, or
          • A page that has been explicitly requested arrives
      • A single hypercall is invoked to map the pages in a batch (a hedged sketch follows this slide)
      • A threshold of 1024 pages has proven to work well in practice
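      • A minimal sketch of the batching rule above. The batch array, flush_batch() and the page-id plumbing are assumptions made for the sketch; the real client maps each batch with a single Xen hypercall:

        #include <stdio.h>

        #define BATCH_THRESHOLD 1024                 /* value quoted on the slide */

        static int batch[BATCH_THRESHOLD];
        static int batch_len = 0;

        static void flush_batch(void) {
            if (batch_len == 0) return;
            /* real code: one hypercall maps every page in the batch at once */
            printf("mapping %d pages with a single hypercall\n", batch_len);
            batch_len = 0;
        }

        /* Called for every page arriving over mcdist, whether this clone requested
         * it or merely overheard the response to a sibling's request. */
        static void on_page_received(int pfn, int explicitly_requested) {
            batch[batch_len++] = pfn;
            if (batch_len == BATCH_THRESHOLD || explicitly_requested)
                flush_batch();
        }

        int main(void) {
            for (int pfn = 0; pfn < 2000; pfn++)
                on_page_received(pfn, /*explicitly_requested=*/pfn == 1500);
            flush_batch();                            /* drain whatever is left */
            return 0;
        }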
    • 38. Implementation- 5.Multicast Distribution
      • To maximize total goodput
        • The server uses flow control logic to limit its sending rate
        • Server and clients estimate their send and receive rates
      • Clients
        • Provide explicit feedback
      • The server increases its rate limit linearly
        • When loss is detected, the server scales its rate limit back
      • Another server flow control mechanism: lockstep detection
        • When multiple requests arrive for the same page, the server ignores the duplicates (a hedged sketch of the rate control and lockstep logic follows this slide)
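      • A minimal sketch of the server-side logic above: linear rate increase, back-off on loss, and lockstep detection that drops duplicate requests. The constants and the last_served bookkeeping are assumptions for the sketch, not values from the mcdist implementation:

        #include <stdio.h>

        static double rate_limit    = 10.0;  /* pages per millisecond (assumed unit) */
        static const double STEP    = 1.0;   /* linear additive increase             */
        static const double BACKOFF = 0.5;   /* scale factor applied on client loss  */

        static int last_served = -1;         /* most recently multicast page         */

        /* Periodic adjustment driven by explicit client feedback. */
        static void adjust_rate(int clients_reported_loss) {
            if (clients_reported_loss)
                rate_limit *= BACKOFF;       /* scale the rate limit back            */
            else
                rate_limit += STEP;          /* otherwise increase it linearly       */
        }

        /* Lockstep detection: clones in lockstep request the same page; one
         * multicast reply already satisfied them, so duplicates are ignored. */
        static int should_serve(int requested_pfn) {
            if (requested_pfn == last_served)
                return 0;                    /* duplicate request, skip it           */
            last_served = requested_pfn;
            return 1;                        /* serve and multicast this page        */
        }

        int main(void) {
            adjust_rate(0); adjust_rate(0); adjust_rate(1);
            printf("rate limit after +,+,loss: %.1f\n", rate_limit);
            int first = should_serve(42);
            int again = should_serve(42);
            printf("serve 42? %d  serve 42 again? %d\n", first, again);
            return 0;
        }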
    • 39. Implementation- 5.Multicast Distribution
      • Evaluation
      • Results obtained with SHRiMP.
      • Shows that multicast distribution’s lockstep avoidance works effectively:
        • Lockstep-executing VMs issue simultaneous requests that are satisfied by a single response from the server.
        • Hence the difference between the “Requests” and “Served” bars in the multicast experiments.
    • 40. Implementation- 5.Multicast Distribution
    • 41. Implementation- 5.Multicast Distribution
      • Figure 4(c) shows the benefit of mcdist for a case where an important portion of memory state is needed after cloning
        • (The avoidance heuristics cannot help. )
      • Experiment (NCBI BLAST)
        • Executes queries against a 256 MB portion of the NCBI genome database that the parent caches into memory before cloning.
      • Speedup results for SnowFlock: unicast vs. multicast
      • Compared against an idealized zero-cost fork configuration
        • VMs have been previously allocated
        • with no cloning or state-fetching overhead
    • 42. 6.Virtual I/O Devices -- Virtual Disk
      • Implemented with a blocktap [Warfield 2005] driver .
        • Multiple views of the virtual disk are supported by a hierarchy of copy-on-write(COW) slices located at the site where the parent VM runs.
      • Each fork operation adds a new COW slice
        • Rendering the previous state of the disk immutable.
      • Children access a sparse local version of the disk,
        • Fetched on demand from the disk server.
      • Virtual disk exploits same optimizations (memory subsystem)
        • Unnecessary fetches during writes are avoided using heuristics ,
        • Original disk state is provided to all clients simultaneously via multicast .
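      • A minimal sketch of reading and writing a block through the copy-on-write slice hierarchy described above. The fixed-size arrays, the "present" maps and the disk-server fallback comment are assumptions for the sketch, not the blocktap driver's actual code:

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        #define BLOCK_SIZE 4096
        #define NUM_BLOCKS 8
        #define MAX_SLICES 4

        typedef struct {
            uint8_t data[NUM_BLOCKS][BLOCK_SIZE];
            uint8_t present[NUM_BLOCKS];        /* which blocks this slice holds */
        } cow_slice;

        static cow_slice slices[MAX_SLICES];    /* slices[0] = oldest state */
        static int num_slices = 1;

        /* Each fork freezes the current state by pushing a new, empty slice. */
        static void fork_disk(void) { num_slices++; }

        /* Writes always land in the newest slice, leaving older slices immutable. */
        static void write_block(int blk, const uint8_t *buf) {
            memcpy(slices[num_slices - 1].data[blk], buf, BLOCK_SIZE);
            slices[num_slices - 1].present[blk] = 1;
        }

        /* Reads walk the hierarchy from newest to oldest; a clone that finds the
         * block nowhere locally would fetch it on demand from the disk server. */
        static const uint8_t *read_block(int blk) {
            for (int s = num_slices - 1; s >= 0; s--)
                if (slices[s].present[blk])
                    return slices[s].data[blk];
            return NULL;                        /* real code: fetch from disk server */
        }

        int main(void) {
            uint8_t buf[BLOCK_SIZE] = { 1 };
            write_block(2, buf);                /* parent writes before the fork */
            fork_disk();                        /* fork freezes the previous state */
            buf[0] = 9;
            write_block(2, buf);                /* post-fork write goes to the new slice */
            printf("block 2 now reads %d\n", read_block(2)[0]);
            return 0;
        }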
    • 43. 6.Virtual I/O Devices -- Virtual Disk
      • Virtual disk is used as the base root partition for the VMs.
      • For data-intensive tasks
        • Envision serving data volumes to the clones through network file systems such as NFS
        • Suitable big-data filesystems such as Hadoop or Lustre [Braam 2002].
      • Most work done by clones is processor intensive
        • Writes do not result in fetches
        • The little remaining disk activity mostly hits kernel caches .
      • Largely exceeds the demands of many realistic tasks
        • Not cause any noticeable overhead for the experiments (Section 5).
    • 44. 6.Virtual I/O Devices -- Network Isolation
      • Employ a mechanism to isolate (prevent interference, eavesdropping).
      • Performed at the level of Ethernet packets, the primitive exposed by Xen virtual network devices.
      • Before being sent
        • Source MAC addresses of packets sent by a SnowFlock VM are rewritten as a special address which is a function of both the parent and child identifiers .
        • Simple filtering rules are used by all hosts to ensure that no packets delivered to a VM come from VMs that are not its parent or a sibling.
      • When a packet is delivered
        • The destination MAC address is rewritten back to the expected value, rendering the entire process transparent
      • A small number of special rewriting rules are required for protocols whose payloads contain MAC addresses, such as ARP
      • Filtering and rewriting impose an imperceptible overhead while maintaining full IP compatibility (a hedged sketch of the rewrite follows this slide)
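      • A minimal sketch of the Ethernet-level isolation above. The deck only says the source MAC becomes "a function of both the parent and child identifiers"; the concrete encoding below (a locally administered prefix plus parent/child IDs) is an assumption of this sketch:

        #include <stdint.h>
        #include <stdio.h>

        typedef struct { uint8_t b[6]; } mac_addr;

        /* Encode parent and child identifiers into a locally administered MAC. */
        static mac_addr snowflock_mac(uint16_t parent_id, uint16_t child_id) {
            mac_addr m = { { 0x02, 0x00,                 /* locally administered bit */
                             (uint8_t)(parent_id >> 8), (uint8_t)parent_id,
                             (uint8_t)(child_id  >> 8), (uint8_t)child_id } };
            return m;
        }

        /* Host-side filter: only deliver frames coming from the same fork group. */
        static int allow_frame(mac_addr src, uint16_t my_parent_id) {
            uint16_t frame_parent = (uint16_t)((src.b[2] << 8) | src.b[3]);
            return src.b[0] == 0x02 && frame_parent == my_parent_id;
        }

        int main(void) {
            mac_addr sibling  = snowflock_mac(7, 3);    /* child 3 of parent 7 */
            mac_addr stranger = snowflock_mac(9, 1);    /* member of another group */
            printf("sibling allowed? %d  stranger allowed? %d\n",
                   allow_frame(sibling, 7), allow_frame(stranger, 7));
            return 0;
        }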
    • 45. Application Evaluation
      • Focuses on a particularly demanding scenario
        • The ability to deliver interactive parallel computation ,
        • VM forks multiple workers to participate in a short-lived computationally-intensive parallel job.
      • Scenario
        • Users interact with a web frontend and submit queries
        • Parallel algorithm run on a compute cluster
      • Cluster of 32 Dell PowerEdge 1950 blade servers .
    • 46. Application Evaluation
      • Each host
        • 4 GB of RAM
        • 4 Intel Xeon 3.2 GHz cores
        • Broadcom NetXtreme II BCM5708 gigabit NIC
      • All machines running SnowFlock prototype (Xen 3.0.3) .
      • Para-virtualized Linux 2.6.16.29 (Guest, Host)
      • All machines were connected to two daisy-chained Dell PowerConnect 5324 gigabit switches
    • 47. Applications
      • Three typical applications from bioinformatics
      • Three applications from
        • Graphics rendering
        • Parallel compilation
        • Financial services
      • Driven by a workflow shell script (clones the VM and launches the application)
      • NCBI BLAST – computational tool used by biologists
      • SHRiMP – tool for aligning large collections of very short DNA sequences
      • ClustalW – multiple alignment of a collection of protein or DNA sequences
      • QuantLib – toolkit widely used in quantitative finance
      • Aqsis (RenderMan) – used in film and television visual effects [Pixar]
      • distcc – parallel compilation
    • 48. Results
      • 32 4-core SMP VMs on 32 physical hosts
      • Aim to answer the following questions
        • How does SnowFlock compare to other methods for instantiating VMs?
        • How close does SnowFlock come to achieving optimal application speedup?
        • How scalable is SnowFlock?
    • 49. Results - Comparison
      • SHRiMP, 128 processors, under three configurations
        • SnowFlock with all the mechanisms enabled
        • Xen’s standard Suspend/Resume using NFS
        • Suspend/Resume using multicast to distribute the suspended VM image
    • 50. Results - Application Performance
    • 51. Results - Application Performance
      • Compares SnowFlock to an optimal “zero-cost fork” baseline
      • Baseline
        • 128 threads to measure overhead
        • one thread to measure speedup
      • Zero-cost
        • VMs previously allocated,
        • No cloning or state-fetching overhead
        • In an idle state .
        • Overly optimistic
        • Not representative of cloud computing environments
      • zero-cost VMs
        • Vanilla Xen 3.0.3 domains ,configured identically to SnowFlock VMs
    • 52. Results - Application Performance
      • SnowFlock performs extremely well
      • Reducing execution time
        • From hours to tens of seconds (for all the benchmarks)
      • Speedups
        • Very close to the zero-cost optimal
        • Come within 7% of the optimal runtime
      • Overhead (VM replication, on-demand state fetching) is small
      • ClustalW shows the best results
        • Less than 2 seconds of overhead for a 25-second task
    • 53. Scale and Agility
      • Assess SnowFlock’s capability to support multiple concurrent forking VMs
        • Launch four VMs that each fork 32 uniprocessor VMs
      • After completing a parallel task, each parent VM joins and terminates its children (then launches another parallel task, repeating five times)
      • Each parent VM runs a different application
        • Employed an “adversarial allocation” in which each task uses 32 processors, one per physical host
        • 128 SnowFlock VMs are active at most times
        • Each physical host needs to fetch state from four parent VMs
    • 54. Scale and Agility
      • SnowFlock is capable of withstanding the increased demands of multiple concurrent forking VMs
      • We believe that optimizing mcdist would further reduce the variability in running times
      • SnowFlock can perform a 32-host parallel computation of 40 seconds or less with five seconds or less of overhead
    • 55. Conclusion and Future Directions
      • Introduced VM fork and SnowFlock, its Xen-based implementation
      • VM fork:
        • Instantiates dozens of VMs on different hosts in sub-second time, with low runtime overhead and frugal use of cloud I/O resources
      • SnowFlock
        • Drastically reduces cloning time by copying only the critical state
        • And fetching the VM’s memory image efficiently on demand
      • Simple modifications to the guest kernel reduce network traffic
        • By eliminating the transfer of pages that will be overwritten
      • Multicast exploits the locality of memory accesses across cloned VMs
        • At low cost
    • 56. Conclusion and Future Directions
      • SnowFlock is an active open-source project
        • Plans involve adapting SnowFlock to big-data applications
      • Fertile research ground in studying the interactions of VM fork with data-parallel APIs
      • SnowFlock’s objective: performance over reliability
        • Memory-on-demand provides performance
        • (but creates a dependency on a single source of VM state)
        • Open question: how to push VM state to clones in the background without sacrificing performance
      • Wish: wide-area VM migration
