Operating System Support for Planetary-Scale Network Services
  • User-Mode Linux is a safe, secure way of running Linux versions and Linux processes. Run buggy software, experiment with new Linux kernels or distributions, and poke around in the internals of Linux, all without risking your main Linux setup.
  • 1: Linux-VServer provides virtualization for GNU/Linux systems, accomplished by kernel-level isolation. It allows multiple virtual units to run at once; those units are sufficiently isolated to guarantee the required security, yet use available resources efficiently, as they run on the same kernel. 2: A chroot on Unix operating systems is an operation that changes the apparent disk root directory for the current running process and its children. A program that is re-rooted to another directory cannot access or name files outside that directory, called a chroot jail. The term "chroot" may refer to the chroot(2) system call or the chroot(8) wrapper program.
  • SILK stands for Scout In the Linux Kernel and is a port of the Scout operating system to run as a Linux kernel module. Scout is a modular, configurable, communication-oriented operating system developed for small network appliances. Scout was designed around the needs of data-centric applications, with particular attention given to networking.
  • COW: The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, you can give them pointers to the same resource. This fiction can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes from becoming visible to everyone else.
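The COW idea above can be sketched in a few lines of Python; `CowRef` is a made-up illustrative wrapper, not any real kernel interface:

```python
import copy

class CowRef:
    """Copy-on-write wrapper: holders share one backing object; the
    first write by a holder creates a true private copy (illustrative)."""
    def __init__(self, backing):
        self._backing = backing
        self._private = None

    def read(self):
        return self._private if self._private is not None else self._backing

    def write(self, key, value):
        if self._private is None:      # first write: materialize the copy
            self._private = copy.deepcopy(self._backing)
        self._private[key] = value

shared = {"a": 1}
r1, r2 = CowRef(shared), CowRef(shared)
assert r1.read() is r2.read()          # both point at the same resource
r1.write("a", 99)                      # r1 now gets a private copy
assert r1.read()["a"] == 99 and r2.read()["a"] == 1
```

The deep copy happens only on the first write, so read-only sharers never pay the duplication cost.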


  • 1. Operating System Support for Planetary-Scale Network Services Andy Bavier, Larry Peterson
  • 2. PlanetLab
    • PlanetLab is a geographically distributed overlay network designed to support the deployment and evaluation of planetary-scale network services.
    • 580 machines spanning 275 sites and 30 countries
    • Nodes within a LAN-hop of > 2M users
    • Supports distributed virtualization: each of 425 network services runs in its own slice
    • PlanetLab services and applications run in a slice of the platform: a set of nodes on which the service receives a fraction of each node’s resources, in the form of a virtual machine (VM)
  • 3. PlanetLab [map of node sites across North America: Seattle, Sunnyvale, LA, San Diego, Denver, Chicago, New York, Wash DC, Atlanta, Houston, and others]
  • 4. Slices
  • 5. Slices
  • 6. Slices
  • 7. Per-Node View [diagram: VMM (Linux++) hosting Node Mgr, Local Admin, and VM 1, VM 2, … VM n]
  • 8. Requirements of PlanetLab
    • Distributed Virtualization
      • Isolating Slices
      • Isolating PlanetLab
    • Unbundled Management
      • Minimize the functionality subsumed by the PlanetLab OS
      • Maximize the opportunity for services to compete on a level playing field: the interface between the OS and these infrastructure services must be sharable, and hence require no special privilege.
  • 9. Isolating Slices
    • OS requirements for slice isolation
      • It must allocate and schedule node resources (cycles, bandwidth, memory, and storage) so that the runtime behavior of one slice on a node does not adversely affect the performance of another on the same node.
      • It must either partition or contextualize the available name spaces (network addresses, file names, etc.) to prevent one slice from interfering with, or gaining access to information in, another slice.
      • It must provide a stable programming base that cannot be manipulated by code running in one slice in a way that negatively affects another slice.
  • 10. Isolating PlanetLab
    • It must thoroughly account for resource usage, and make it possible to place limits on resource consumption so as to mitigate the damage a service can inflict on the Internet.
    • It must make it easy to audit resource usage, so that actions (rather than just resources) can be accounted to slices after the fact.
  • 11. PlanetLab OS
    • Node Virtualization
    • Isolation and Resource Allocation
    • Network Virtualization
    • Monitoring
  • 12. Node Virtualization
    • Low Level Virtualization
      • Full Hypervisors like VMware
        • Cons: Performance and Scalability
      • Paravirtualization like Xen
        • Scalability and immaturity
    • System-call-level virtualization, such as UML (User-Mode Linux) and Linux Vservers
      • Good performance
      • Reasonable assurance of isolation
  • 13. Node Virtualization
    • PlanetLab’s implementation
      • Linux Vservers, a system-call-level virtualization. Vservers are the principal mechanism in PlanetLab for providing virtualization on a single node and contextualization of name spaces, e.g., user identifiers and files.
      • The chroot utility, which provides file system isolation. Vservers extend the non-reversible isolation that chroot provides for filesystems to other operating system resources, such as processes and SysV IPC.
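The confinement property chroot enforces, that a re-rooted process cannot name files outside its jail, can be sketched in user space; `resolves_inside` is a hypothetical helper, since the real check happens inside the kernel during path lookup:

```python
import os.path

def resolves_inside(jail_root, requested_path):
    """Check whether a path, resolved relative to the jail root, stays
    inside it -- a user-space sketch of the confinement chroot(2)
    enforces in the kernel."""
    jail = os.path.realpath(jail_root)
    # Treat the requested path as relative to the jail's root directory.
    target = os.path.realpath(os.path.join(jail, requested_path.lstrip("/")))
    return target == jail or target.startswith(jail + os.sep)

assert resolves_inside("/var/jail", "/etc/passwd")           # -> /var/jail/etc/passwd
assert not resolves_inside("/var/jail", "../../etc/passwd")  # escapes via ..
```

Note the `..` traversal case: `realpath` collapses the dot-dot components, which is exactly how an un-jailed process would escape a naive string-prefix check.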
  • 14. Node Virtualization
    • Drawbacks
      • Weaker Guarantees on Isolation
      • Challenges for eliminating QoS crosstalk
  • 15. Isolation and Resource Allocation
    • The node manager provides a low-level interface for obtaining resources on a node and binding them to a local VM that belongs to some slice.
    • A bootstrap brokerage service running in a privileged slice implements the resource allocation policy.
    • Non-renewable resources, such as memory pages, disk space, and file descriptors, are isolated using per-slice reservations and limits.
    • For renewable resources such as CPU cycles and link bandwidth, the OS supports two approaches to providing isolation: fairness and guarantees.
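The per-slice reservation idea for non-renewable resources can be sketched as a simple limit table; `SliceLimits` is a hypothetical illustration, not the node manager's actual interface:

```python
class SliceLimits:
    """Per-slice reservation for a non-renewable resource (e.g. disk
    blocks or file descriptors): allocations beyond the slice's limit
    are refused. Hypothetical sketch."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allocate(self, amount):
        if self.used + amount > self.limit:
            return False      # refuse rather than let one slice starve others
        self.used += amount
        return True

s = SliceLimits(limit=100)
assert s.allocate(60)
assert not s.allocate(50)     # would exceed the 100-unit limit
assert s.allocate(40)         # exactly fills the reservation
```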
  • 16. Isolation and Resource Allocation
    • Fairness and guarantees
      • The Hierarchical Token Bucket (htb) queuing discipline of the Linux Traffic Control facility (tc) is used to cap the total outgoing bandwidth of a node, cap per-vserver output, and to provide bandwidth guarantees and fair service among vservers.
      • CPU scheduling is implemented by the SILK kernel module, which leverages Scout [28] to provide vservers with CPU guarantees and fairness.
      • PlanetLab’s CPU scheduler uses a proportional sharing (PS) scheduling policy to fairly share the CPU. It incorporates the resource container abstraction and maps each vserver onto a resource container that possesses some number of shares.
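The bandwidth caps above rest on the token-bucket discipline; a minimal sketch of that mechanism (illustrative only, not htb's actual implementation):

```python
class TokenBucket:
    """Token-bucket rate limiter, the mechanism underlying tc's htb:
    tokens accrue at `rate` per second up to `burst`; a packet is sent
    only if enough tokens are available."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def allow(self, size, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

tb = TokenBucket(rate=1000, burst=1500)   # ~1 KB/s cap, 1500-byte burst
assert tb.allow(1500, now=0.0)            # burst allowance
assert not tb.allow(1500, now=0.5)        # only ~500 tokens have accrued
assert tb.allow(1500, now=2.0)            # refilled after 1.5 s more
```

htb arranges such buckets in a hierarchy, which is what lets it cap a node's total output while still giving per-vserver guarantees underneath.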
  • 17. Network Virtualization
    • The PlanetLab OS supports network virtualization by providing a “safe” version of Linux raw sockets that services can use to send and receive IP packets without root privileges.
    • It intercepts all incoming IP packets using Linux’s netfilter interface and demultiplexes each to a Linux socket or to a safe raw socket.
  • 18. Network Virtualization
    • Send
      • SILK intercepts the packet and performs the security checks
      • If the packet passes these checks, it is handed off to the Linux protocol stack via the standard raw socket sendmsg routine.
    • Receive
      • Those packets that demultiplex to a Linux socket are returned to Linux’s protocol stack for further processing;
      • those that demultiplex to a safe raw socket are placed directly in the per-socket queue maintained by SILK.
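The send/receive paths above can be sketched as a small demultiplexer; the packet representation and the `safe_raw_sockets` table are hypothetical stand-ins for SILK's internal structures:

```python
from collections import deque

# Ports bound to safe raw sockets by some slice (hypothetical table).
safe_raw_sockets = {("udp", 5000): deque()}

def demux(packet):
    """Route an incoming packet either to a per-slice safe-raw-socket
    queue or back to the regular Linux stack (sketch)."""
    key = (packet["proto"], packet["dport"])
    if key in safe_raw_sockets:
        safe_raw_sockets[key].append(packet)  # queued directly by SILK
        return "safe-raw"
    return "linux-stack"                      # returned to Linux for processing

assert demux({"proto": "udp", "dport": 5000}) == "safe-raw"
assert demux({"proto": "tcp", "dport": 80}) == "linux-stack"
assert len(safe_raw_sockets[("udp", 5000)]) == 1
```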
  • 19. Monitoring
    • Defined a low-level sensor interface for uniformly exporting data from the underlying OS and network, as well as from individual services.
    • Sensor semantics are divided into two types: snapshot and streaming.
      • Snapshot sensors maintain a finite size table of tuples, and immediately return the table (or some subset of it) when queried.
      • Streaming sensors follow an event model, and deliver their data asynchronously, a tuple at a time, as it becomes available .
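The two sensor styles can be sketched as follows; the class and method names are hypothetical, since the slide does not give the actual sensor interface:

```python
class SnapshotSensor:
    """Maintains a finite tuple table and returns it immediately when queried."""
    def __init__(self):
        self.table = []
    def record(self, tup):
        self.table.append(tup)
    def query(self):
        return list(self.table)        # immediate, finite snapshot

class StreamingSensor:
    """Event model: delivers tuples to subscribers as they become available."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def emit(self, tup):               # asynchronous, one tuple at a time
        for cb in self.subscribers:
            cb(tup)

snap = SnapshotSensor()
snap.record(("node1", "cpu", 0.42))
assert snap.query() == [("node1", "cpu", 0.42)]

seen = []
stream = StreamingSensor()
stream.subscribe(seen.append)
stream.emit(("node1", "rx_bytes", 1024))
assert seen == [("node1", "rx_bytes", 1024)]
```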
  • 20. Evaluation
    • Vserver Scalability
      • The scalability of vservers is primarily determined by disk space for vserver root filesystems and service-specific storage: 508MB of disk is required for each VM. Copy-on-write technology is adopted by the PlanetLab OS to reduce this disk footprint.
      • Kernel resource limits are a secondary factor in the scalability of vservers; reconfiguring the relevant kernel options and recompiling the kernel is one way to raise them.
  • 21. Evaluation
    • Slice Creation
      • Vserver creation and initialization takes an additional 9.66 seconds on average.
    • Slice Initialization
      • It takes 11.2 seconds on average to perform an empty update on a node.
      • When a new Sophia “core” package is found and needs to be upgraded, the time increases to 25.9 seconds per node.
      • When run on 180 nodes, the average update time for the whole system (corresponding to the slowest node) is 228.0 seconds.
  • 22. Discussion
    • Instead of high level virtualization, can some low level virtualization be deployed in PlanetLab OS?
      • Performance?
      • Scalability?
      • Isolation?
      • Security?
      • Maturity?
  • 23. Conclusion
    • PlanetLab OS supports distributed virtualization and unbundled management.
    • The PlanetLab OS provides only local (per-node) abstractions, with as much global (network-wide) functionality as possible pushed onto infrastructure services running in their own slices.
  • 24.
    • Q&A
  • 25. Scout & SILK
    • Scout is a modular, configurable, communication oriented operating system developed for small network appliances. Scout was designed around the needs of data-centric applications with particular attention given to networking.
    • SILK stands for Scout In the Linux kernel and is a port of the scout operating system to run as a Linux kernel module.
  • 26. Scout Design
    • Early Demultiplexing
      • This allows the system to isolate flows as early as possible, in order to prioritize packet processing and accurately account for resources.
    • Early Dropping
      • Packets are dropped early when flow queues are full; the server can avoid overload by dropping packets before investing many resources in them.
    • Accounting
      • Accounting of the resources used by each data flow, including CPU, memory, and bandwidth.
    • Explicit Scheduling
      • Scheduling and accounting are combined to provide resource guarantees to flows;
    • Extensibility
      • This makes it easy to add new protocols and construct new network services.
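Early demultiplexing plus early dropping can be sketched as a bounded per-flow queue; `FlowQueue` is an illustrative stand-in for Scout's actual path queues:

```python
from collections import deque

class FlowQueue:
    """Each flow gets its own bounded queue, so an overloaded flow
    drops its packets at arrival, before the system invests any further
    processing in them (early-demux/early-drop sketch)."""
    def __init__(self, capacity):
        self.q = deque()
        self.capacity = capacity
        self.dropped = 0

    def enqueue(self, pkt):
        if len(self.q) >= self.capacity:
            self.dropped += 1          # early drop: no resources invested
            return False
        self.q.append(pkt)
        return True

flow = FlowQueue(capacity=2)
assert flow.enqueue("p1") and flow.enqueue("p2")
assert not flow.enqueue("p3")          # queue full: dropped on arrival
assert flow.dropped == 1
```

Because each flow has its own queue, one flooded flow fills only its own buffer; other flows, and the drop accounting, are unaffected.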
  • 27. SILK Network Path
  • 28. QoS Crosstalk
    • Would like to “reserve” some amount of resource (e.g., CPU time) for each application so it can meet its QoS requirements.
    • Problem: Not all CPU time is properly accounted for!
      • e.g., When TCP stack is processing incoming packets, the CPU time is being used by the kernel, not any specific application
      • Same goes for handling file system requests, disk I/O, page faults, etc.
        • It is sometimes very hard to tell which app the OS is doing work for!
    • Result: Activity of one application can impact performance of another
      • This is called QoS Crosstalk
      • Applications are not properly isolated from one another
  • 29. Linux raw socket
    • The basic concept of low-level sockets is to send a single packet at a time, with all the protocol headers filled in by the program (instead of the kernel).
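Filling in headers by hand is what distinguishes raw sockets; below is a sketch of building a minimal IPv4 header (the checksum is left zero for brevity; a real sender must compute it):

```python
import struct

def ipv4_header(src, dst, payload_len, proto=17):
    """Pack a minimal 20-byte IPv4 header by hand, as a raw-socket
    program must (normally the kernel fills these fields)."""
    ver_ihl = (4 << 4) | 5                 # version 4, 5 x 32-bit words
    total_len = 20 + payload_len
    return struct.pack("!BBHHHBBH4s4s",
                       ver_ihl, 0, total_len, 0, 0,
                       64, proto, 0,       # TTL 64, checksum left 0 here
                       bytes(map(int, src.split("."))),
                       bytes(map(int, dst.split("."))))

hdr = ipv4_header("10.0.0.1", "10.0.0.2", payload_len=8)
assert len(hdr) == 20                      # standard header, no options
assert hdr[0] == 0x45                      # version/IHL byte
```

Sending such a packet would then use a `socket(AF_INET, SOCK_RAW, ...)` with `IP_HDRINCL` set, which on stock Linux requires root; PlanetLab's "safe" raw sockets exist precisely to grant this ability without that privilege.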