  • Spanning through…. 645 - 305 - 25 Consortium - Nodes - Users
  • PlanetLab nodes are simply machines connected to the Internet, so security is a major issue.
  • SLICE: a set of nodes on which a service receives a fraction of each node's resources, abstracted in the form of a virtual machine. Distributed virtualization: acquiring a distributed set of VMs and treating it as a single system. Unbundled management: alternative services are deployed in parallel, and the user can pick any of them. The same old problems remain: ISOLATION + FAIRNESS.
  • Node manager: a privileged root VM that monitors and manages all VMs running on that node. It allocates resources, creates VMs, and bootstraps slices. Local admin and infrastructure services: slices that monitor, create slices, and so on. Slices running infrastructure services can be privileged and can make privileged calls to the node manager. All other slices, which provide services to end users, are unprivileged.
  • Linux 2.4 series kernel with patches for vservers and the hierarchical token bucket packet scheduler. Why Linux vservers? Suppose you do not want to run many different operating systems simultaneously on a single box. Most applications running on a server do not require hardware access or kernel-level code, and could easily share a machine with others if they could be separated and secured.
  • Same OS as the host, with 32-bit / 64-bit (host) compatibility. chroot is modified to make it more secure. Each vserver has its own /etc/passwd and runs sshd on a separate TCP port. Each vserver has two sets of persistent information: ssh keys, which allow only the owners of the slice to ssh into the vserver, and the rc.vinit file, a boot script that executes on each vserver boot. Once you find that a project is using more resources than expected, you can move it to another box without fiddling. A vserver is just a directory inside the host server: you tar it, copy it to another box, and restart it there.
  • SLICES share the port numbers and addresses of a single node. Many experiments show the use of unification.
  • Unification: share files between different contexts and reduce overall resource usage, using hard-linked, immutable, un-linkable files.
  • Every data structure now needs to be augmented with the context. Used together with the uid, the context distinguishes identical uids in different virtual servers. A new system call assigns a security context to a process. There is a default host context, and a spectator context that can observe all other contexts.
  • Mirror: immutable-invert file system bits. Two login accounts are named after the slice: one in the node's primary vserver and the other in the newly created vserver. The default shell in the primary vserver is changed to "vsh". Makes effective use of Linux capabilities (/usr/include/linux/capabilities.h); resource limits can be found in /usr/include/asm/resource.h. Next: earlier, resource isolation was poor. A single vserver could consume all the available file descriptors, and there were no bounds on CPU usage.
  • PCR (Processor Capacity Reserves): reduces cross talk (contention for a shared resource, mainly a server process) among applications requiring soft real-time guarantees, mainly multimedia applications. Nemesis: restructures the OS to account for resources accurately by eliminating server processes from the data path. SILK: Scout in a Linux kernel module, which provides CPU scheduling, network accounting, and safe raw sockets.
  • Policy: imposes bounds on network traffic from that node, and so on. Rcap: a resource capability for a node (a 128-bit capability). Rspec: the resource requirements for the node. Rspecs in detail: each rspec consists of a list of reservations for physical resources (e.g., CPU cycles, link bandwidth, disk capacity), limits on logical resource usage (e.g., file descriptors), assignments of shared name spaces (e.g., TCP and UDP port numbers), and other slice privileges (e.g., the right to create a virtual machine on the node). The rspec also specifies the start and end times of the interval over which these values apply. PLC supports two additional brokerage services: Emulab and SHARP.
  • The node manager checks resource limits on each system call.
    • Fairness: among N slices, each slice receives no less than 1/N of the available resources.
    • Guarantees: a slice is provided with a reserved amount of a resource (e.g., 1 Mbps of link bandwidth).
    • HTB: a root token bucket holds the maximum resource (set by the node admin), which is allocated to child buckets. Packets sent by a vserver are tagged in the kernel and accounted to the appropriate child bucket (similar to resource containers). The kernel classifies packets appropriately (say, ping packets and IP-options packets) and charges them to the appropriate bucket for fine-grained accounting.
    • PS: there is a resource container for each vserver, and the resources of processes spawned by a vserver are added to that vserver's container. Each allocation is checked to make sure the limit is not exceeded. The excess is shared proportionally.
  • Consumer socket: the communication end point. The node manager maintains the mapping between vservers and their port reservations.
    • Safe raw sockets: each raw socket is bound to a particular TCP or UDP port. Outgoing packets are filtered to ensure that the local address matches the binding of the socket.
    • Sniffer socket: snoops IP datagrams sent and received on that port by receiving copies of their headers. It implements a promiscuous mode of operation and is read-only, so it does not interfere with traffic.
    • setsockopt(): provides these additional capabilities over the raw socket (sniffing, sending IP packets, etc.).
    • To check for collisions, SILK wraps bind, connect, and other calls and performs the appropriate checks.
  • Snapshot sensor: like an RRD (a finite amount of data is returned). Streaming sensor: continuously sends data as it becomes available. The sensor server exports a sensor interface that periodically collects information from the node and sends it to the central server. The information ranges from simple (reading from /proc) to complex.
  • Vserver root filesystem: 1408 directories (6 MB) and 28003 files, 508 MB in total. Everything is copy-on-write except /etc (6 MB) and /var (17 MB), so each vserver requires only 29 MB of space. 1000 vservers on a single PlanetLab node.
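
The disk-space figures in the last note can be checked with a few lines of arithmetic. This is only a sketch: the sizes are the ones quoted in the note, and reading the 29 MB figure as directories + /etc + /var is an assumption (the three numbers happen to sum to it). The function names are made up for illustration.

```python
# Sizes quoted in the speaker notes, in MB.
DIRS_MB = 6              # 1408 directories
ETC_MB = 6               # private copy of /etc
VAR_MB = 17              # private copy of /var
FULL_ROOT_MB = 508       # whole reference root filesystem

def per_vserver_mb(unified: bool) -> int:
    """Disk space charged to one vserver.

    Assumption: with unification a vserver pays only for its own
    directory tree plus private /etc and /var; without it, it pays
    for a full copy of the root filesystem.
    """
    if unified:
        return DIRS_MB + ETC_MB + VAR_MB
    return FULL_ROOT_MB

def total_mb(n_vservers: int, unified: bool) -> int:
    """Disk space for n_vservers on one node, in MB."""
    return n_vservers * per_vserver_mb(unified)

print("per vserver (unified):", per_vserver_mb(True), "MB")
print("1000 vservers (unified):", total_mb(1000, True), "MB")
print("1000 vservers (full copies):", total_mb(1000, False), "MB")
```

Under these assumptions, 1000 unified vservers need about 29 GB, against roughly 508 GB for 1000 full copies.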

planetLab.ppt Presentation Transcript

  • PlanetLab Operating System Support* (*a work in progress)
  • What is it? A distributed set of machines that must be shared in an efficient way, where “efficient” can mean a lot of different things.
  • Goals
    • A PlanetLab account, together with its associated resources, should span multiple nodes (SLICE).
    • Distributed virtualization
    • Unbundled management
    • Infrastructure services (running a platform as opposed to running an application) over a SLICE, providing a variety of services for the same functionality.
  • Design
  • 4 main areas..
    • VM Abstraction - Linux vserver
    • Resource Allocation + Isolation - SCOUT
    • Network virtualization
    • Distributed Monitoring
  • “Node Virtualization”
    • Full virtualization, like VMware: performance overhead, and a lot of memory consumed by each memory image
    • Paravirtualization, like Xen: more efficient, a promising solution (but still has memory constraints)
    • Virtualization at the system call level, like Linux vservers and UML: supports a large number of slices with reasonable isolation
  • OS for each VM?
    • Linux vservers: Linux inside Linux
    • Each vserver is a directory in a chroot jail.
    • Each virtual server
      • shares binaries,
      • has its own packages,
      • has its own services,
      • has a weaker form of root that provides a local superuser,
      • has its own users, i.e., its own UID/GID namespace,
      • is confined to using certain IP addresses only, and
      • is confined to some area(s) of the file system.
  • Communication among ‘vservers’
    • Not local sockets or IPC
    • but via IP
      • Simplifies resource management and isolation
      • Interaction is independent of their locations
  • Reduced resource usage
    • Physical memory
      • Copy-on-write memory segments across unrelated vservers
    • Unification (Disk space)
      • Share files across contexts
      • Hard linked immutable un-linkable files
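
A minimal sketch of the disk-space side of unification, using ordinary hard links. It only illustrates that two directory trees can share one on-disk file; the immutable, un-linkable file attribute mentioned above comes from the vserver kernel patch and is not modelled here. The directory and file names are made up for the example.

```python
import os
import tempfile

def unify(reference_root: str, vserver_root: str, name: str) -> None:
    """Give the vserver a hard link to the reference copy of a file."""
    os.makedirs(vserver_root, exist_ok=True)
    os.link(os.path.join(reference_root, name),
            os.path.join(vserver_root, name))

def is_unified(path_a: str, path_b: str) -> bool:
    """True if both paths name the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

with tempfile.TemporaryDirectory() as base:
    ref = os.path.join(base, "reference")   # hypothetical reference root
    vs = os.path.join(base, "vserver1")     # hypothetical vserver root
    os.makedirs(ref)
    with open(os.path.join(ref, "libdemo.so"), "wb") as f:
        f.write(b"\0" * 4096)

    unify(ref, vs, "libdemo.so")
    shared = is_unified(os.path.join(ref, "libdemo.so"),
                        os.path.join(vs, "libdemo.so"))
    link_count = os.stat(os.path.join(ref, "libdemo.so")).st_nlink
    print("same inode:", shared, "links:", link_count)
```

Because both names point at one inode, the file's blocks are stored once no matter how many vservers link to it.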
  • Required modifications for vserver
    • Notion of context
      • Isolate group of processes,
      • Each vserver is a separate context,
      • Add context id to all inodes,
      • Context specific capabilities were added,
      • Context limits can be specified,
      • Easy accounting for each context.
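
The notion of context can be sketched as an extra tag carried next to the uid. The following is a toy model, not the kernel patch: the numeric ids for the host and spectator contexts, the `Process` record, and both helper functions are hypothetical, and the rule that a context sees only its own processes unless it is the spectator is an assumption drawn from the notes.

```python
from dataclasses import dataclass

HOST_CONTEXT = 0        # hypothetical id for the default host context
SPECTATOR_CONTEXT = 1   # hypothetical id for the context that sees all others

@dataclass(frozen=True)
class Process:
    pid: int
    uid: int
    context: int        # the tag added to kernel data structures

def same_owner(a: Process, b: Process) -> bool:
    """Identical uids in different contexts are different users."""
    return (a.context, a.uid) == (b.context, b.uid)

def visible(viewer_context: int, target: Process) -> bool:
    """A context sees only its own processes; the spectator sees all."""
    if viewer_context == SPECTATOR_CONTEXT:
        return True
    return viewer_context == target.context

web = Process(pid=10, uid=500, context=100)   # uid 500 in one vserver
db = Process(pid=11, uid=500, context=101)    # uid 500 in another vserver

print(same_owner(web, db))
print(visible(100, web), visible(100, db))
print(visible(SPECTATOR_CONTEXT, db))
```

Keying every check on the (context, uid) pair is what lets each vserver keep its own UID/GID namespace.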
  • vserver implementation
    • Initialize vserver
      • Create a mirror of reference root file system
      • Create two identical login accounts
    • Switching from default shell (modified shell)
      • Switch to the Slice's vserver security context
      • Chroot to vserver’s root file system
      • Relinquish subset of true super user privileges
      • Redirect into the other account in that vserver
  • “Isolation & Resource Allocation”
    • KeyKOS - strict resource accounting
    • Processor Capacity Reserves
    • Nemesis
    • Scout - scheduling along data paths (SILK)
  • Overall structuring
    • Central infrastructure services ( Planet Lab Central )
      • Central database of principals, slices, resource allocations, and policies
      • Creation, deletion of slices through exported interface
    • Node manager
      • Obtains resource information from central server
      • Binds resources to the local VM that belongs to a slice
        • Rcap -> acquire( Rspecs )
        • Bind( slice_id, Rcap )
    ** Every resource access goes through the node manager as a system call and is validated using the Rcap
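
The acquire/bind sequence above can be sketched as a small in-memory model. Only the two operation names and the 128-bit size of an Rcap come from the slides and notes; the `RSpec` fields, the class layout, the single-use rule for an Rcap, and the `resources_of` helper are assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class RSpec:
    """A few illustrative fields; the real rspec carries more."""
    cpu_share: float
    bandwidth_kbps: int
    max_file_descriptors: int
    start: int            # start of the validity interval (epoch seconds)
    end: int              # end of the validity interval (epoch seconds)

class NodeManager:
    def __init__(self):
        self._pending = {}   # rcap -> RSpec, acquired but not yet bound
        self._bound = {}     # slice_id -> RSpec

    def acquire(self, rspec: RSpec) -> bytes:
        """Reserve resources and return an unguessable 128-bit capability."""
        rcap = secrets.token_bytes(16)   # 16 bytes = 128 bits
        self._pending[rcap] = rspec
        return rcap

    def bind(self, slice_id: str, rcap: bytes) -> None:
        """Attach the resources named by rcap to the slice's local VM."""
        rspec = self._pending.pop(rcap, None)
        if rspec is None:
            raise PermissionError("unknown or already bound rcap")
        self._bound[slice_id] = rspec

    def resources_of(self, slice_id: str):
        return self._bound.get(slice_id)

nm = NodeManager()
spec = RSpec(cpu_share=0.1, bandwidth_kbps=1000,
             max_file_descriptors=1024, start=0, end=3600)
rcap = nm.acquire(spec)
nm.bind("slice_a", rcap)
print(nm.resources_of("slice_a"))
```

Holding the Rcap is what authorises the bind, so a slice that cannot present a valid capability gets nothing.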
  • Implementation
    • Non renewable resources
      • Disk space, memory pages, file descriptors
      • The appropriate system calls are wrapped to check per-slice resource limits and increment usage.
    • Renewable resources
      • Fairness and guarantees
    • Hierarchical token bucket queuing discipline
      • Cap per-vserver total outgoing bandwidth
    • SILK for CPU scheduling
      • Proportional share scheduling using resource containers
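
How guarantees and proportional sharing of the excess combine can be shown with a small calculation. This is not the kernel's HTB queuing discipline or SILK's scheduler; it is a sketch of the steady-state split one would expect from a root bucket with child buckets, assuming excess capacity is shared in proportion to each child's guarantee. The function name and the example numbers are made up.

```python
def steady_state_shares(root_rate, guarantees, demands):
    """Split root_rate among slices.

    guarantees: slice -> reserved rate (the child bucket's rate)
    demands:    slice -> rate the slice is trying to use
    Each slice first gets min(demand, guarantee); leftover capacity is
    then shared among still-hungry slices in proportion to guarantees.
    """
    if sum(guarantees.values()) > root_rate:
        raise ValueError("guarantees exceed the root bucket's rate")

    alloc = {s: min(demands[s], guarantees[s]) for s in guarantees}
    spare = root_rate - sum(alloc.values())
    hungry = {s for s in guarantees if demands[s] > alloc[s]}

    while spare > 1e-9 and hungry:
        weight = sum(guarantees[s] for s in hungry)
        if weight == 0:
            break                      # nobody left with a claim on the excess
        round_spare = spare
        for s in sorted(hungry):
            offer = round_spare * guarantees[s] / weight
            take = min(offer, demands[s] - alloc[s])
            alloc[s] += take
            spare -= take
            if demands[s] - alloc[s] <= 1e-9:
                hungry.discard(s)      # satisfied; its unused offer is re-shared
    return alloc

# 10 Mbps node cap; slice "a" is nearly idle, "b" and "c" are greedy.
shares = steady_state_shares(
    10.0,
    guarantees={"a": 1.0, "b": 1.0, "c": 2.0},
    demands={"a": 0.5, "b": 8.0, "c": 8.0},
)
print(shares)
```

Giving each of N slices a guarantee of 1/N of the root rate reproduces the fairness rule from the notes: no slice receives less than 1/N when it asks for it.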
  • “Network virtualization”
    • Filters on network send and receive - like Exokernel and Nemesis.
    • Sharing and partitioning a single network address space - by using a safe version of raw sockets.
    • Alternative approach (similar to Xen): assign a different IP address to each VM, with each VM using the entire port space and managing its own routing table. The problem is that enough IPv4 addresses, on the order of 1000 per node, are not available.
  • Safe raw sockets
    • The Scout module manages all TCP and UDP ports and ICMP IDs to ensure that there are no collisions between safe raw sockets and TCP/UDP/ICMP sockets
    • For each IP address, all ports are either free or "owned" by a slice.
    • Two slices may split ownership of a port by binding it to different IP addresses.
    • Only two IP addresses per node as of now: the external IP and the loopback address
    • A SLICE can reserve a port like any other resource (exclusively)
    • A SLICE can open 3 sockets on a port
      • Error socket, consumer socket, sniffer socket
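
The port-ownership rules above can be modelled with a small table. This is a sketch of the bookkeeping only, not of the Scout module: the class, its method names, the example addresses, and the choice to key ownership on (address, protocol, port) are assumptions for illustration.

```python
class PortTable:
    """Tracks which slice owns each port on each of a node's addresses."""

    def __init__(self, addresses):
        self._addresses = set(addresses)
        self._owner = {}   # (ip, proto, port) -> slice name

    def reserve(self, slice_name, ip, proto, port) -> bool:
        """Reserve a port exclusively; False if another slice owns it."""
        if ip not in self._addresses:
            raise ValueError("address does not belong to this node")
        owner = self._owner.setdefault((ip, proto, port), slice_name)
        return owner == slice_name

    def may_send(self, slice_name, src_ip, proto, src_port) -> bool:
        """Outgoing filter: the source must match a binding the slice owns."""
        return self._owner.get((src_ip, proto, src_port)) == slice_name

# Two addresses per node: an external one (example value) and loopback.
table = PortTable(["192.0.2.10", "127.0.0.1"])

first = table.reserve("slice_a", "192.0.2.10", "udp", 5000)
clash = table.reserve("slice_b", "192.0.2.10", "udp", 5000)
split = table.reserve("slice_b", "127.0.0.1", "udp", 5000)

print(first, clash, split)
print(table.may_send("slice_b", "192.0.2.10", "udp", 5000))
```

The third call shows two slices splitting ownership of one port number by binding it to different addresses.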
  • Monitoring
    • An HTTP sensor server collects data from the sensor interface on each node.
    • Clients can query the sensor database
  • Scalability
    • Limited by disk space
    • Of course limited by kernel resources
      • Need to recompile to increase resources
  • Thank you..