An Operating System for Multicore and Clouds: Mechanisms and Implementation
1. An Operating System for Multicore and Clouds: Mechanisms and Implementation
Authors:
David Wentzlaff, Charles Gruenwald, Nathan Beckmann, Kevin Modzelewski, Adam
Belay, Lamia Youseff, Jason Miller & Anant Agarwal
Computer Science and Artificial Intelligence Laboratory, MIT
Presented by: Mohanadarshan (148241N) & Ireshika (138214C)
2. Content
● Motivation
● Problem
● Challenges in Multicore and Cloud Systems
● Solution
● Factored Operating Systems
● How FOS solves the problem
● Case Studies
● Results
● Conclusion
4. Problem
Existing operating systems are poorly suited to multicore and cloud systems.
● They do not scale well to large numbers of cores.
● They put more responsibility on the user to manage system configuration
and resources.
● They do not take full advantage of the increased computational
capacity.
● They were designed for machines with a small number of cores.
● They handle faults inefficiently in large-scale systems (with many cores).
5. Challenges in Multicore & Cloud Systems
● Scalability
● Variability of Demand
● Faults
● Programming Challenges
6. Scalability
● Existing OSs are designed for a single processor or a small number of
processors.
● Several scalability limitations exist:
➔ Limitations in locking
➔ Locality aliasing
➔ Reliance on shared memory
● Cloud resources are virtually unlimited for a given user.
7. Variability of Demand
● The OS needs to manage the number of live cores to match demand, but
existing OSs only manage whether a single core is active or idle.
● Cloud computing makes more resources available on demand (since user
requirements can change at run time).
● Demand is dynamic, not static.
8. Faults
● Hardware faults are more common in multicore and cloud computing
systems and need to be managed.
● System software (the OS) must gracefully handle dying cores and bit flips.
● Tools to detect and debug faults in multicore and cloud systems are lacking.
9. Programming Challenges
● Uniprocessor OSs are made to work on multiprocessor systems by adding
locks to OS data structures, which requires:
➔ Choosing the correct lock granularity
➔ Deadlock prevention
● Building an efficient large-scale lock-based OS is error-prone.
● The application has to handle most of the scheduling work and manage
the corresponding resources.
10. Solution
An operating system is needed that scales and addresses the
challenges of multicore and cloud systems:
FOS
11. Factored Operating System (FOS)
● It is a single-system-image operating system that spans both multicore and
cloud (IaaS) systems.
● Scalability and adaptability are the main design constraints.
FOS tackles OS scalability challenges by factoring the OS into component
system services. Each system service is further divided into Internet-inspired
servers that communicate through message passing.
- File system service
- Scheduling
- Memory management
- Access to hardware
- Fault tolerance
- Demand elasticity
12. Benefits of Single System Image
● Ease of administration
● Transparent sharing
● Informed optimization
● Consistency
● Fault tolerance
14. FOS Architecture (contd.)
● Libfos - The FOS library; applications communicate with servers through it.
● Hypervisor - A hypervisor, or virtual machine monitor, is a piece of
computer software, firmware or hardware that creates and runs virtual
machines.
● Microkernel - A small microkernel runs on every core, providing messaging
between applications and servers.
● Proxy network server - Manages the global name mapping.
● Namecache - Caches a small portion of the global namespace.
15. Why FOS? How Does It Solve the Problem?
● The OS is factored into function-specific services - Each service is
parallel and distributed. Services communicate via messaging (applications
can use shared memory if it is supported).
● Space multiplexing - Based on the belief that the number of cores in a
system will soon exceed the number of active processes.
● The OS adapts resource utilization to changing system needs - The OS
closely manages how resources are used. Heavily loaded services are
provisioned with more resources.
● Faults are detected and handled by the OS - OS services are monitored by
a watchdog process. If a service fails, a new instance is spawned to meet
demand.
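The fault-handling bullet above can be illustrated with a minimal sketch. It assumes failures are detected through missed heartbeats, which the slides do not state; the `Watchdog` class, its `timeout` parameter and the `spawn` callback are hypothetical names, not part of FOS.

```python
class Watchdog:
    """Hypothetical watchdog: respawns service instances whose heartbeat is stale."""

    def __init__(self, timeout, spawn):
        self.timeout = timeout   # assumed: silence longer than this counts as failure
        self.spawn = spawn       # callback that starts a replacement, returns its id
        self.last_seen = {}      # instance id -> time of the last heartbeat

    def heartbeat(self, instance, now):
        """Record that `instance` was alive at time `now`."""
        self.last_seen[instance] = now

    def check(self, now):
        """Replace every instance that has been silent too long; return the failed ids."""
        failed = [i for i, t in self.last_seen.items() if now - t > self.timeout]
        for instance in failed:
            del self.last_seen[instance]
            replacement = self.spawn(instance)
            self.last_seen[replacement] = now
        return failed
```

Time is passed in explicitly so the sketch stays deterministic; a real monitor would read a clock.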
16. Messaging
● Messaging lets developers focus on the application and its communication
patterns over a flat communication medium.
● Operating system services communicate strictly through messages.
● Messages are delivered via shared memory or the network.
● Intra-machine communication uses shared memory.
● Sharing of data becomes much more explicit in the programming model.
● Each process has mailboxes that store the messages delivered by other
processes.
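The mailbox model above can be sketched roughly as follows. This is an illustration only: `Mailbox`, `MessageLayer`, the bounded `capacity` and the example name `/sys/fs` are all hypothetical, and the transport (shared memory or network) is replaced by an in-process queue.

```python
from collections import deque


class Mailbox:
    """Hypothetical per-process mailbox: a bounded FIFO of delivered messages."""

    def __init__(self, capacity=64):
        self.capacity = capacity   # assumed limit; the slides give no size
        self._queue = deque()

    def deliver(self, message):
        """Enqueue a message; return False if the mailbox is full."""
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(message)
        return True

    def receive(self):
        """Return the oldest message, or None if the mailbox is empty."""
        return self._queue.popleft() if self._queue else None


class MessageLayer:
    """Hypothetical flat messaging layer: senders address mailboxes by name."""

    def __init__(self):
        self._mailboxes = {}

    def register(self, name, capacity=64):
        """Create a mailbox under `name` and hand it to the owning process."""
        mailbox = Mailbox(capacity)
        self._mailboxes[name] = mailbox
        return mailbox

    def send(self, name, message):
        """Deliver to the named mailbox; return False if unknown or full."""
        mailbox = self._mailboxes.get(name)
        return mailbox.deliver(message) if mailbox is not None else False
```

A server would call `register("/sys/fs")` once and poll `receive()`, while clients call `send("/sys/fs", request)` without knowing where the server runs.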
17. Naming
● Processes register a particular name for a mailbox.
● When an application messages a particular service, the nameserver
provides the member of the fleet that is best suited to handle the request.
● The current nameserver uses a preliminary selection policy (round-robin
or closest server); there are plans to incorporate ideas such as hash
tables.
● The complexity of dealing with separate forms of interprocess
communication in traditional cloud systems is abstracted behind the naming
and messaging API.
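A minimal sketch of the round-robin policy mentioned above, assuming a fleet is simply a list of registered members per service name. `NameServer`, `register` and `lookup` are hypothetical names, and the closest-server policy is not shown.

```python
class NameServer:
    """Hypothetical name server: maps a service name to a fleet of members."""

    def __init__(self):
        self._fleets = {}    # service name -> list of fleet member ids
        self._cursors = {}   # service name -> index of the next member to hand out

    def register(self, name, member):
        """Add `member` to the fleet registered under `name`."""
        self._fleets.setdefault(name, []).append(member)
        self._cursors.setdefault(name, 0)

    def lookup(self, name):
        """Return the next fleet member for `name` in round-robin order."""
        members = self._fleets.get(name)
        if not members:
            raise KeyError(name)
        index = self._cursors[name] % len(members)
        self._cursors[name] = index + 1
        return members[index]
```

Because callers only see `lookup`, the selection policy could later be swapped for a load-aware or locality-aware one without changing applications.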
18. OS Services
● FOS parallelizes each system service into a fleet of spatially distributed,
cooperating servers that is easy to scale and adapts dynamically to
changing demand.
● Multiple fleets are active in a system (e.g. file system fleet, name
fleet, etc.).
● To accommodate increased demand, new fleet members are added
dynamically; when demand falls, members are removed.
● OS services are developed using a cooperative multithreaded
programming model, easy-to-use remote procedure calls, serialization
facilities, and data structures for common patterns of data sharing.
20. Case Study – Spawning Server
The spawn server decides where a new server process is created.
21. Case Study – Elastic Fleet
• A watchdog process monitors the queue length
• Adds a server to the fleet
➔ Spawn, handshaking
• Makes global decisions for the elastic fleet
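The queue-length check above might look like the following sketch. It assumes the decision is based on the average queue length across the fleet; `ElasticFleet`, the `grow_above` threshold and the `spawn` callback are hypothetical, and the handshake step is omitted.

```python
class ElasticFleet:
    """Hypothetical fleet that grows when its servers' queues get long."""

    def __init__(self, spawn, grow_above=8):
        self.spawn = spawn             # callback returning a new server id
        self.grow_above = grow_above   # assumed: average queue length that triggers growth
        self.queues = {}               # server id -> number of pending requests

    def add_server(self):
        """Spawn a server and admit it to the fleet (handshake not modelled)."""
        server = self.spawn()
        self.queues[server] = 0
        return server

    def check(self):
        """Watchdog step: add one server if the average queue is too long."""
        if not self.queues:
            return self.add_server()
        average = sum(self.queues.values()) / len(self.queues)
        if average > self.grow_above:
            return self.add_server()
        return None
```

Shrinking the fleet when queues stay short would follow the same pattern with a lower threshold.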
22. Implementation
• Runs as a Xen para-virtualized machine (PVM) OS
• Runs on EC2 or Eucalyptus cloud infrastructure
• Configuration
➔ 16-machine cluster, each machine with 8 cores running at
3.16 GHz, 8 GB main memory, 1 Gb Ethernet
26. Conclusion
● FOS provides scalability, fault tolerance and demand elasticity.
● FOS is scalable and adaptive; it allows application developers to focus on
application-level problem solving without distractions from the underlying
system infrastructure.
● FOS is a highly complex approach that moves complexity from the
application level to the OS level.