An operating system for multicore and clouds: mechanism and implementation

An Operating System for Multicore and Clouds:
Mechanisms and Implementation
Authors:
David Wentzlaff, Charles Gruenwald, Nathan Beckmann, Kevin Modzelewski, Adam Belay, Lamia Youseff, Jason Miller & Anant Agarwal
Computer Science and Artificial Intelligence Laboratory, MIT
Presented by: Mohanadarshan (148241N) & Ireshika (138214C)

Content
● Motivation
● Problem
● Challenges in Multicore and Cloud Systems
● Solution
● Factored Operating Systems
● How FOS solves the problem
● Case Studies
● Results
● Conclusion

Motivation
Source: IEEE Spectrum's special report (2010) & Gartner report (2013)

Problem
Existing operating systems are not well suited to multicore and cloud systems:
● They do not scale well.
● They put more responsibility on the user to manage system configuration
and resources.
● They do not take full advantage of the increased computational
capacity.
● Existing OSs are designed for machines with a small number of cores.
● They are inefficient at managing faults in large-scale systems (with many cores).

Challenges in Multicore & Cloud Systems
● Scalability
● Variability of Demand
● Faults
● Programming Challenges

Scalability
● Existing OSs are designed for a single processor or a limited number of
processors.
● Many scalability limitations exist:
➔ Limitations in locking
➔ Locality aliasing
➔ Reliance on shared memory
● Cloud resources are virtually unlimited for a given user.

Variability of Demand
● The OS needs to manage the number of live cores to match demand, but existing OSs
only manage the state of a single core (active or idle).
● Cloud computing makes more resources available on demand (since user
requirements can change at run time).
● Demand is not static; it is dynamic.

Faults
● Hardware faults are more common in multicore and cloud computing
systems, and the OS needs to manage them.
● System software (OS) must gracefully support dying cores and bit flips.
● There is a lack of tools to detect and debug faults in multicore and cloud systems.

Programming Challenges
● Uniprocessor OSs are adapted to multiprocessor systems by adding locks to
OS data structures, which raises problems such as:
➔ Choosing the correct lock granularity
➔ Deadlock prevention
● Building an efficient, large-scale lock-based OS is error prone.
● Applications need to handle most of the scheduling work and
manage the corresponding resources.

Solution
We need an operating system that is scalable and addresses the
challenges of multicore and cloud systems:
FOS

Factored Operating System (FOS)
● FOS is a single system image operating system spanning both multicore and
cloud (IaaS) systems.
● Scalability and adaptability are the main design constraints.
FOS tackles OS scalability challenges by factoring the OS into component
system services. These system services are further divided into Internet-inspired
services that communicate through message passing.
- File system service
- Scheduling
- Memory management
- Access to hardware
- Fault tolerance
- Demand elasticity

Benefits of Single System Image
● Ease of administration
● Transparent sharing
● Informed optimization
● Consistency
● Fault tolerance

FOS Architecture (contd.)
● Libfos - the FOS library, through which applications communicate with servers.
● Hypervisor - a hypervisor or virtual machine monitor is computer software,
firmware or hardware that creates and runs virtual machines.
● Microkernel - a small microkernel runs on every core, providing messaging
between applications and servers.
● Proxy network server - manages the global name mapping.
● Namecache - caches a small portion of the global namespace.
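The namecache described above can be pictured as a small per-core cache sitting in front of the global name mapping. The sketch below is hypothetical: a plain dictionary stands in for the global mapping, least-recently-used eviction is an assumed policy, and the class name `NameCache`, the capacity, and the mailbox-address strings are all illustrative, not taken from FOS.

```python
from collections import OrderedDict


class NameCache:
    """Hypothetical per-core cache of a small portion of the global namespace.

    Assumptions: a dict stands in for the global name mapping, and entries
    are evicted in least-recently-used order.
    """

    def __init__(self, global_names, capacity=2):
        self.global_names = global_names  # stand-in for the global mapping
        self.capacity = capacity
        self.entries = OrderedDict()      # name -> mailbox address, LRU order
        self.misses = 0

    def lookup(self, name):
        if name in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(name)
            return self.entries[name]
        # Miss: consult the global mapping (KeyError if the name is unknown).
        self.misses += 1
        address = self.global_names[name]
        self.entries[name] = address
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return address


global_names = {"/sys/fs": "machine0:core3",
                "/sys/net": "machine1:core0",
                "/sys/sched": "machine0:core1"}
cache = NameCache(global_names, capacity=2)
cache.lookup("/sys/fs")     # miss
cache.lookup("/sys/fs")     # hit
cache.lookup("/sys/net")    # miss
cache.lookup("/sys/sched")  # miss, evicts /sys/fs
print(cache.misses, list(cache.entries))  # 3 ['/sys/net', '/sys/sched']
```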

Why FOS?... How it solves?...
● The OS is factored into function-specific services - each service is parallel
and distributed, and services communicate via messaging (applications can use
shared memory if it is supported).
● Space multiplexing - based on the belief that the number of cores in a system
will soon exceed the number of active processes.
● The OS adapts resource utilization to changing system needs - the OS closely
manages how resources are used, and highly loaded services are provisioned
more resources.
● Faults are detected and handled by the OS - OS services are monitored by a
watchdog process. If a service fails, a new instance is spawned to meet the
demand.
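The watchdog behaviour in the last point can be sketched as a check that looks at every service instance and spawns a replacement for any that has failed. This is a hypothetical single-process sketch: `ServiceInstance`, `Watchdog`, the `alive` flag and `check_and_repair` are illustrative names, and a real watchdog would detect failure through missed heartbeats or message timeouts rather than by reading a flag.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ServiceInstance:
    service: str
    instance_id: int
    alive: bool = True  # assumption: liveness is observable as a flag


class Watchdog:
    """Hypothetical watchdog that replaces failed service instances."""

    def __init__(self):
        self._ids = itertools.count()
        self.instances = []

    def spawn(self, service):
        inst = ServiceInstance(service, next(self._ids))
        self.instances.append(inst)
        return inst

    def check_and_repair(self):
        """Replace every dead instance with a fresh one of the same service."""
        replacements = []
        for inst in list(self.instances):  # iterate over a copy while mutating
            if not inst.alive:
                self.instances.remove(inst)
                replacements.append(self.spawn(inst.service))
        return replacements


wd = Watchdog()
fs0 = wd.spawn("fs")
fs1 = wd.spawn("fs")
fs0.alive = False  # simulate a dying core taking fs0 with it
new = wd.check_and_repair()
print([(i.service, i.instance_id) for i in wd.instances])  # [('fs', 1), ('fs', 2)]
```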

Messaging
● Applications can focus on their communication patterns over a flat
communication medium.
● Operating system services communicate strictly through messages.
● Messaging is done via shared memory or the network.
● Intra-machine communication uses shared memory.
● Sharing of data becomes much more explicit in the programming model.
● Each process has a mailbox that stores messages delivered
by other processes.
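The mailbox model can be sketched with one thread-safe queue per process: a sender deposits a message in the receiver's mailbox, and the receiver takes messages out in arrival order. This is a minimal sketch assuming Python threads stand in for processes and `queue.Queue` stands in for the shared-memory transport; `Mailbox`, `send` and `fs_server` are hypothetical names, not the libfos API.

```python
import queue
import threading


class Mailbox:
    """Hypothetical per-process mailbox that stores delivered messages."""

    def __init__(self):
        self._messages = queue.Queue()  # thread-safe FIFO

    def deliver(self, sender, payload):
        self._messages.put((sender, payload))

    def receive(self, timeout=5.0):
        # Blocks until a message arrives; raises queue.Empty on timeout.
        return self._messages.get(timeout=timeout)


def send(mailboxes, sender, receiver, payload):
    # Sharing is explicit: the only way to reach another process is a message.
    mailboxes[receiver].deliver(sender, payload)


def fs_server(mailboxes):
    # A toy file-system "server" that answers one request and exits.
    sender, path = mailboxes["fs"].receive()
    send(mailboxes, "fs", sender, f"contents of {path}")


mailboxes = {"app": Mailbox(), "fs": Mailbox()}
server = threading.Thread(target=fs_server, args=(mailboxes,))
server.start()
send(mailboxes, "app", "fs", "/etc/motd")
reply = mailboxes["app"].receive()
server.join()
print(reply)  # ('fs', 'contents of /etc/motd')
```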

Naming
● Processes register a name for their mailbox.
● When an application messages a particular service, the nameserver
provides the member of the fleet that is best suited to handle the request.
● The current nameserver uses a preliminary policy (round-robin or
closest server), but the authors plan to incorporate ideas such as
hash tables.
● The complexity of dealing with separate forms of interprocess communication
in traditional cloud systems is hidden behind the naming and
messaging API.
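The two preliminary selection policies mentioned above (round-robin or closest server) can be sketched as a name server that maps a service name to its fleet members. This is a hypothetical sketch: members are identified by an assumed (machine, core) pair, "closest" is assumed to mean "on the client's own machine", and `NameServer`, `register` and `resolve` are illustrative names; the real distance metric in FOS may differ.

```python
import itertools
from collections import defaultdict


class NameServer:
    """Hypothetical name server mapping a service name to its fleet members."""

    def __init__(self):
        self.fleets = defaultdict(list)  # name -> [(machine, core), ...]
        self._cursors = {}               # name -> round-robin iterator

    def register(self, name, member):
        self.fleets[name].append(member)
        # Rebuild the round-robin cursor whenever fleet membership changes.
        self._cursors[name] = itertools.cycle(list(self.fleets[name]))

    def resolve(self, name, policy="rr", client_machine=None):
        members = self.fleets.get(name, [])
        if not members:
            raise KeyError(name)
        if policy == "closest":
            # Assumed metric: prefer a member on the client's own machine.
            local = [m for m in members if m[0] == client_machine]
            if local:
                return local[0]
        # Round-robin (also the fallback when no local member exists).
        return next(self._cursors[name])


ns = NameServer()
ns.register("/sys/fs", ("m0", 2))
ns.register("/sys/fs", ("m1", 5))
print([ns.resolve("/sys/fs") for _ in range(3)])  # [('m0', 2), ('m1', 5), ('m0', 2)]
print(ns.resolve("/sys/fs", policy="closest", client_machine="m1"))  # ('m1', 5)
```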

OS Services
● FOS parallelizes each system service into a fleet of spatially distributed,
cooperating servers that are easy to scale and adapt dynamically to
changing demand.
● Multiple fleets are active in a system (e.g. a file system fleet, a naming
fleet, etc.).
● New fleet members are added dynamically to accommodate increased demand,
and removed when demand falls.
● OS services are built on a programming model that provides cooperative
multithreading, easy-to-use remote procedure calls, serialization
facilities and data structures for common patterns of data sharing.
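The cooperative multithreaded model can be sketched with coroutines: a server handles each request in its own lightweight task, and a task that issues a remote procedure call yields until the reply arrives, so the server keeps working on other requests in the meantime. This sketch swaps in Python's `asyncio` as the cooperative scheduler and simulates the remote call with `asyncio.sleep`; `rpc_call`, `handle_request`, `serve` and the `blockdev` service are hypothetical.

```python
import asyncio


async def rpc_call(service, method, arg, latency=0.01):
    """Hypothetical RPC stub: the sleep stands in for a message round trip."""
    await asyncio.sleep(latency)  # yields control to other tasks while waiting
    return f"{service}.{method}({arg})"


async def handle_request(request_id, log):
    log.append(f"start {request_id}")
    # e.g. a file-system server asking a block-device service for data
    result = await rpc_call("blockdev", "read", request_id)
    log.append(f"done {request_id}")
    return result


async def serve(requests):
    log = []
    # Each request runs as a cooperative task; a waiting task does not block others.
    results = await asyncio.gather(*(handle_request(r, log) for r in requests))
    return results, log


results, log = asyncio.run(serve([1, 2, 3]))
print(log)  # every "start" entry appears before any "done" entry
```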

Case Study – Spawning Server
The spawn server decides where a new server process is created.
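The placement decision made by the spawn server can be sketched as a function that picks a core for a new server process. The slide does not say which policy FOS uses, so this sketch assumes a simple load-spreading heuristic: choose the machine with the most free cores, then its lowest-numbered free core. `Machine` and `place_server` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_cores: list = field(default_factory=list)


def place_server(machines):
    """Pick (machine, core) for a new server process, or None if all cores are busy.

    Assumed policy: the machine with the most free cores wins (ties go to the
    first such machine), and its lowest-numbered free core is marked as taken.
    """
    candidates = [m for m in machines if m.free_cores]
    if not candidates:
        return None  # assumption: a cloud-backed system could request a new VM here
    target = max(candidates, key=lambda m: len(m.free_cores))
    core = min(target.free_cores)
    target.free_cores.remove(core)
    return target.name, core


cluster = [Machine("m0", [5, 6]), Machine("m1", [1, 2, 3])]
placements = [place_server(cluster) for _ in range(3)]
print(placements)  # [('m1', 1), ('m0', 5), ('m1', 2)]
```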

Case Study – Elastic Fleet
• A watchdog process monitors the queue length
• Add a server to the fleet
➔ Spawn, then handshake
• Make global decisions for the elastic fleet
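The elastic-fleet watchdog can be sketched as a threshold rule on the average request-queue length across fleet members: add a member (spawn, then handshake) when queues grow too long, and retire one when they stay short. The thresholds, the averaging rule, and the names `ElasticFleet`, `spawn_and_handshake` and `rebalance` are assumptions for illustration; the slide only states that the watchdog monitors queue length and makes global decisions for the fleet.

```python
class ElasticFleet:
    """Hypothetical fleet whose size tracks the average queue length."""

    def __init__(self, members, high=8.0, low=2.0, min_size=1):
        self.members = list(members)
        self.high = high          # grow when the average queue exceeds this
        self.low = low            # shrink when the average queue falls below this
        self.min_size = min_size
        self._next_id = len(self.members)

    def spawn_and_handshake(self):
        # Stand-in for asking the spawn server for a new process and then
        # exchanging the handshake that joins it to the fleet.
        member = f"fs{self._next_id}"
        self._next_id += 1
        self.members.append(member)
        return member

    def rebalance(self, queue_lengths):
        """Global decision based on the queue lengths of all current members."""
        avg = sum(queue_lengths) / len(queue_lengths)
        if avg > self.high:
            return ("grow", self.spawn_and_handshake())
        if avg < self.low and len(self.members) > self.min_size:
            return ("shrink", self.members.pop())
        return ("steady", None)


fleet = ElasticFleet(["fs0", "fs1"])
print(fleet.rebalance([12, 10]))   # ('grow', 'fs2')
print(fleet.rebalance([1, 0, 2]))  # ('shrink', 'fs2')
print(fleet.rebalance([4, 5]))     # ('steady', None)
```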

Implementation
• Xen para-virtualized machine (PVM) OS
• Runs on EC2 or Eucalyptus cloud infrastructure
• Configuration
➔ 16-machine cluster; each machine has 8 cores running at
3.16 GHz, 8 GB of main memory and 1 Gbps Ethernet

Result – fos network stack & app

Conclusion
● FOS provides scalability, fault tolerance & demand elasticity.
● FOS is scalable and adaptive; it allows application developers to focus on
application-level problem solving without distraction from the underlying
system infrastructure.
● FOS is a highly complex approach that moves complexity from the
application level to the OS level.

Interesting References
● http://software.intel.com/en-us/articles/performance-scaling-in-the-multi-core-era
● http://spectrum.ieee.org/semiconductors/processors/multicore-cpus-processor-proliferation
● http://www.rackspace.com/knowledge_center/whitepaper/understanding-the-cloud-computing-stack-saas-paas-iaas
● http://machinedesign.com/news/processor-future-multicore
● http://groups.csail.mit.edu/carbon/docs/Wentzlaff.2009.OSR.fos.pdf
