CS9222 Advanced Operating Systems
Unit – V
Dr.A.Kathirvel
Professor & Head/IT - VCEW
Unit - V
Structures – Design Issues – Threads – Process
Synchronization – Processor Scheduling – Memory
Management – Reliability / Fault Tolerance; Database
Operating Systems – Introduction – Concurrency
Control – Distributed Database Systems –
Concurrency Control Algorithms.
Motivation for Multiprocessors
Enhanced Performance -
Concurrent execution of tasks for increased
throughput (between processes)
Exploit Concurrency in Tasks (Parallelism
within process)
Fault Tolerance -
graceful degradation in face of failures
Basic MP Architectures
Single Instruction Single Data (SISD) -
conventional uniprocessor designs.
Single Instruction Multiple Data (SIMD) -
Vector and Array Processors
Multiple Instruction Single Data (MISD) -
Not Implemented.
Multiple Instruction Multiple Data (MIMD)
- conventional MP designs
MIMD Classifications
Tightly Coupled System - all processors
share the same global memory and have
the same address spaces (Typical SMP
system).
Main memory for IPC and Synchronization.
Loosely Coupled System - memory is
partitioned and attached to each processor.
Hypercube, Clusters (Multi-Computer).
Message passing for IPC and synchronization.
MP Block Diagram
[Figure: four CPUs, each with its own cache and MMU, connected through an
interconnection network to a bank of main-memory modules (MM).]
Memory Access Schemes
• Uniform Memory Access (UMA)
– Centrally located
– All processors are equidistant (access times)
• Non-Uniform Memory Access (NUMA)
– physically partitioned but accessible by all
– processors have the same address space
• NO Remote Memory Access (NORMA)
– physically partitioned, not accessible by all
– processors have own address space
Other Details of MP
Interconnection technology
Bus
Cross-Bar switch
Multistage Interconnect Network
Caching - Cache Coherence Problem!
Write-update
Write-invalidate
bus snooping
MP OS Structure - 1
Separate Supervisor -
all processors have their own copy of the kernel.
Some share data for interaction
dedicated I/O devices and file systems
good fault tolerance
bad for concurrency
• Master/Slave Configuration
– master monitors the status and assigns work to
other processors (slaves)
– Slaves are a schedulable pool of resources for
the master
– master can be bottleneck
– poor fault tolerance
MP OS Structure - 2
Symmetric Configuration - Most Flexible.
all processors are autonomous, treated equal
one copy of the kernel executed concurrently
across all processors
Synchronize access to shared data structures:
Lock entire OS - Floating Master
Mitigated by dividing OS into segments that normally
have little interaction
multithread kernel and control access to resources
(continuum)
MP OS Structure - 3
MP Overview
MultiProcessor
  SIMD
  MIMD
    Shared Memory (tightly coupled)
      Master/Slave
      Symmetric (SMP)
    Distributed Memory (loosely coupled)
      Clusters
SMP OS Design Issues
Threads - effectiveness of parallelism depends
on performance of primitives used to express
and control concurrency.
Process Synchronization - disabling interrupts
is not sufficient.
Process Scheduling - efficient, policy controlled,
task scheduling (process/threads)
global versus per CPU scheduling
Task affinity for a particular CPU
resource accounting and intra-task thread
dependencies
Memory Management - complicated since
main memory is shared by possibly many
processors. Each processor must maintain its
own map tables for each process
cache coherence
memory access synchronization
balancing overhead with increased concurrency
Reliability and fault Tolerance - degrade
gracefully in the event of failures
SMP OS design issues - 2
Typical SMP System
[Figure: four CPU/cache/MMU units and an I/O subsystem attached to a 500MHz
system/memory bus with main memory (50ns access); a bridge connects to a
typical I/O bus (33MHz/32bit = 132MB/s, or 66MHz/64bit = 528MB/s) carrying
ethernet, SCSI, and video devices, plus system functions (timer, BIOS,
reset) and interrupt logic.]
Issues:
• Memory contention
• Limited bus BW
• I/O contention
• Cache coherence
Some Definitions
Parallelism: degree to which a multiprocessor
application achieves parallel execution
Concurrency: Maximum parallelism an
application can achieve with unlimited
processors
System Concurrency: kernel recognizes multiple
threads of control in a program
User Concurrency: User space threads
(coroutines) provide a natural programming
model for concurrent applications. Concurrency
not supported by system.
Process and Threads
Process: encompasses
set of threads (computational entities)
collection of resources
Thread: Dynamic object representing an
execution path and computational state.
threads have their own computational state: PC,
stack, user registers and private data
Remaining resources are shared amongst threads
in a process
Threads
Effectiveness of parallel computing depends on
the performance of the primitives used to
express and control parallelism
Threads separate the notion of execution from
the Process abstraction
Useful for expressing the intrinsic concurrency
of a program regardless of resulting
performance
Three types: User threads, kernel threads and
Light Weight Processes (LWP)
User Level Threads
User level threads - supported by user level
(thread) library
Benefits:
no modifications required to kernel
flexible and low cost
Drawbacks:
can not block without blocking entire process
no parallelism (not recognized by kernel)
Kernel Level Threads
Kernel level threads - kernel directly supports
multiple threads of control in a process. Thread
is the basic scheduling entity
Benefits:
coordination between scheduling and
synchronization
less overhead than a process
suitable for parallel application
Drawbacks:
more expensive than user-level threads
generality leads to greater overhead
Light Weight Processes (LWP)
Kernel supported user thread
Each LWP is bound to one kernel thread.
a kernel thread may not be bound to an LWP
LWP is scheduled by kernel
User threads scheduled by library onto LWPs
Multiple LWPs per process
Thread operations in user space:
create, destroy, synch, context switch
kernel threads implement a virtual processor
Coarse grain in kernel - preemptive scheduling
Communication between kernel and threads library
shared data structures.
Software interrupts (user upcalls or signals). Example, for
scheduling decisions and preemption warnings.
Kernel scheduler interface - allows dissimilar thread
packages to coordinate.
First Class threads (Psyche OS)
Scheduler Activations
An activation:
serves as execution context for running thread
notifies thread of kernel events (upcall)
space for kernel to save processor context of current
user thread when stopped by kernel
kernel is responsible for processor allocation =>
preemption by kernel.
Thread package responsible for scheduling
threads on available processors (activations)
Support for Threading
• BSD:
– process model only. 4.4 BSD enhancements.
• Solaris: provides
– user threads, kernel threads and LWPs
• Mach: supports
– kernel threads and tasks. Thread libraries provide
semantics of user threads, LWPs and kernel threads.
• Digital UNIX: extends MACH to provide usual
UNIX semantics.
– Pthreads library.
Process Synchronization: Motivation
Sequential execution runs correctly but
concurrent execution (of the same program)
runs incorrectly.
Concurrent access to shared data may result in
data inconsistency
Maintaining data consistency requires
mechanisms to ensure the orderly execution of
cooperating processes
Let’s look at an example: consumer-producer
problem.
Producer-Consumer Problem
Producer
while (true) {
/* produce an item and put in
nextProduced */
while (count == BUFFER_SIZE); // do
nothing
buffer [in] = nextProduced;
in = (in + 1) % BUFFER_SIZE;
count++;
}
count: the number of items in the
buffer (initialized to 0)
Consumer
while (true) {
while (count == 0); // do nothing
nextConsumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
count--;
// consume the item in nextConsumed
}
What can go wrong in concurrent
execution?
Race Condition
 count++ could be implemented as
register1 = count
register1 = register1 + 1
count = register1
 count-- could be implemented as
register2 = count
register2 = register2 - 1
count = register2
 Consider this execution interleaving with “count = 5” initially:
 S0: producer execute register1 = count {register1 = 5}
S1: producer execute register1 = register1 + 1 {register1 = 6}
S2: consumer execute register2 = count {register2 = 5}
S3: consumer execute register2 = register2 - 1 {register2 = 4}
S4: producer execute count = register1 {count = 6 }
S5: consumer execute count = register2 {count = 4}
What are all possible values from concurrent execution?
How to prevent race condition?
 Define a critical section in
each process
 Reading and writing
common variables.
 Make sure that only one
process can execute in the
critical section at a time.
 What sync code to put into
the entry & exit sections to
prevent race condition?
do {
entry section
critical section
exit section
remainder section
} while (TRUE);
Solution to Critical-Section
Problem
1. Mutual Exclusion - If process Pi is executing in its critical section, then no
other processes can be executing in their critical sections
2. Progress - If no process is executing in its critical section and there exist
some processes that wish to enter their critical section, then the
selection of the processes that will enter the critical section next cannot
be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of times that
other processes are allowed to enter their critical sections after a process
has made a request to enter its critical section and before that request is
granted
What is the difference between
Progress and Bounded Waiting?
Peterson’s Solution
Simple 2-process solution
Assume that the LOAD and STORE instructions are
atomic; that is, cannot be interrupted.
The two processes share two variables:
int turn;
Boolean flag[2]
The variable turn indicates whose turn it is to enter
the critical section.
The flag array is used to indicate if a process is ready
to enter the critical section. flag[i] = true implies that
process Pi is ready!
Algorithm for Process Pi
while (true) {
flag[i] = TRUE;
turn = j;
while ( flag[j] && turn == j);
CRITICAL SECTION
flag[i] = FALSE;
REMAINDER SECTION
}
 Mutual exclusion
 Only one process enters critical section
at a time.
 Proof: can both processes pass the while
loop (and enter critical section) at the
same time?
 Progress
 Selection for waiting-to-enter-critical-
section process does not block.
 Proof: can Pi wait at the while loop
forever (after Pj leaves critical section)?
 Bounded Waiting
 Limited time in waiting for other
processes.
 Proof: can Pj win the critical section
twice while Pi waits?
Entry Section
Exit Section
Algorithm for Process Pi
while (true) {
flag[i] = TRUE;
turn = j;
while ( flag[j] && turn == j);
CRITICAL SECTION
flag[i] = FALSE;
REMAINDER SECTION
}
Entry Section
Exit Section
while (true) {
flag[j] = TRUE;
turn = i;
while ( flag[i] && turn == i);
CRITICAL SECTION
flag[j] = FALSE;
REMAINDER SECTION
}
Synchronization Hardware
 Many systems provide hardware support for critical section code
 Uniprocessors – could disable interrupts
 Currently running code would execute without preemption
 Generally too inefficient on multiprocessor systems
Operating systems using this approach are not broadly scalable
 Modern machines provide special atomic hardware instructions
Atomic = non-interruptable
 TestAndSet(target): Either test memory word and set value
 Swap(a,b): Or swap contents of two memory words
TestAndSet Instruction
• Definition:
boolean TestAndSet (boolean *target)
{
boolean rv = *target;
*target = TRUE;
return rv;
}
Solution using TestAndSet
 Shared boolean variable lock, initialized to false.
 Solution:
while (true) {
while ( TestAndSet (&lock ))
; /* do nothing */
// critical section
lock = FALSE;
// remainder section
}
 Does it satisfy mutual exclusion?
 How about progress and bounded waiting?
 How to fix this?
Entry Section
Exit Section
Bounded-Waiting TestAndSet
• Shared variable
boolean waiting[n];
boolean lock; // initialized false.
• Solution:
do {
waiting[i] = TRUE;
key = TRUE; // key is local to each process
while (waiting[i] && key)
key = TestAndSet(&lock);
waiting[i] = FALSE;
// critical section
j = (i + 1) % n;
while ((j != i) && !waiting[j])
j = (j + 1) % n;
if (j == i) lock = FALSE;
else waiting[j] = FALSE;
// remainder section
} while (TRUE);
 Mutual exclusion
 Proof: can two processes pass the
while loop (and enter critical section)
at the same time?
 Bounded Waiting
 Limited time in waiting for other
processes.
 What is waiting[] for? When does
waiting[i] set to FALSE?
 Proof: how long does Pi’s wait till
waiting[i] becomes FALSE?
 Progress
 Proof: exit section unblocks at least
one process’s waiting[] or set the lock
to FALSE.
Entry Section
Exit Section
Swap Instruction
• Definition:
void Swap (boolean *a, boolean *b)
{
boolean temp = *a;
*a = *b;
*b = temp;
}
Solution using Swap
 Shared Boolean variable lock initialized to FALSE; Each process
has a local Boolean variable key.
 Solution:
while (true) {
key = TRUE;
while ( key == TRUE)
Swap (&lock, &key );
// critical section
lock = FALSE;
// remainder section
}
 Mutual exclusion? Progress and Bounded Waiting?
 Notice a performance problem with Swap & TestAndSet
solutions?
Entry Section
Exit Section
Processor Scheduling
PS: ready tasks are assigned to the processors so
that performance is maximized.
Because tasks cooperate and communicate through
shared variables or message passing, PS in a
multiprocessor system is a difficult problem.
PS is critical to the performance of multiprocessor
systems because a naive scheduler can degrade
performance substantially.
Issues in Processor Scheduling
3 major causes of performance degradation are
 Preemption inside spinlock-controlled critical sections.
This situation occurs when a task is preempted inside a CS while
other tasks are spinning on the lock to enter the same CS.
 cache corruption
A big chunk of data needed by the previous task must be purged from
the cache and new data must be brought into the cache; the miss ratio
is very high after a processor switches to another task.
 context switching overheads
Execution of a large no. of instructions to save and restore the
registers, to initialize the registers, to switch address spaces, etc.
Co-Scheduling of the Medusa OS
Co-scheduling - proposed by Ousterhout for the
Medusa OS on Cm*.
All runnable tasks of an application are scheduled
on the processors simultaneously.
Context switching occurs between applications
rather than between tasks of several different
applications.
Problem: tasks waste resources in lock-spinning
while they wait for a preempted task to release
the critical section.
Smart Scheduling
Proposed by Zahorjan et al. - 2 nice features
It avoids preempting a task when the task is inside its
CS
It avoids rescheduling tasks that were busy waiting at
the time of their preemption until the task that is
executing the corresponding CS releases it.
Eliminates the resource waste due to a processor
spinning on a lock.
It does nothing, however, to reduce the overhead due
to context switching or the performance degradation
due to cache corruption.
Scheduling in the NYU Ultracomputer
Proposed by Edler et al.; it combines the strategies of
the previous 2 scheduling techniques.
Tasks can be formed into groups and scheduled in
any of the following ways:
 a task is scheduled or preempted in the normal manner
 all tasks in a group are scheduled or preempted
simultaneously
 tasks in a group are never preempted
Memory Management
The Mach Operating System
Virtual memory management of the Mach OS, developed at CMU
Design Issues
Portability
Data sharing
Protection
Efficiency
The Mach Kernel
Basic primitives necessary for building parallel and
distributed applications.
The Mach Kernel
[Figure: 4.3 BSD, System V, HP/UX, and other emulators run as user
processes in a software emulation layer in user space; only the
microkernel runs in kernel space.]
The kernel manages five principal
abstractions:
1. Processes.
2. Threads.
3. Memory objects.
4. Ports.
5. Messages.
Process Management in Mach
[Figure: a Mach process contains an address space and threads, plus a
set of ports connecting it to the kernel: a process port, a bootstrap
port, an exception port, and registered ports.]
Ports
The process port is used to communicate with the
kernel.
The bootstrap port is used for initialization when a
process starts up.
The exception port is used to report exceptions
caused by the process. Typical exceptions are division
by zero and illegal instruction executed.
The registered ports are normally used to provide a
way for the process to communicate with standard
system servers.
Process States
A process can be runnable or blocked.
If a process is runnable, those threads that are
also runnable can be scheduled and run.
If a process is blocked, its threads may not
run, no matter what state they are in.
Process Management Primitives
Create Create a new process, inheriting certain properties
Terminate Kill a specified process
Suspend Increment suspend counter
Resume Decrement suspend counter. If it is 0, unblock the process
Priority Set the priority for current or future threads
Assign Tell which processor new threads should run on
Info Return information about execution time, memory usage, etc.
Threads Return a list of the process’ threads
Threads
 Mach threads are managed by the kernel. Thread creation and destruction are
done by the kernel.
Fork Create a new thread running the same code as the
parent thread
Exit Terminate the calling thread
Join Suspend the caller until a specified thread exits
Detach Announce that the thread will never be joined (waited
for)
Yield Give up the CPU voluntarily
Self Return the calling thread’s identity to it
Scheduling algorithm
When a thread blocks, exits, or uses up its quantum,
the CPU it is running on first looks at its local run
queue to see if there are any runnable threads.
If the local run queue is nonempty, run the highest-
priority thread, starting the search at the queue
specified by the hint.
If the local run queue is empty, the same algorithm is
applied to the global run queue. The global queue
must be locked first.
Scheduling
[Figure: global run queues for processor sets 1 and 2; each is an
array of 32 priority queues (0 = high, 31 = low) with a count of
queued threads and a hint to the highest-priority nonempty queue.
The first queue is free with count 6 and hint 2; the second is busy
(locked) with count 7 and hint 4.]
Memory Management in Mach
 Mach has a powerful, elaborate, and highly flexible memory
management system based on paging.
 The code of Mach’s memory management is split into three
parts. The first part is the pmap module, which runs in the
kernel and is concerned with managing the MMU.
 The second part, the machine-independent kernel code, is
concerned with processing page faults, managing address
maps, and replacing pages.
 The third part of the memory management code runs as a
user process called a memory manager. It handles the logical
part of the memory management system, primarily
management of the backing store (disk).
Virtual Memory
The conceptual model of memory that Mach user
processes see is a large, linear virtual address space.
The address space is supported by paging.
A key concept relating to the use of virtual address
space is the memory object. A memory object can be
a page or a set of pages, but it can also be a file or
other, more specialized data structure.
An address space with allocated regions,
mapped objects, and unused addresses
[Figure: an address space containing text, data, and stack regions
and a mapped file xyz region, separated by unused gaps.]
System calls for virtual address
space manipulation
Allocate Make a region of virtual address space usable
Deallocate Invalidate a region of virtual address space
Map Map a memory object into the virtual address space
Copy Make a copy of a region at another virtual address
Inherit Set the inheritance attribute for a region
Read Read data from another process’ virtual address
space
Write Write data to another process’ virtual address space
Memory Sharing
[Figure: three processes map the same file into their address
spaces, sharing its pages.]
Operation of Copy-on-Write
[Figure: after creation, the prototype's address space (pages 0-7,
read-write) and the child's address space (pages 0-7, read-only) map
the same physical memory pages.]
Operation of Copy-on-Write
[Figure: when the child writes page 7, the kernel allocates a new
physical page (page 8), copies page 7 into it, and maps the copy
into the child's address space; all other pages remain shared.]
Advantages of Copy-on-write
1. some pages are read-only, so there is no
need to copy them.
2. other pages may never be referenced, so
they do not have to be copied.
3. still other pages may be writable, but the
child may deallocate them rather than using
them.
Disadvantages of Copy-on-write
1. the administration is more complicated.
2. requires multiple kernel traps, one for each
page that is ultimately written.
3. does not work over a network.
External Memory Managers
 Each memory object that is mapped in a process’ address
space must have an external memory manager that controls
it. Different classes of memory objects are handled by
different memory managers.
 Three ports are needed to do the job.
 The object port, is created by the memory manager and will
later be used by the kernel to inform the memory manager
about page faults and other events relating to the object.
 The control port, is created by the kernel itself so that the
memory manager can respond to these events.
 The name port, is used as a kind of name to identify the
object.
Distributed Shared Memory in Mach
The idea is to have a single, linear, virtual
address space that is shared among processes
running on computers that do not have any
physical shared memory. When a thread
references a page that it does not have, it
causes a page fault. Eventually, the page is
located and shipped to the faulting machine,
where it is installed so that the thread can
continue executing.
Communication in Mach
 The basis of all communication in Mach is a kernel data
structure called a port.
 When a thread in one process wants to communicate with a
thread in another process, the sending thread writes the
message to the port and the receiving thread takes it out.
 Each port is protected to ensure that only authorized
processes can send it and receive from it.
 Ports support unidirectional communication. A port that can
be used to send a request from a client to a server cannot also
be used to send the reply back from the server to the client. A
second port is needed for the reply.
A Mach port
Message queue
Current message count
Maximum messages
Port set this port belongs to
Counts of outstanding capabilities
Capabilities to use for error reporting
Queue of threads blocked on this port
Pointer to the process holding the RECEIVE capability
Index of this port in the receiver’s capability list
Pointer to the kernel object
Miscellaneous items
Message passing via a port
[Figure: a sending thread performs a send to a port, a kernel object
holding a message queue; a receiving thread performs a receive on
the same port.]
Capabilities
[Figure: processes A and B each hold a capability list; an entry in
A's list is a capability with the SEND right for port X, and an
entry in B's list is a capability with the RECEIVE right for port Y.
The ports themselves live in the kernel.]
Primitives for Managing Ports
Allocate Create a port and insert its capability in the capability list
Destroy Destroy a port and remove its capability from the list
Deallocate Remove a capability from the capability list
Extract_right Extract the n-th capability from another process
Insert_right Insert a capability in another process’ capability list
Move_member Move a capability into a capability set
Set_qlimit Set the number of messages a port can hold
Sending and Receiving Messages
 mach_msg(&hdr, options, send_size, rcv_size, rcv_port, timeout, notify_port);
 The first parameter, hdr, is a pointer to the message to be sent or to the place
where the incoming message is put, or both.
 The second parameter, options, contains a bit specifying that a message is to be
sent, and another one specifying that a message is to be received. Another bit
enables a timeout, given by the timeout parameter. Other bits in options allow a
SEND that cannot complete immediately to return control anyway, with a status
report being sent to notify_port later.
 The send_size and rcv_size parameters tell how large the outgoing message is and
how many bytes are available for storing the incoming message, respectively.
 Rcv_port is used for receiving messages. It is the capability name of the port or
port set being listened to.
The Mach message format
[Figure: a message begins with a header holding the message size,
the capability index for the destination port, the capability index
for the reply port, the destination and reply rights, a
complex/simple bit, a message kind, and a function code. The message
body is a sequence of descriptor/data-field pairs and is not
examined by the kernel.]
Complex message field descriptor
[Figure: each descriptor gives the data field size in bits, the
number of items in the data field, and the data field type (bit,
byte, unstructured word, 8/16/32-bit integer, character, 32
booleans, floating point, string, or capability), plus flag bits
for out-of-line data present/absent, short or long form descriptor,
and whether the sender keeps or deallocates out-of-line data.]
Reliability/Fault Tolerance: the
SEQUOIA System
Sequoia system – a loosely coupled
multiprocessor system.
Attains a high level of fault tolerance by
performing fault detection in hardware and
fault recovery in the OS.
Design Issues
Fault detection and isolation
Fault recovery
Efficiency
The Sequoia Architecture
[Figure: the Sequoia hardware architecture.]
Reliability/Fault Tolerance: the
SEQUOIA System
Fault detection
Error detecting codes
Comparison of duplicated operations
Protocol monitoring
Fault Recovery
Recovery from processor failures
Recovery from main memory failures
Recovery from I/O failures
Database Operating Systems
 Database systems have
been implemented as
applications on top
of general purpose OSs
 Requirements of a DBOS
 Transaction
Management
 Support for complex,
persistent data
 Buffer Management
Concurrency Control
 CC is the process of controlling concurrent access to a database to
ensure that the correctness of the database is maintained.
 Database systems
Set of shared data objects that can be accessed by users.
Transactions
A transaction consists of a sequence of read, compute, and write
actions that refer to the data objects of a database.
Conflicts
Transactions conflict if they access the same data objects.
Transaction processing
A transaction is executed by executing its actions one by one
from the beginning to the end.
A concurrency control model of DBS
3 software modules
Transaction manager (TM)
Supervises the execution of a transaction
Data manager (DM)
Manages the stored database and executes read/write operations
Scheduler
Responsible for enforcing concurrency control
Distributed Database System
 A distributed database is a database in which storage devices
are not all attached to a common processing unit such as the
CPU.
 It may be stored in multiple computers, located in the same
physical location; or may be dispersed over a network of
interconnected computers.
 Unlike parallel systems, in which the processors are tightly
coupled and constitute a single database system, a distributed
database system consists of loosely coupled sites that share
no physical components.
Model of Distributed Database System
Distributed Database System
 Motivations: DDBS offers several advantages over a centralized
database system such as
 Sharing
 Higher system availability (reliability)
 Improved performance
 Easy expandability
 Large databases
 Transaction Processing Model
 Serializability condition in DDBS
 Data replication
 Complications due to Data replication
 Fully Replicated Database Systems
1. Enhanced reliability
2. Improved responsiveness
3. No directory management
4. Easier load balancing
Concurrency Control Algorithms
It controls the interleaving of conflicting actions of
transactions so that the integrity of a database is
maintained, i.e., their net effect is a serial execution.
Basic synchronization primitives
Locks
A transaction can request, hold or release the lock on a data
object.
 A data object can be locked in 2 modes: exclusive and shared
Timestamps
Unique number is assigned to a transaction or a data object and is
chosen from a monotonically increasing sequence.
Commonly generated using Lamport’s scheme
Lock based algorithms
Static locking
Two Phase Locking (2PL)
Problems with 2PL: Price for Higher concurrency
2PL in DDBS
Timestamp Based locking
Conflict Resolution
Wait Restart Die Wound
Non-two-phase locking
Timestamp Based Algorithms
Basic timestamp ordering algorithm
Thomas Write Rule (TWR)
Multiversion timestamp ordering algorithm
Conservative timestamp ordering algorithm
Thank U

CS9222 ADVANCED OPERATING SYSTEMS

  • 1.
    CS9222 Advanced Operating System Unit– V Dr.A.Kathirvel Professor & Head/IT - VCEW
  • 2.
    Unit - V Structures– Design Issues – Threads – Process Synchronization – Processor Scheduling – Memory Management – Reliability / Fault Tolerance; Database Operating Systems – Introduction – Concurrency Control – Distributed Database Systems – Concurrency Control Algorithms.
  • 3.
    Motivation for Multiprocessors EnhancedPerformance - Concurrent execution of tasks for increased throughput (between processes) Exploit Concurrency in Tasks (Parallelism within process) Fault Tolerance - graceful degradation in face of failures
  • 4.
    Basic MP Architectures SingleInstruction Single Data (SISD) - conventional uniprocessor designs. Single Instruction Multiple Data (SIMD) - Vector and Array Processors Multiple Instruction Single Data (MISD) - Not Implemented. Multiple Instruction Multiple Data (MIMD) - conventional MP designs
  • 5.
    MIMD Classifications Tightly CoupledSystem - all processors share the same global memory and have the same address spaces (Typical SMP system). Main memory for IPC and Synchronization. Loosely Coupled System - memory is partitioned and attached to each processor. Hypercube, Clusters (Multi-Computer). Message passing for IPC and synchronization.
  • 6.
    MP Block Diagram cacheMMU CPU cache MMU CPU cache MMU CPU cache MMU CPU MM MM MM MM Interconnection Network
  • 7.
    Memory Access Schemes •Uniform Memory Access (UMA) – Centrally located – All processors are equidistant (access times) • NonUniform Access (NUMA) – physically partitioned but accessible by all – processors have the same address space • NO Remote Memory Access (NORMA) – physically partitioned, not accessible by all – processors have own address space
  • 8.
    Other Details ofMP Interconnection technology Bus Cross-Bar switch Multistage Interconnect Network Caching - Cache Coherence Problem! Write-update Write-invalidate bus snooping
  • 9.
    MP OS Structure- 1 Separate Supervisor - all processors have their own copy of the kernel. Some share data for interaction dedicated I/O devices and file systems good fault tolerance bad for concurrency
  • 10.
    • Master/Slave Configuration –master monitors the status and assigns work to other processors (slaves) – Slaves are a schedulable pool of resources for the master – master can be bottleneck – poor fault tolerance MP OS Structure - 2
  • 11.
    Symmetric Configuration -Most Flexible. all processors are autonomous, treated equal one copy of the kernel executed concurrently across all processors Synchronize access to shared data structures: Lock entire OS - Floating Master Mitigated by dividing OS into segments that normally have little interaction multithread kernel and control access to resources (continuum) MP OS Structure - 3
  • 12.
    MP Overview MultiProcessor SIMD MIMD SharedMemory (tightly coupled) Distributed Memory (loosely coupled) Master/Slave Symmetric (SMP) Clusters
  • 13.
    SMP OS DesignIssues Threads - effectiveness of parallelism depends on performance of primitives used to express and control concurrency. Process Synchronization - disabling interrupts is not sufficient. Process Scheduling - efficient, policy controlled, task scheduling (process/threads) global versus per CPU scheduling Task affinity for a particular CPU resource accounting and intra-task thread dependencies
  • 14.
SMP OS Design Issues - 2
Memory Management - complicated since main memory is shared by possibly many processors.
each processor must maintain its own map tables for each process
cache coherence
memory access synchronization
balancing overhead against increased concurrency
Reliability and Fault Tolerance - degrade gracefully in the event of failures.
Typical SMP System
Four CPUs, each with its own cache and MMU, sit on a 500MHz system/memory bus together with main memory (50ns) and a bridge to the I/O subsystem (ether, scsi, video, and system functions such as timer, BIOS, reset; interrupts).
Typical I/O bus: 33MHz/32-bit (132MB/s) or 66MHz/64-bit (528MB/s).
Issues: memory contention, limited bus bandwidth, I/O contention, cache coherence.
    Some Definitions Parallelism: degreeto which a multiprocessor application achieves parallel execution Concurrency: Maximum parallelism an application can achieve with unlimited processors System Concurrency: kernel recognizes multiple threads of control in a program User Concurrency: User space threads (coroutines) provide a natural programming model for concurrent applications. Concurrency not supported by system.
  • 17.
    Process and Threads Process:encompasses set of threads (computational entities) collection of resources Thread: Dynamic object representing an execution path and computational state. threads have their own computational state: PC, stack, user registers and private data Remaining resources are shared amongst threads in a process
  • 18.
    Threads Effectiveness of parallelcomputing depends on the performance of the primitives used to express and control parallelism Threads separate the notion of execution from the Process abstraction Useful for expressing the intrinsic concurrency of a program regardless of resulting performance Three types: User threads, kernel threads and Light Weight Processes (LWP)
  • 19.
    User Level Threads Userlevel threads - supported by user level (thread) library Benefits: no modifications required to kernel flexible and low cost Drawbacks: can not block without blocking entire process no parallelism (not recognized by kernel)
  • 20.
    Kernel Level Threads Kernellevel threads - kernel directly supports multiple threads of control in a process. Thread is the basic scheduling entity Benefits: coordination between scheduling and synchronization less overhead than a process suitable for parallel application Drawbacks: more expensive than user-level threads generality leads to greater overhead
  • 21.
Light Weight Processes (LWPs)
A kernel-supported user thread.
Each LWP is bound to one kernel thread; a kernel thread need not be bound to an LWP.
LWPs are scheduled by the kernel; user threads are scheduled by the library onto LWPs.
Multiple LWPs per process.
First-Class Threads (Psyche OS)
Thread operations in user space: create, destroy, synch, context switch.
Kernel threads implement a virtual processor; coarse grain in the kernel - preemptive scheduling.
Communication between the kernel and the threads library:
shared data structures
software interrupts (user upcalls or signals), e.g. for scheduling decisions and preemption warnings
kernel scheduler interface - allows dissimilar thread packages to coordinate
    Scheduler Activations An activation: servesas execution context for running thread notifies thread of kernel events (upcall) space for kernel to save processor context of current user thread when stopped by kernel kernel is responsible for processor allocation => preemption by kernel. Thread package responsible for scheduling threads on available processors (activations)
  • 24.
Support for Threading
• BSD: process model only; 4.4BSD enhancements.
• Solaris: provides user threads, kernel threads and LWPs.
• Mach: supports kernel threads and tasks; thread libraries provide the semantics of user threads, LWPs and kernel threads.
• Digital UNIX: extends Mach to provide the usual UNIX semantics; Pthreads library.
Process Synchronization: Motivation
A program that runs correctly under sequential execution may run incorrectly under concurrent execution.
Concurrent access to shared data may result in data inconsistency.
Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes.
Let's look at an example: the producer-consumer problem.
Producer-Consumer Problem
count: the number of items in the buffer (initialized to 0)

Producer:
    while (true) {
        /* produce an item and put in nextProduced */
        while (count == BUFFER_SIZE)
            ;  // do nothing
        buffer[in] = nextProduced;
        in = (in + 1) % BUFFER_SIZE;
        count++;
    }

Consumer:
    while (true) {
        while (count == 0)
            ;  // do nothing
        nextConsumed = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        count--;
        /* consume the item in nextConsumed */
    }

What can go wrong in concurrent execution?
Race Condition
count++ could be implemented as
    register1 = count
    register1 = register1 + 1
    count = register1
count-- could be implemented as
    register2 = count
    register2 = register2 - 1
    count = register2
Consider this execution interleaving, with count = 5 initially:
    S0: producer executes register1 = count          {register1 = 5}
    S1: producer executes register1 = register1 + 1  {register1 = 6}
    S2: consumer executes register2 = count          {register2 = 5}
    S3: consumer executes register2 = register2 - 1  {register2 = 4}
    S4: producer executes count = register1          {count = 6}
    S5: consumer executes count = register2          {count = 4}
What are all the possible values of count from concurrent execution?
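The lost update in the interleaving S0-S5 can be replayed step by step; this Python sketch simulates the two "threads" executing exactly that order:

```python
# Replaying the S0-S5 interleaving: both sides read count before either
# writes it back, so the producer's increment is lost.
count = 5

register1 = count            # S0: producer reads 5
register1 = register1 + 1    # S1: producer computes 6
register2 = count            # S2: consumer reads the same stale 5
register2 = register2 - 1    # S3: consumer computes 4
count = register1            # S4: producer writes 6
count = register2            # S5: consumer overwrites it with 4

print(count)   # 4, not the 5 we would get from one ++ and one --
```

Other interleavings yield 5 or 6, which is exactly why the outcome is called a race: the result depends on timing.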
How to Prevent a Race Condition?
Define a critical section in each process, covering the reads and writes of the common variables.
Make sure that only one process can execute in its critical section at a time.
What synchronization code goes into the entry and exit sections to prevent the race condition?
    do {
        entry section
            critical section
        exit section
            remainder section
    } while (TRUE);
Solution to the Critical-Section Problem
1. Mutual Exclusion - if process Pi is executing in its critical section, then no other process can be executing in its critical section.
2. Progress - if no process is executing in its critical section and some processes wish to enter their critical sections, then the selection of the process that will enter next cannot be postponed indefinitely.
3. Bounded Waiting - a bound must exist on the number of times other processes are allowed to enter their critical sections after a process has requested entry and before that request is granted.
What is the difference between Progress and Bounded Waiting?
Peterson's Solution
A simple 2-process solution.
Assume that the LOAD and STORE instructions are atomic; that is, they cannot be interrupted.
The two processes share two variables:
    int turn;
    boolean flag[2];
turn indicates whose turn it is to enter the critical section.
flag indicates whether a process is ready to enter: flag[i] = true implies that process Pi is ready.
Algorithm for Process Pi
    while (true) {
        flag[i] = TRUE;               // entry section
        turn = j;
        while (flag[j] && turn == j)
            ;
            CRITICAL SECTION
        flag[i] = FALSE;              // exit section
            REMAINDER SECTION
    }
Mutual exclusion - only one process enters the critical section at a time.
    Proof: can both processes pass the while loop (and enter the critical section) at the same time?
Progress - the selection of a waiting-to-enter process does not block.
    Proof: can Pi wait at the while loop forever (after Pj leaves the critical section)?
Bounded waiting - limited time waiting for other processes.
    Proof: can Pj win the critical section twice while Pi waits?
Algorithms for Processes Pi and Pj
Process Pi:
    while (true) {
        flag[i] = TRUE;               // entry section
        turn = j;
        while (flag[j] && turn == j)
            ;
            CRITICAL SECTION
        flag[i] = FALSE;              // exit section
            REMAINDER SECTION
    }
Process Pj:
    while (true) {
        flag[j] = TRUE;               // entry section
        turn = i;
        while (flag[i] && turn == i)
            ;
            CRITICAL SECTION
        flag[j] = FALSE;              // exit section
            REMAINDER SECTION
    }
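The two symmetric processes above can be exercised with real threads. This Python sketch works because CPython's GIL makes every operation sequentially consistent, which is exactly the atomic-LOAD/STORE assumption Peterson's algorithm needs; on real hardware memory fences would be required.

```python
import sys
import threading

# Peterson's 2-process algorithm, demonstrated with two Python threads
# incrementing a shared counter inside the critical section.
sys.setswitchinterval(1e-4)   # force frequent thread switches for the demo

flag = [False, False]
turn = 0
counter = 0                   # shared resource guarded by the critical section
N = 5000

def worker(i):
    global turn, counter
    j = 1 - i
    for _ in range(N):
        flag[i] = True                  # entry section: declare interest
        turn = j                        # ...and yield priority to the other
        while flag[j] and turn == j:
            pass                        # busy-wait
        counter += 1                    # critical section
        flag[i] = False                 # exit section

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)   # 10000: no increment is lost
```

Removing the entry/exit sections reintroduces the lost-update race from the earlier slide.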
Synchronization Hardware
Many systems provide hardware support for critical-section code.
Uniprocessors could simply disable interrupts: the currently running code would execute without preemption. This is generally too inefficient on multiprocessor systems, and operating systems that use it are not broadly scalable.
Modern machines provide special atomic (non-interruptible) hardware instructions:
    TestAndSet(target): atomically test a memory word and set its value
    Swap(a, b): atomically swap the contents of two memory words
TestAndSet Instruction
Definition:
    boolean TestAndSet(boolean *target) {
        boolean rv = *target;
        *target = TRUE;
        return rv;
    }
Solution Using TestAndSet
Shared boolean variable lock, initialized to FALSE.
Solution:
    while (true) {
        while (TestAndSet(&lock))     // entry section
            ;  /* do nothing */
        // critical section
        lock = FALSE;                 // exit section
        // remainder section
    }
Does it satisfy mutual exclusion? How about progress and bounded waiting? How can it be fixed?
Bounded-Waiting TestAndSet
Shared variables:
    boolean waiting[n];
    boolean lock;    // both initialized to FALSE
Solution:
    do {
        waiting[i] = TRUE;                        // entry section
        while (waiting[i] && TestAndSet(&lock))
            ;
        waiting[i] = FALSE;
        // critical section
        j = (i + 1) % n;                          // exit section
        while ((j != i) && !waiting[j])
            j = (j + 1) % n;
        if (j == i)
            lock = FALSE;
        else
            waiting[j] = FALSE;
        // remainder section
    } while (TRUE);
Mutual exclusion - Proof: can two processes pass the while loop (and enter the critical section) at the same time?
Bounded waiting - limited time waiting for other processes. What is waiting[] for? When is waiting[i] set to FALSE? Proof: how long does Pi wait until waiting[i] becomes FALSE?
Progress - Proof: the exit section unblocks at least one process's waiting[] entry or sets lock to FALSE.
Swap Instruction
Definition:
    void Swap(boolean *a, boolean *b) {
        boolean temp = *a;
        *a = *b;
        *b = temp;
    }
    Solution using Swap Shared Boolean variable lock initialized to FALSE; Each process has a local Boolean variable key.  Solution: while (true) { key = TRUE; while ( key == TRUE) Swap (&lock, &key ); // critical section lock = FALSE; // remainder section }  Mutual exclusion? Progress and Bounded Waiting?  Notice a performance problem with Swap & TestAndSet solutions? Entry Section Exit Section
  • 39.
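Like the TestAndSet version, the Swap-based lock can be sketched with an emulated atomic primitive. Each thread keeps swapping its local key with the shared lock word until it swaps out a FALSE, meaning it now owns the lock (the hidden Lock only stands in for the hardware Swap's atomicity):

```python
import threading

# A Swap-based spin lock: a thread enters when it swaps a FALSE out of
# the shared lock word into its local key.
_swap_guard = threading.Lock()
lock = [False]                 # the shared lock word

def swap(a, b):
    """Atomically exchange a[0] and b[0] (atomicity emulated with a Lock)."""
    with _swap_guard:
        a[0], b[0] = b[0], a[0]

counter = 0

def worker(n):
    global counter
    key = [True]               # local per-process variable
    for _ in range(n):
        key[0] = True          # entry section
        while key[0]:
            swap(lock, key)    # spin until we swapped out a FALSE
        counter += 1           # critical section
        lock[0] = False        # exit section

threads = [threading.Thread(target=worker, args=(4000,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)   # 8000: no lost updates
```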
    Processor Scheduling PS: readytasks are assigned to the processors so that performance is maximized. Cooperate and communicate through shared variables or message passing, PS in multiprocessor system is difficult problem. PS is very critical to the performance of multiprocessor systems because a naïve scheduler can degrade performance substantially.
  • 40.
Issues in Processor Scheduling
The three major causes of performance degradation are:
Preemption inside spinlock-controlled critical sections - a task is preempted inside a critical section while other tasks spin on the lock waiting to enter the same critical section.
Cache corruption - when a processor switches to another task, the big chunk of data needed by the previous task must be purged from the cache and new data brought in, causing a very high miss ratio.
Context-switching overhead - the execution of a large number of instructions to save and restore the registers, initialize the registers, switch the address space, etc.
Co-Scheduling in the Medusa OS
Co-scheduling - proposed by Ousterhout for the Medusa OS on Cm*.
All runnable tasks of an application are scheduled on the processors simultaneously.
Context switching occurs between applications rather than between tasks of several different applications.
Problem: tasks waste resources spinning on a lock while they wait for a preempted task to release the critical section.
    Smart Scheduling Proposed byzahorjan et al. – 2 nice features It avoids preempting a task when the task is inside its CS It avoids the rescheduling of tasks that were busy waiting at the time of their preemption until the task that is executing the corresponding CS release it. Eliminates the resource waste due to a processor spinning a lock. To reduce the overhead due to context switching nor to reduce the performance degradation due to cache corruption.
  • 43.
Scheduling in the NYU Ultracomputer
Proposed by Edler et al.; it combines the strategies of the previous two scheduling techniques.
Tasks can be formed into groups and scheduled in any of the following ways:
a task is scheduled or preempted in the normal manner
all tasks in a group are scheduled or preempted simultaneously
tasks in a group are never preempted
Memory Management: the Mach Operating System
Virtual memory management in the Mach OS, developed at CMU.
Design issues:
portability
data sharing
protection
efficiency
The Mach kernel provides the basic primitives necessary for building parallel and distributed applications.
The Mach Kernel
User space: user processes and a software emulation layer of 4.3BSD, System V, HP/UX and other emulators.
Kernel space: the microkernel itself.
The kernel manages five principal abstractions:
1. Processes
2. Threads
3. Memory objects
4. Ports
5. Messages
Process Management in Mach
A Mach process consists of an address space and its threads; the kernel reaches it through its ports: the process port, the bootstrap port, the exception port, and the registered ports.
    Ports The process portis used to communicate with the kernel. The bootstrap port is used for initialization when a process starts up. The exception port is used to report exceptions caused by the process. Typical exceptions are division by zero and illegal instruction executed. The registered ports are normally used to provide a way for the process to communicate with standard system servers.
  • 49.
    Ports A process canbe runnable or blocked. If a process is runnable, those threads that are also runnable can be scheduled and run. If a process is blocked, its threads may not run, no matter what state they are in.
  • 50.
Process Management Primitives
Create    - create a new process, inheriting certain properties
Terminate - kill a specified process
Suspend   - increment the suspend counter
Resume    - decrement the suspend counter; if it reaches 0, unblock the process
Priority  - set the priority for current or future threads
Assign    - tell which processor new threads should run on
Info      - return information about execution time, memory usage, etc.
Threads   - return a list of the process' threads
Threads
Mach threads are managed by the kernel; thread creation and destruction are done by the kernel.
Fork   - create a new thread running the same code as the parent thread
Exit   - terminate the calling thread
Join   - suspend the caller until a specified thread exits
Detach - announce that the thread will never be joined (waited for)
Yield  - give up the CPU voluntarily
Self   - return the calling thread's identity to it
    Scheduling algorithm When athread blocks, exits, or uses up its quantum, the CPU it is running on first looks on its local run queue to see if there are any active threads. If it is nonzero, run the highest-priority thread, starting at the queue specified by the hint. If the local run queue is empty, the same algorithm is applied to the global run queue. The global queue must be locked first.
  • 53.
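The count-and-hint bookkeeping in that algorithm can be modelled in a few lines. This is an illustrative toy, not Mach's actual data structures: an array of FIFO queues indexed by priority, a count of runnable threads, and a hint naming the highest priority at which a thread might be found.

```python
from collections import deque

# A toy Mach-style run queue: priority 0 is highest, 31 lowest.
class RunQueue:
    def __init__(self, levels=32):
        self.queues = [deque() for _ in range(levels)]
        self.count = 0
        self.hint = levels            # nothing queued yet

    def enqueue(self, thread, priority):
        self.queues[priority].append(thread)
        self.count += 1
        self.hint = min(self.hint, priority)

    def dequeue(self):
        if self.count == 0:           # empty: the caller would fall back
            return None               # to the (locked) global run queue
        for p in range(self.hint, len(self.queues)):
            if self.queues[p]:        # search starts at the hinted queue
                self.hint = p
                self.count -= 1
                return self.queues[p].popleft()

rq = RunQueue()
rq.enqueue("editor", 5)
rq.enqueue("daemon", 20)
print(rq.dequeue())   # editor: the highest-priority thread runs first
```

The hint keeps the common-case dequeue from scanning all 32 queues from priority 0 every time.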
Scheduling
Each processor set has a global run queue: an array of queues indexed by priority, from 0 (high) to 31 (low), with a count of queued threads and a hint giving the highest priority at which a thread may be found (e.g. processor set 1: free, count 6, hint 2; processor set 2: busy, count 7, hint 4).
Memory Management in Mach
Mach has a powerful, elaborate, and highly flexible memory management system based on paging.
The code of Mach's memory management is split into three parts:
The first part is the pmap module, which runs in the kernel and is concerned with managing the MMU.
The second part, the machine-independent kernel code, is concerned with processing page faults, managing address maps, and replacing pages.
The third part runs as a user process called a memory manager. It handles the logical part of the memory management system, primarily management of the backing store (disk).
    Virtual Memory The conceptualmodel of memory that Mach user processes see is a large, linear virtual address space. The address space is supported by paging. A key concept relating to the use of virtual address space is the memory object. A memory object can be a page or a set of pages, but it can also be a file or other, more specialized data structure.
  • 56.
An address space with allocated regions, mapped objects, and unused addresses: a text region, a data region, a stack region, and a mapped file (file xyz), separated by unused address ranges.
System calls for virtual address space manipulation:
Allocate   - make a region of virtual address space usable
Deallocate - invalidate a region of virtual address space
Map        - map a memory object into the virtual address space
Copy       - make a copy of a region at another virtual address
Inherit    - set the inheritance attribute for a region
Read       - read data from another process' virtual address space
Write      - write data to another process' virtual address space
Memory Sharing
Several processes (e.g. processes 1, 2 and 3) can map the same file into their address spaces and thereby share memory.
Operation of Copy-on-Write
The prototype's address space and the child's address space initially map the same physical pages (0 through 7), marked read-only in the child. When the child writes page 7, the kernel makes a copy of that page (a new physical page) and maps it read-write into the child's address space; the other pages remain shared.
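The copy-on-write mechanism can be sketched with reference-counted page frames; only a written page is copied, and only when it is actually shared. Class names here are illustrative, not Mach's:

```python
# A toy copy-on-write address space: fork() shares every frame, and a
# write to a shared frame copies just that one page.
class Page:
    def __init__(self, data):
        self.data = data
        self.refs = 1                # how many address spaces map this frame

class AddressSpace:
    def __init__(self, pages):
        self.pages = pages

    def fork(self):
        for p in self.pages:
            p.refs += 1              # share every frame (marked read-only)
        return AddressSpace(list(self.pages))

    def write(self, n, data):
        p = self.pages[n]
        if p.refs > 1:               # write fault on a shared frame:
            p.refs -= 1              # copy only this page, keep the rest shared
            self.pages[n] = Page(p.data)
        self.pages[n].data = data

parent = AddressSpace([Page("a"), Page("b")])
child = parent.fork()
child.write(0, "c")                  # only page 0 gets copied
print(parent.pages[0].data, child.pages[0].data)   # a c
print(parent.pages[1] is child.pages[1])           # True - still shared
```

A real kernel does the same bookkeeping via page-table protection bits and write faults rather than an explicit refs field on each mapping.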
Advantages of Copy-on-Write
1. Some pages are read-only, so there is no need to copy them.
2. Other pages may never be referenced, so they do not have to be copied.
3. Still other pages may be writable, but the child may deallocate them rather than using them.
Disadvantages of Copy-on-Write
1. The administration is more complicated.
2. It requires multiple kernel traps, one for each page that is ultimately written.
3. It does not work over a network.
    External Memory Managers Each memory object that is mapped in a process’ address space must have an external memory manager that controls it. Different classes of memory objects are handled by different memory managers.  Three ports are needed to do the job.  The object port, is created by the memory manager and will later be used by the kernel to inform the memory manager about page faults and other events relating to the object.  The control port, is created by the kernel itself so that the memory manager can respond to these events.  The name port, is used as a kind of name to identify the object.
  • 64.
Distributed Shared Memory in Mach
The idea is to have a single, linear, virtual address space that is shared among processes running on computers that do not have any physical shared memory.
When a thread references a page that it does not have, it causes a page fault. Eventually, the page is located and shipped to the faulting machine, where it is installed so that the thread can continue executing.
    Communication in Mach The basis of all communication in Mach is a kernel data structure called a port.  When a thread in one process wants to communicate with a thread in another process, the sending thread writes the message to the port and the receiving thread takes it out.  Each port is protected to ensure that only authorized processes can send it and receive from it.  Ports support unidirectional communication. A port that can be used to send a request from a client to a server cannot also be used to send the reply back from the server to the client. A second port is needed for the reply.
  • 66.
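The port abstraction, including the need for a separate reply port, can be modelled with ordinary message queues. This is a toy analogy, not the Mach API; the queue plays the role of the kernel-held port and maxsize the role of its qlimit:

```python
import queue
import threading

# Two unidirectional "ports": the client sends requests on one, and the
# server must use a second port to send the reply back.
request_port = queue.Queue(maxsize=8)   # maxsize stands in for the qlimit
reply_port = queue.Queue(maxsize=8)

def server():
    msg = request_port.get()            # receive blocks until a message arrives
    reply_port.put(msg.upper())         # the reply travels over a different port

t = threading.Thread(target=server)
t.start()
request_port.put("hello")               # client sends its request
reply = reply_port.get()                # ...and waits for the reply
t.join()
print(reply)   # HELLO
```

As in Mach, a full send blocks (or fails) when the queue is at its limit, and only the holder of the receive end can drain it.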
A Mach Port
Fields of a port: the message queue; the current message count; the maximum number of messages; the port set this port belongs to; counts of outstanding capabilities; capabilities to use for error reporting; the queue of threads blocked on this port; a pointer to the process holding the RECEIVE capability; the index of this port in the receiver's capability list; a pointer to the kernel object; miscellaneous items.
Message passing via a port: the sending thread performs a send to the port, which is held in the kernel, and the receiving thread performs a receive on it.
Primitives for Managing Ports
Allocate      - create a port and insert its capability in the capability list
Destroy       - destroy a port and remove its capability from the list
Deallocate    - remove a capability from the capability list
Extract_right - extract the n-th capability from another process
Insert_right  - insert a capability in another process' capability list
Move_member   - move a capability into a capability set
Set_qlimit    - set the number of messages a port can hold
Sending and Receiving Messages
mach_msg(&hdr, options, send_size, rcv_size, rcv_port, timeout, notify_port);
The first parameter, hdr, is a pointer to the message to be sent, or to the place where the incoming message is put, or both.
The second parameter, options, contains a bit specifying that a message is to be sent and another specifying that a message is to be received. Another bit enables a timeout, given by the timeout parameter. Other bits in options allow a SEND that cannot complete immediately to return control anyway, with a status report being sent to notify_port later.
The send_size and rcv_size parameters tell how large the outgoing message is and how many bytes are available for storing the incoming message, respectively.
rcv_port is used for receiving messages; it is the capability name of the port or port set being listened to.
The Mach Message Format
Header (examined by the kernel): message size; destination rights and reply rights; a complex/simple flag; the capability index for the destination port; the capability index for the reply port; the message kind; the function code.
Message body (not examined by the kernel): a sequence of descriptors and data fields (descriptor 1, data field 1, descriptor 2, data field 2, ...).
Complex Message Field Descriptor
A descriptor gives the data field size in bits, the number of items in the data field, and the data field type (bit, byte, unstructured word, integer (8, 16 or 32 bits), character, Boolean, floating point, string, capability), plus flag bits indicating: whether out-of-line data is present, whether the descriptor is in short or long form, and whether the sender keeps or deallocates the out-of-line data.
Reliability/Fault Tolerance: the SEQUOIA System
The Sequoia system is a loosely coupled multiprocessor system.
It attains a high level of fault tolerance by performing fault detection in hardware and fault recovery in the OS.
Design issues:
fault detection and isolation
fault recovery
efficiency
The Sequoia architecture.
Reliability/Fault Tolerance: the SEQUOIA System
Fault detection:
error-detecting codes
comparison of duplicated operations
protocol monitoring
Fault recovery:
recovery from processor failures
recovery from main memory failures
recovery from I/O failures
    Database Operating Systems Database system have been implemented as an application on top of general purpose OS  Requrements of DBOS  Transaction Management  Support for complex, persistent data  Buffer Management
  • 77.
Concurrency Control
Concurrency control (CC) is the process of controlling concurrent access to a database to ensure that the correctness of the database is maintained.
Database system: a set of shared data objects that can be accessed by users.
Transactions: a transaction consists of a sequence of read, compute and write statements that refer to the data objects of a database.
Conflicts: transactions conflict if they access the same data objects.
Transaction processing: a transaction is executed by executing its actions one by one from beginning to end.
A Concurrency Control Model of a DBS
Three software modules:
Transaction manager (TM) - supervises the execution of transactions
Scheduler - responsible for enforcing concurrency control
Data manager (DM) - manages access to the stored database
    Distributed Database System A distributed database is a database in which storage devices are not all attached to a common processing unit such as the CPU.  It may be stored in multiple computers, located in the same physical location; or may be dispersed over a network of interconnected computers.  Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.
  • 80.
Model of a Distributed Database System
    Distributed Database System Motivations: DDBS offers several advantages over a centralized database system such as  Sharing  Higher system availability (reliability)  Improved performance  Easy expandability  Large databases  Transaction Processing Model  Serializability condition in DDBS  Data replication  Complications due to Data replication  Fully Replicated Database Systems 1. Enhanced reliability 2. Improved responsiveness 3. No directory management 4. Easier load balancing
  • 82.
    Concurrency Control Algorithms Itcontrols the interleaving of conflicting actions of transactions so that the integrity of a database is maintained, i.e., their net effect is a serial execution. Basic synchronization primitives Locks A transaction can request, hold or release the lock on a data object.  lock a data object in 2 modes: exclusive and shared Timestamps Unique number is assigned to a transaction or a data object and is chosen from a monotonically increasing sequence. Commonly generated using Lamport’s scheme
  • 83.
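Lamport's scheme for generating monotonically increasing, unique timestamps can be sketched directly: each site keeps a counter, increments it on every local event, advances it past the sender's value on receipt, and breaks ties with the site id. The class name is illustrative:

```python
# Lamport timestamps: (counter, site_id) pairs that are unique and
# totally ordered, and consistent with message causality.
class LamportClock:
    def __init__(self, site_id):
        self.time = 0
        self.site_id = site_id

    def tick(self):
        """A local event: advance the counter and stamp the event."""
        self.time += 1
        return (self.time, self.site_id)

    def receive(self, sender_time):
        """On receiving a message, jump past the sender's counter."""
        self.time = max(self.time, sender_time) + 1
        return (self.time, self.site_id)

a, b = LamportClock(1), LamportClock(2)
t1 = a.tick()            # (1, 1): site 1 starts a transaction
t2 = b.receive(t1[0])    # (2, 2): site 2 hears about it afterwards
print(t1 < t2)           # True: timestamps are totally ordered
```

Because the site id breaks ties, two transactions started "simultaneously" at different sites still get distinct, comparable timestamps, which is exactly what the timestamp-based algorithms below require.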
    Lock based algorithms Staticlocking Two Phase Locking (2PL) Problems with 2PL: Price for Higher concurrency 2PL in DDBS Timestamp Based locking Conflict Resolution Wait Restart Die Wound Non-two-phase locking
  • 84.
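The core of two-phase locking can be sketched with a tiny lock manager: a transaction acquires shared or exclusive locks as it goes (growing phase) and releases everything only at commit (shrinking phase). This is a simplified sketch with illustrative names; a real scheduler would also queue or restart conflicting transactions rather than just refusing them:

```python
# A minimal 2PL lock manager: "S" (shared) locks are compatible with
# each other; "X" (exclusive) locks conflict with everything.
class LockManager:
    def __init__(self):
        self.locks = {}   # object -> (mode, set of holding transactions)

    def acquire(self, txn, obj, mode):
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)                  # readers can share the lock
            return True
        if txn in holders and holders == {txn}:
            self.locks[obj] = ("X" if mode == "X" else held_mode, holders)
            return True                       # sole holder may upgrade
        return False                          # conflict: wait or restart

    def release_all(self, txn):
        """Shrinking phase: at commit, drop every lock the txn holds."""
        for obj in list(self.locks):
            mode, holders = self.locks[obj]
            holders.discard(txn)
            if not holders:
                del self.locks[obj]

lm = LockManager()
print(lm.acquire("T1", "x", "S"))   # True
print(lm.acquire("T2", "x", "S"))   # True - shared locks are compatible
print(lm.acquire("T2", "x", "X"))   # False - T1 is still reading x
lm.release_all("T1")                # T1 commits
print(lm.acquire("T2", "x", "X"))   # True - T2 can now upgrade
```

Releasing only at commit is what makes the schedule serializable; releasing early (breaking the two-phase rule) is exactly what non-two-phase schemes trade correctness guarantees against.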
    Timestamp Based Algorithms Basictimestamp ordering algorithm Thomas Write Rule (TWR) Multiversion timestamp ordering algorithm Conservative timestamp ordering algorithm
  • 85.
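Basic timestamp ordering and the Thomas Write Rule fit in one small sketch: each object remembers the largest reader and writer timestamps; a too-late read or a write behind a younger reader forces a restart, while a write behind only a younger writer is simply discarded (TWR) since it would have been overwritten anyway in the serial order. Class and method names are illustrative:

```python
# Basic timestamp ordering with the Thomas Write Rule.
class TimestampOrdering:
    def __init__(self):
        self.read_ts = {}    # object -> largest reader timestamp
        self.write_ts = {}   # object -> largest writer timestamp
        self.values = {}

    def read(self, ts, obj):
        if ts < self.write_ts.get(obj, 0):
            return None      # a younger txn already wrote: restart the reader
        self.read_ts[obj] = max(self.read_ts.get(obj, 0), ts)
        return self.values.get(obj)

    def write(self, ts, obj, value):
        if ts < self.read_ts.get(obj, 0):
            return False     # a younger txn already read: restart the writer
        if ts < self.write_ts.get(obj, 0):
            return True      # Thomas Write Rule: obsolete write, just skip it
        self.write_ts[obj] = ts
        self.values[obj] = value
        return True

db = TimestampOrdering()
db.write(1, "x", "old")
db.write(3, "x", "new")
print(db.write(2, "x", "late"))   # True, but the write is ignored by TWR
print(db.read(4, "x"))            # new
```

Without TWR the late write at timestamp 2 would force a restart; with it, the net effect is still the serial order 1, 2, 3, 4, since transaction 3's write would have overwritten transaction 2's anyway.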