History of UNIX:
In order to define UNIX, it helps to look at its history. In 1969, Ken Thompson,
Dennis Ritchie and others started work on what was to become UNIX on a "little-used
PDP-7 in a corner" at AT&T Bell Labs. For ten years, the development of
UNIX proceeded at AT&T in numbered versions. V4 (1973) was rewritten in C -- a
major milestone for the operating system's portability among different systems.
V6 (1975) was the first to become available outside Bell Labs -- it became the
basis of the first version of UNIX developed at the University of California
Berkeley.
Bell Labs continued work on UNIX into the 1980s, culminating in the release of
System V (as in "five," not the letter) in 1983 and System V, Release 4
(abbreviated SVR4) in 1989. Meanwhile, programmers at the University of
California hacked mightily on the source code AT&T had released, leading to many
a master's thesis. The Berkeley Software Distribution (BSD) became a second major
variant of "UNIX." It was widely deployed in both university and corporate
computing environments starting with the release of 4.2BSD in 1983. Some of its
features were incorporated into SVR4.
As the 1990s opened, AT&T's source code licensing had created a flourishing
market for hundreds of UNIX variants by different manufacturers. AT&T sold its
UNIX business to Novell in 1993, and Novell sold it to the Santa Cruz Operation
two years later. In the meantime, the UNIX trademark had been passed to the
X/Open consortium, which eventually merged with the Open Software Foundation to form The Open Group.
While the stewardship of UNIX was passing from entity to entity, several long-running
development efforts started bearing fruit. Traditionally, in order to get a
BSD system working, you needed a source code license from AT&T. But by the
early 1990s, Berkeley hackers had done so much work on BSD that most of the
original AT&T source code was long gone. A succession of programmers, starting
with William and Lynne Jolitz, started work on the Net distribution of BSD, leading
to the release of 386BSD version 0.1 on Bastille Day, 1992. This original "free
source" BSD was spun out into three major distributions, each of which has a
dedicated following: NetBSD, FreeBSD, and OpenBSD, all of which are based on
4.4BSD.
BSD wasn't the first attempt at a "free" UNIX. In 1984, programmer Richard
Stallman started work on a free UNIX clone known as GNU (GNU's Not UNIX). By
the early 1990s, the GNU Project had achieved several programming milestones,
including the release of the GNU C library and the Bourne Again SHell (bash). The
whole system was basically finished, except for one critical element: a working
kernel.
Enter Linus Torvalds, a student at the University of Helsinki in Finland. Linus
looked at a small UNIX system called Minix and decided he could do better. In the
fall of 1991, he released the source code for a freeware kernel called "Linux" -- a
combination of his first name and Minix, pronounced lynn-nucks. By 1994, Linus
and a far-flung team of kernel hackers were able to release version 1.0 of Linux.
Linus and friends had a free kernel; Stallman and friends had the rest of a free
UNIX clone system: People could then put the Linux kernel together with GNU to
make a complete free system. This system is known as "Linux," though Stallman
prefers the appellation "GNU/Linux system." There are several distinct GNU/Linux
distributions: some are available with commercial support from companies like
Red Hat, Caldera Systems, and S.U.S.E.; others, like Debian GNU/Linux, are more
closely aligned with the original free software concept.
The spread of Linux, now up to kernel version 2.2, has been a startling
phenomenon. Linux runs on several different chip architectures and has been
adopted or supported to varying extents by several old-line UNIX vendors like
Hewlett-Packard, Silicon Graphics, and Sun Microsystems, by PC vendors like
Compaq and Dell, and by major software vendors like Oracle and IBM. Perhaps
the most delicious irony has been the response of Microsoft, which acknowledges
the competitive threat of ubiquitous free software but seems unwilling or unable
to respond with open-source software of its own.
Microsoft has, however, struck blows with Windows NT (Windows 2000). During
the late 1990s, vendor after vendor has abandoned the UNIX server platform in
favor of Windows NT or wavered in their support. Silicon Graphics Inc., for
example, has decided that Intel hardware and NT is the graphics platform of the
future.
The phenomenon of old-line UNIX vendors jumping ship and the concurrent rush
to Linux by vendors large and small brings us back to the question at the top of
this section: What is UNIX? While one can abide by the legal definition as
embodied in the trademark, I believe that this does a major disservice to the
industry. As the base software of the Internet, UNIX technology is one of the
significant achievements of 20th century civilization. To restrict it to a narrow
legal or technical definition -- as formulated by some of the vendors now
abandoning it -- is to deny its ongoing relevance and importance, which is most
evident in the amazing popularity and strength of UNIX-like clones such as
GNU/Linux and BSD.
Unix process
When you execute a program on your UNIX system, the system creates a special
environment for that program. This environment contains everything needed for
the system to run the program as if no other program were running on the
system. Whenever you issue a command in UNIX, it creates, or starts, a new
process. When you tried out the ls command to list directory contents, you started
a process. A process, in simple terms, is an instance of a running program. The
operating system tracks processes through an ID number, traditionally up to five
digits long, known as the pid or process ID. Each process in the system has a
unique pid. Pids eventually repeat because all the possible numbers are used up
and the next pid rolls over. At any one time, no two processes with the same pid
exist in the system because it is the pid that UNIX uses to track each process.
If, for example, three people are running the same program simultaneously, there
are three processes, not just one. In fact, there might be more than one
process running even with only one person executing the program, because the
program can "split into two" (through the fork system call), making two
processes out of one.
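The "split into two" step can be sketched in Python with os.fork, which wraps the fork system call. This is only a minimal illustration on a UNIX-like system; the function name fork_once and the exit status 7 are made up for the example.

```python
import os

def fork_once():
    """Split the current process in two; return (child_pid, child_exit_code)."""
    pid = os.fork()          # returns 0 in the child, the child's pid in the parent
    if pid == 0:
        # Child: a second process running the same program.
        os._exit(7)          # leave at once with a recognizable (arbitrary) status
    # Parent: wait for the child so that it does not linger in the process table.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)

child_pid, code = fork_once()
print("parent", os.getpid(), "created child", child_pid, "which exited with", code)
```

After the fork there are two processes with different pids, both started from one program.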
Starting a Process:
When you start a process (run a command), there are two ways you can run it:
Foreground Processes
Background Processes
Types of process:
Foreground Processes:
By default, every process that you start runs in the foreground. It gets its input
from the keyboard and sends its output to the screen.
You can see this happen with the ls command. If I want to list all the files in my
current directory whose names match ch*.doc, I can use the following command:
$ ls ch*.doc
The process runs in the foreground, the output is directed to my screen, and if the
ls command wants any input (which it does not), it waits for it from the keyboard.
While a program is running in the foreground and taking a long time, we cannot run
any other commands (start any other processes) because the prompt is not
available until the program finishes processing and exits.
Background Processes:
A background process runs without being connected to your keyboard. If the
background process requires any keyboard input, it waits.
The advantage of running a process in the background is that you can run other
commands; you do not have to wait until it completes to start another!
The simplest way to start a background process is to add an ampersand (&) at the
end of the command.
$ ls ch*.doc &
Here, if the ls command wants any input (which it does not), it goes into a
stopped state until I move it into the foreground and give it the data from the
keyboard.
If you press the Enter key now, you see the following:
[1] + Done ls ch*.doc &
$
The first line tells you that the ls background process finished
successfully. The second is a prompt for another command.
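The same idea can be sketched from a program rather than from the shell. The example below assumes Python's subprocess module; the helper name run_in_background is hypothetical, and the child is simply a second Python interpreter that sleeps.

```python
import subprocess
import sys

def run_in_background(seconds):
    """Start a child that sleeps, without waiting for it (like `command &`)."""
    # sys.executable is the running Python interpreter; the child only sleeps.
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )

job = run_in_background(0.2)
still_running = job.poll() is None   # poll() returns None while the child runs
exit_code = job.wait()               # now block until it finishes
print("was running:", still_running, "exit code:", exit_code)
```

Between Popen and wait the parent is free to do other work, which is the advantage the text describes.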
Listing Running Processes:
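It is easy to see your own processes by running the ps (process status) command
with no arguments.
One of the most commonly used flags for ps is the -f (f for full) option, which
provides more information. The ps -f listing has the columns UID, PID, PPID, C,
STIME, TTY, TIME, and CMD: the user who owns the process, the process ID, the
parent process ID, CPU utilization, the start time, the controlling terminal,
the CPU time used so far, and the command that started the process.
Other options can be combined with ps; for example, ps -ef lists every process
on the system, not just your own.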
Stopping Processes:
Ending a process can be done in several different ways. Often, from a console-based
command, sending a CTRL + C keystroke (the default interrupt
character) will exit the command. This works when the process is running in
foreground mode.
If a process is running in background mode, you first need to get its
process ID using the ps command; you can then pass that ID to the kill command
to terminate the process. For example, if ps lists a background process named
first_one, running kill with its process ID terminates first_one. If a process
ignores a regular kill command, you can use kill -9 followed by the process ID.
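The kill sequence can be sketched as follows. This is an illustration only: the helper names start_sleeper and terminate and the two-second grace period are assumptions, and os.kill is used as the programmatic counterpart of the kill command.

```python
import os
import signal
import subprocess
import sys

def start_sleeper():
    """Start a background child that would otherwise sleep for a minute."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

def terminate(proc, grace=2.0):
    """Send SIGTERM (a plain `kill`); fall back to SIGKILL (`kill -9`)."""
    os.kill(proc.pid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.kill(proc.pid, signal.SIGKILL)   # cannot be caught or ignored
        return proc.wait()

# A negative return code means "killed by that signal number".
print("return code:", terminate(start_sleeper()))
```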
Parent and Child Processes:
Each UNIX process has two ID numbers assigned to it: Process ID (pid) and Parent
process ID (ppid). Each user process in the system has a parent process.
Most of the commands that you run have the shell as their parent. The ps -f
listing shows both the process ID and the parent process ID.
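The pid/ppid relationship can be checked with a short sketch: a child created with os.fork reports the parent process ID it sees, and the parent compares it with its own pid. The function name child_reports_parent is hypothetical.

```python
import os

def child_reports_parent():
    """Fork a child; return (our pid, the ppid the child observed)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send our parent's pid back through the pipe, then exit.
        os.close(read_end)
        os.write(write_end, str(os.getppid()).encode())
        os._exit(0)
    os.close(write_end)
    data = os.read(read_end, 32)
    os.close(read_end)
    os.waitpid(pid, 0)
    return os.getpid(), int(data)

me, seen_by_child = child_reports_parent()
print("my pid:", me, "ppid seen by child:", seen_by_child)
```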
Zombie and Orphan Processes:
Normally, when a child process is killed, the parent process is told via a SIGCHLD
signal. Then the parent can do some other task or restart a new child as needed.
However, sometimes the parent process is killed before its child is killed. In this
case, the "parent of all processes," the init process, becomes the child's new
parent, and the child's PPID (parent process ID) changes to that of init.
Such a process is called an orphan process.
When a process is killed, a ps listing may still show the process with a Z state. This
is a zombie, or defunct, process. The process is dead and not being used. Zombies
are different from orphan processes: a zombie has completed execution but still
has an entry in the process table, which remains until its parent collects its
exit status.
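The lingering process-table entry can be observed with a small sketch. It assumes that sending signal 0 with os.kill is an acceptable existence check (it sends nothing), and the names entry_exists and zombie_demo are made up for the example.

```python
import os

def entry_exists(pid):
    """True if the process table still has an entry for pid."""
    try:
        os.kill(pid, 0)      # signal 0: existence check only, nothing is sent
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True          # exists, but belongs to someone else

def zombie_demo():
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        os._exit(0)          # child finishes; its end of the pipe closes
    os.close(write_end)
    os.read(read_end, 1)     # returns b"" once the child has closed its end
    os.close(read_end)
    before = entry_exists(pid)   # finished but not yet reaped: entry remains
    os.waitpid(pid, 0)           # the parent collects the exit status
    after = entry_exists(pid)    # the entry is gone
    return before, after

print(zombie_demo())
```

The entry disappears only after the parent calls waitpid, which is why a parent that never waits leaves zombies behind.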
Daemon Processes:
Daemon is often said to stand for Disk and Execution Monitor, although that
expansion is a backronym. A daemon is a long-running
background process that answers requests for services. The term is closely
associated with UNIX, but most operating systems use daemons in some form or
another. In UNIX, the names of daemons conventionally end in "d". Some examples
include inetd, httpd, nfsd, sshd, named, and lpd.
Daemons are system-related background processes that often run with the
permissions of root and service requests from other processes. A daemon
process has no controlling terminal. It cannot open /dev/tty. If you do a "ps -ef"
and look at the tty field, all daemons will have a ? for the tty. More simply, a
daemon is just a process that runs in the background, usually waiting for
something to happen that it is capable of working with, the way a printer daemon
waits for print commands.
If you have a program that needs to do long processing, it is worth
making it a daemon and running it in the background.
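The "no controlling terminal" property can be sketched as below. This is not a complete daemonization routine; it only shows one step commonly used for it, os.setsid, after which opening /dev/tty is expected to fail. The function name is hypothetical.

```python
import os

def detached_child_can_open_tty():
    """Fork a child, detach it from the terminal, and try to open /dev/tty."""
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            os.setsid()      # new session: the child has no controlling terminal
            os.close(os.open("/dev/tty", os.O_RDWR))
            code = 1         # the open succeeded: a terminal is still attached
        except OSError:
            code = 0         # the open failed, as expected for a daemon
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1

print("detached child could open /dev/tty:", detached_child_can_open_tty())
```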
The top Command:
The top command is a very useful tool for quickly showing processes sorted by
various criteria.
It is an interactive diagnostic tool that updates frequently and shows
information about physical and virtual memory, CPU usage, load averages, and
your busy processes.
Running top with no arguments shows the CPU utilization statistics of the
different processes:
$ top
A Five-State Process Model
(Review)
The not-running state in the two-state model has now been split into a ready
state and a blocked state
New -- just been created
Running -- currently being executed
Ready -- prepared to execute
Blocked -- waiting for some event to occur (for an I/O operation to
complete, or a resource to become available, etc.)
Exit -- just been terminated
State transition diagram:
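Since the diagram itself is not reproduced here, the five states can be sketched as a small state machine. The transition table below follows the usual textbook version of the model (admit, dispatch, time-out, event wait, event occurs, release); treat it as an assumption rather than a copy of the original figure.

```python
# Allowed moves between the five states, with the usual textbook labels.
TRANSITIONS = {
    ("New", "Ready"): "admit",
    ("Ready", "Running"): "dispatch",
    ("Running", "Ready"): "time-out",
    ("Running", "Blocked"): "event wait",
    ("Blocked", "Ready"): "event occurs",
    ("Running", "Exit"): "release",
}

def move(state, new_state):
    """Return new_state if the transition is legal, otherwise raise ValueError."""
    if (state, new_state) not in TRANSITIONS:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

def run_path(path):
    """Walk a list of states and return the final one."""
    state = path[0]
    for nxt in path[1:]:
        state = move(state, nxt)
    return state

print(run_path(["New", "Ready", "Running", "Blocked", "Ready", "Running", "Exit"]))
```

Note that a blocked process cannot go straight back to running; it must pass through ready and be dispatched again.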
Context Switching
Stopping one process and starting another is called a context switch.
When the OS stops a process, it stores the hardware registers (PC,
SP, etc.) and any other state information in that process's PCB (process
control block).
When the OS is ready to execute a waiting process, it loads the hardware
registers (PC, SP, etc.) with the values stored in the new process's PCB,
and restores any other state information.
Performing a context switch is a relatively expensive operation.
However, time-sharing systems may do 100–1000 context switches a second.
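The save-and-restore step can be sketched with a toy model. Real context switches happen in kernel code on real registers; the PCB and CPU classes below, and the register values, are purely illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class PCB:
    """Process control block: saved state of a process that is not running."""
    pid: int
    registers: dict = field(default_factory=lambda: {"PC": 0, "SP": 0})

class CPU:
    def __init__(self):
        self.registers = {"PC": 0, "SP": 0}
        self.current = None

    def context_switch(self, new):
        if self.current is not None:
            # Save the outgoing process's registers into its PCB.
            self.current.registers = dict(self.registers)
        # Load the incoming process's saved registers.
        self.registers = dict(new.registers)
        self.current = new

cpu = CPU()
a = PCB(1, {"PC": 100, "SP": 500})
b = PCB(2, {"PC": 200, "SP": 900})
cpu.context_switch(a)
cpu.registers["PC"] += 4          # process 1 executes an instruction
cpu.context_switch(b)             # process 1's PC is saved as 104
print(a.registers, cpu.registers)
```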
Unix – Signals and Traps
Signals are software interrupts sent to a program to indicate that an important
event has occurred. The events can vary from user requests to illegal memory
access errors. Some signals, such as the interrupt signal, indicate that a user has
asked the program to do something that is not in the usual flow of control.
Some of the more common signals you might encounter and want to use in your
programs are SIGHUP, SIGINT, SIGQUIT, SIGKILL, SIGALRM, and SIGTERM.
List of Signals:
There is an easy way to list all the signals supported by your system. Just
issue the kill -l command and it displays all the supported signals.
The actual list of signals varies between Solaris, HP-UX, and Linux.
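Trapping a signal can be sketched with Python's signal module. The handler name on_signal, the helper trap_and_send, and the choice of SIGUSR1 are assumptions for the example; signal_names is a rough programmatic counterpart of kill -l.

```python
import os
import signal
import time

received = []

def on_signal(signum, frame):
    """Handler (trap): record the name of the signal that arrived."""
    received.append(signal.Signals(signum).name)

def trap_and_send(sig=signal.SIGUSR1, timeout=1.0):
    """Install a handler, send the signal to ourselves, return what was caught."""
    received.clear()
    previous = signal.signal(sig, on_signal)
    try:
        os.kill(os.getpid(), sig)
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)     # give the handler a chance to run
    finally:
        if previous is not None:
            signal.signal(sig, previous)   # restore the earlier disposition
    return list(received)

def signal_names():
    """Names of the signals this platform defines."""
    return sorted(s.name for s in signal.Signals)

print(trap_and_send())
print(len(signal_names()), "signals defined on this system")
```

Handlers must be installed from the main thread in Python, so this sketch assumes it is run as an ordinary script.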
Thread
Two things characterize the notion of a process: a unit of execution (unit of
dispatching) and the collection of resources with which that unit of execution
is associated.
A thread is the abstraction of a unit of execution. It is also referred to as a
lightweight process (LWP). As a basic unit of CPU utilization, a thread consists of an
instruction pointer (also referred to as the PC or instruction counter), a CPU
register set, and a stack. A thread shares its code and data, as well as system
resources and other OS-related information, with its peer group (other threads of
the same process).
Threads: an example
A good example of an application that could make use of threads is a file server
on a local area network (LAN).
A "controller" thread accepts file service requests and spawns a "worker" thread
for each request, so the server can handle many requests concurrently. When a
worker thread finishes servicing a request, it is destroyed.
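The controller/worker arrangement can be sketched with Python's threading module. There is no real network or file system here: FILES is a hypothetical in-memory stand-in for the served files, and the function names are made up.

```python
import threading

FILES = {"a.txt": "alpha", "b.txt": "beta"}   # hypothetical in-memory "files"

def worker(name, results, lock):
    """Service one request, then finish (the thread is destroyed)."""
    data = FILES.get(name, "")
    with lock:                    # results is shared by all workers
        results[name] = data

def controller(requests):
    """Spawn one worker thread per request and collect the replies."""
    results, lock = {}, threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(r, results, lock))
        for r in requests
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results

print(controller(["a.txt", "b.txt", "missing"]))
```

All workers share the controller's data (here, results and FILES), which is the sharing of code and data among peer threads described above.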
Thread Models
• User-level threads:
Implemented through a threads library in the address space of a process, these
are invisible to the operating system. User-level threads (ULTs) are the interface
for application parallelism.
– Benefits:
• no modifications required to kernel
• flexible and low cost
– Drawbacks:
• cannot block without blocking the entire process
• no parallelism (not recognized by kernel)
• Kernel-level threads:
Implemented as system calls; can be scheduled directly by the OS; independent
operation of threads in a single process; more expensive (thread) operations.
Kernel-level threads are directly supported by the kernel, and the thread is
the basic scheduling entity.
– Examples:
• Windows 95/98/NT/2000, Solaris, Tru64 UNIX, BeOS, Linux
– Benefits:
• coordination between scheduling and synchronization
• suitable for parallel applications
– Drawbacks:
• more expensive than user-level threads
UNIX Process Model
●Start in Created, go to either:
●Ready to Run, in Memory
●or Ready to Run, Swapped (Out) if there isn't room in memory for the new
process
●Ready to Run, in Memory is basically the same state as Preempted (dotted line)
●Preempted means the process was returning to user mode, but the kernel switched
to another process instead
●When scheduled, go to either:
●User Running (if in user mode)
●or Kernel Running (if in kernel mode)
●Go from User Running to Kernel Running via a system call
●Go to Asleep in Memory when waiting for some event, and back to Ready to Run,
in Memory when it occurs
●Go to Sleep, Swapped if swapped out
Scheduling :
A schedule or a timetable is a basic time management tool consisting of a list of
times at which possible tasks, events, or actions are intended to take place, or a
sequence of events in the chronological order in which such things are intended
to take place. The process of creating a schedule -- deciding how to order these
tasks and how to commit resources between the variety of possible tasks -- is
called scheduling. More simply, to schedule is to arrange or plan an event to
take place at a particular time.
Scheduling and System Performance:
The scheduler determines when and for how long processes run. Therefore, the
scheduler's behavior strongly affects a system's performance.
By default, all user processes are time-sharing processes. A process changes class
only by a priocntl(2) (process scheduler control) system call.
All real-time processes have a higher priority than any time-sharing
process. Time-sharing processes or system processes cannot run while any real-time
process is runnable. A real-time application that occasionally fails to
relinquish control of the CPU can completely lock out other users and essential
kernel housekeeping.
Besides controlling process class and priorities, a real-time application must also
control other factors that affect its performance. The most important factors in
performance are CPU power, amount of primary memory, and I/O throughput.
These factors interact in complex ways. The sar(1) command has options for
reporting on all performance factors.
Process State Transition:
Applications that have strict real-time constraints might need to prevent
processes from being swapped or paged out to secondary memory. A simplified
overview of UNIX process states and the transitions between states is shown in
the following figure.
An active process is normally in one of the five states in the diagram. The arrows
show how the process changes states.
A process is running if the process is assigned to a CPU. A process is
removed from the running state by the scheduler if a process with a higher
priority becomes runnable. A process is also pre-empted if a process of
equal priority is runnable when the original process consumes its entire
time slice.
A process is runnable in memory if the process is in primary memory and
ready to run, but is not assigned to a CPU.
A process is sleeping in memory if the process is in primary memory but is
waiting for a specific event before continuing execution. For example, a
process sleeps while waiting for an I/O operation to complete, for a locked
resource to be unlocked, or for a timer to expire. When the event occurs, a
wakeup call is sent to the process. If the reason for its sleep is gone, the
process becomes runnable.
When a process' address space has been written to secondary memory,
and that process is not waiting for a specific event, the process is runnable
and swapped.
If a process is waiting for a specific event and has had its whole address
space written to secondary memory, the process is sleeping and swapped.
If a machine does not have enough primary memory to hold all its active
processes, that machine must page or swap some address space to
secondary memory.
When the system is short of primary memory, the system writes individual
pages of some processes to secondary memory but leaves those processes
runnable. When a running process accesses those pages, the process
sleeps while the pages are read back into primary memory.
Both paging and swapping cause delay when a process is ready to run again. For
processes that have strict timing requirements, this delay can be unacceptable.
To avoid swapping delays, real-time processes are never swapped, though parts
of such processes can be paged. A program can prevent paging and swapping by
locking its text and data into primary memory.
Process scheduling in Unix
When a process is created, the system assigns a lightweight process (LWP) to the
process. If the process is multithreaded, more LWPs might be assigned to the
process. An LWP is the object that is scheduled by the UNIX system scheduler,
which determines when processes run. The scheduler maintains process priorities
that are based on configuration parameters, process behavior, and user requests.
The scheduler uses these priorities to determine which process runs next.
Two-level scheduling
Low level (CPU) scheduler uses multiple queues to select the next process,
out of the processes in memory, to get a time quantum.
Low-level scheduler keeps queues for each priority
Processes in user mode have positive priorities
Processes in kernel mode have negative priorities (lower is higher)
High level (memory) scheduler moves processes from memory to disk and
back, to give all processes their share of CPU time
Unix priority queues
Unix low-level Scheduling Algorithm:
Pick process from highest (non-empty) priority queue.
Run for 1 quantum (usually 100 ms), or until it blocks.
Increment CPU usage count every clock tick.
Every second, recalculate priorities:
o Divide CPU usage by 2
o New priority = base + cpu_usage + nice
o Base is negative if the process is released from waiting in kernel
mode
Use round robin for each queue (separately).
Blocked processes are removed from the queue, but when the blocking event
occurs, they are placed in a high-priority queue.
The negative priorities are meant to release processes quickly from the
kernel.
Negative priorities are hardwired in the system; for example, -5 for disk I/O
is meant to give high priority to a process released from disk I/O.
Interactive processes get good service; CPU-bound processes get whatever
service is left.
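The once-per-second recalculation can be sketched as follows. The formula is the one given above (halve the CPU usage, then priority = base + cpu_usage + nice, with lower values running first); the Proc class, the use of integer division, and the tick counts are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    base: int = 0        # negative when just released from a kernel wait
    nice: int = 0
    cpu_usage: int = 0
    priority: int = 0

def tick(running):
    """Clock tick: charge the running process for the CPU it used."""
    running.cpu_usage += 1

def recalculate(procs):
    """Once a second: decay CPU usage, then recompute every priority."""
    for p in procs:
        p.cpu_usage //= 2
        p.priority = p.base + p.cpu_usage + p.nice

def pick_next(procs):
    """Lower priority value means higher priority."""
    return min(procs, key=lambda p: p.priority)

hog, editor = Proc("hog"), Proc("editor")
for _ in range(60):
    tick(hog)            # CPU-bound: ran for 60 ticks
for _ in range(4):
    tick(editor)         # interactive: ran for only 4 ticks
recalculate([hog, editor])
print(hog.priority, editor.priority, pick_next([hog, editor]).name)
```

The process that used little CPU ends up with the lower (better) priority value, which is how interactive processes get good service.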
The six priority classes are
Time-Sharing Class
System Class
Real-time Class
Interactive Class
Fair-Share Class
Fixed-Priority Class
Time-Sharing Class:
The goal of the time-sharing policy is to provide good response time to interactive
processes and good throughput to CPU-bound processes. The scheduler switches
CPU allocation often enough to provide good response time, but not so often that
the system spends too much time on switching. Time slices are typically a few
hundred milliseconds.
The time-sharing policy changes priorities dynamically and assigns time slices of
different lengths. The scheduler raises the priority of a process that sleeps after
only a little CPU use. For example, a process sleeps when the process starts an I/O
operation such as a terminal read or a disk read. Frequent sleeps are
characteristic of interactive tasks such as editing and running simple shell
commands. The time-sharing policy lowers the priority of a process that uses the
CPU for long periods without sleeping.
The time-sharing policy that is the default gives larger time slices to processes
with lower priorities. A process with a low priority is likely to be CPU-bound.
Other processes get the CPU first, but when a low-priority process finally gets the
CPU, that process gets a larger time slice. If a higher-priority process becomes
runnable during a time slice, however, the higher-priority process pre-empts the
running process.
Global process priorities and user-supplied priorities are in ascending order:
higher priorities run first. The user priority runs from the negative of a
configuration-dependent maximum to the positive of that maximum. A process
inherits its user priority. Zero is the default initial user priority.
The "user priority limit" is the configuration-dependent maximum value of the
user priority. You can set a user priority to any value lower than the user priority
limit. With appropriate permission, you can raise the user priority limit. Zero is the
user priority limit by default.
An administrator configures the maximum user priority independent of global
time-sharing priorities. For example, in the default configuration a user can set a
user priority in the –20 to +20 range. However, 60 time-sharing global priorities
are configured.
The scheduler manages time-sharing processes by using configurable parameters
in the time-sharing dispatcher parameter table, ts_dptbl(4). This table contains
information specific to the time-sharing class.
System Class:
The system class uses a fixed-priority policy to run kernel processes such as
servers and housekeeping processes like the paging daemon. The system class is
reserved to the kernel. Users cannot add a process to the system class. Users
cannot remove a process from the system class. Priorities for system class
processes are set up in the kernel code. The priorities of system processes do not
change once established. User processes that run in kernel mode are not in the
system class.
Real-time Class:
The real-time class uses a scheduling policy with fixed priorities so that critical
processes run in predetermined order. Real-time priorities never change except
when a user requests a change. Privileged users can use the priocntl(1) command to
assign real-time priorities.
The scheduler manages real-time processes by using configurable parameters in
the real-time dispatcher parameter table, rt_dptbl(4). This table contains
information specific to the real-time class.
Interactive Class:
The IA class is very similar to the TS class. When used in conjunction with a
windowing system, processes have a higher priority while running in a window
with the input focus. The IA class is the default class while the system runs a
windowing system. The IA class is otherwise identical to the TS class, and the two
classes share the same dispatch parameter table, ts_dptbl(4).
Fair-Share Class:
The FSS class is used by the Fair-Share Scheduler, FSS(7), to
manage application performance by explicitly allocating shares of CPU resources
to projects. A share indicates a project's entitlement to available CPU resources.
The system tracks resource usage over time. The system reduces entitlement
when usage is heavy. The system increases entitlement when usage is light. The
FSS schedules CPU time among processes according to their owners' entitlements,
independent of the number of processes each project owns. The FSS class uses
the same priority range as the TS and IA classes. See the FSS man page for more
details.
Fixed-Priority Class:
The FX class provides a fixed-priority pre-emptive scheduling policy. This policy is
used by processes that require user or application control of scheduling priorities
and whose priorities are not dynamically adjusted by the system. By default, the FX
class has the same priority range as the TS, IA, and FSS classes. The FX class allows
user or application control of scheduling priorities through user priority values
assigned to processes within the class. These user priority values determine the
scheduling priority of a fixed-priority process relative to other processes within
its class.
The scheduler manages fixed-priority processes by using configurable parameters
in the fixed-priority dispatcher parameter table, fx_dptbl(4). This table contains
information specific to the fixed-priority class.
Commands and Interfaces:
The following figure illustrates the default process priorities.
A process priority has meaning only in the context of a scheduler class. You
specify a process priority by specifying a class and a class-specific priority value.
The class and class-specific value are mapped by the system into a global priority
that the system uses to schedule processes.
The ps(1) command with -cel options reports global priorities for all active
processes. The priocntl(1) command reports the class-specific priorities that users
and programmers use.
The priocntl(1) command and the priocntl(2) and priocntlset(2) interfaces are
used to set or retrieve scheduler parameters for processes. Setting priorities
generally follows the same sequence for the command and both interfaces:
1. Specify the target processes.
2. Specify the scheduler parameters that you want for those processes.
3. Execute the command or interface to set the parameters for the processes.
Process IDs are basic properties of UNIX processes. The class ID is the
scheduler class of the process. priocntl(2) works only for the time-sharing and
the real-time classes, not for the system class.
priocntl Usage:
The priocntl(1) utility performs four different control functions on the scheduling
of a process:
priocntl -l
Displays configuration information
priocntl -d
Displays the scheduling parameters of processes
priocntl -s
Sets the scheduling parameters of processes
priocntl -e
Executes a command with the specified scheduling parameters
The following examples demonstrate the use of priocntl(1).
The -l option for the default configuration produces the following output:
$ priocntl -l
CONFIGURED CLASSES
==================
SYS (System Class)
TS (Time Sharing)
Configured TS User Priority Range -60 through 60
RT (Real Time)
Maximum Configured RT Priority: 59
To display information on all processes, do the following:
$ priocntl -d -i all
To display information on all time-sharing processes:
$ priocntl -d -i class TS
Kernel Processes:
The kernel's daemon and housekeeping processes are members of the system
scheduler class. Users can neither add processes to nor remove processes from
this class, nor can users change the priorities of these processes. The command ps
-cel lists the scheduler class of all processes. A SYS entry in the CLS column
identifies processes in the system class when you run ps(1) with the -c option.
The Deadlock Problem :
Law passed by the Kansas Legislature in the early 20th century:
"When two trains approach each other at a crossing, both shall come to a full stop
and neither shall start up again until the other has gone."
Deadlock or Deadly Embrace :
Permanent blocking of a set of processes that either compete for system
resources or communicate with each other
–Several processes may compete for a finite set of resources
–Processes request resources and if a resource is not available, enter a
wait state
–Requested resources may be held by other waiting processes
–Require divine intervention to get out of this problem
A significant problem in real systems, because there is no efficient
solution in the general case
Deadlock problem is more important because of increasing use of
multiprocessing systems (like real-time, life support,
vehicle monitoring, multicore utilization, grid processing)
Deadlocks can occur with
– Serially reusable (SR) resources – printer, tape drive, memory
A finite set of identical units, with the number of units constant
Can be used safely by only one process at a time and are not
depleted by that use
Units acquired by processes, used, and released later for use by
other processes
A process may release a unit only if it has previously acquired it
Examples include processors, memory, devices, files, databases,
and semaphores
–Consumable resources – messages
Resource gets created dynamically and may be destroyed after
use
Typically no limit on the number of consumable resources of a
specific type
Examples are messages, signals, interrupts, and information in I/O
buffers
Examples of Deadlocks in Computer Systems:
Reusable resources
– File Sharing
Consider two processes p1 and p2
They update a file F and require a scratch tape during the updating
Only one tape drive T available
T and F are serially reusable resources, and can be used only by
exclusive access.
p2 needs T immediately prior to updating
Request operation
Blocks the process requesting the resource
Puts the process on the wait queue
The process is to remain blocked until the requested resource is
available
If the resource is available, the process is granted exclusive access
to it.
Release operation
Returns the resource being released to the system
Wakes up the process waiting for the resource
p1 and p2 may run as follows
p1: request(F);         p2: request(T);
r1: request(T);         r2: request(F);
    . . .                   . . .
    release(T);             release(F);
    release(F);             release(T);
p1 can block on T holding F while p2 can block on F holding T
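The file-sharing deadlock can be reproduced in a sketch where F and T are locks and the two processes are threads. To keep the example from hanging, the second request uses a timeout instead of blocking forever; that timeout, the barrier used to force the unlucky interleaving, and the name run_pair are all assumptions of the sketch.

```python
import threading

def run_pair(timeout=0.2):
    """p1 takes F then T; p2 takes T then F. Return who obtained the second resource."""
    F, T = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    outcome = {}

    def process(name, first, second):
        with first:                       # request(first)
            barrier.wait()                # both now hold their first resource
            # A real request() would block forever here; we give up after a while.
            got = second.acquire(timeout=timeout)
            outcome[name] = got
            barrier.wait()                # keep holding until both have tried
            if got:
                second.release()

    p1 = threading.Thread(target=process, args=("p1", F, T))
    p2 = threading.Thread(target=process, args=("p2", T, F))
    p1.start(); p2.start()
    p1.join(); p2.join()
    return outcome

print(run_pair())
```

Neither thread ever obtains its second resource: p1 waits for T while holding F, and p2 waits for F while holding T.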
Consumable resources
–Deadlock with messages
A pair of processes p1 and p2
Each process receives a message from the other process and then
sends a message to the other process
p1 ()                   p2 ()
    . . .                   . . .
    receive (p2)            receive (p1)
    send (p2, m1)           send (p1, m1)
Deadlock with blocking receive
Locking in Database Systems
Locking is required to preserve the integrity and consistency of databases,
with random request patterns.
Problem arises when two records to be updated by two different processes
are locked
Effective Deadlocks
Milder form of indefinite postponement of processes competing for a
resource
Exemplified by Shortest Job Next Scheduling
Deadlocks in Unix
– Possible deadlock condition that cannot be detected
– Number of processes limited by the number of available entries in the
process table
– If the process table is full, the fork system call fails
– Process can wait for a random amount of time before forking again
– Deadlocks due to open files, swap space
– Another cause of deadlock can be due to the inode table becoming full
in the filesystem
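– Example:
10 processes creating 12 children each
100 entries in the process table
Each process has already created 9 children (10 parents + 90 children = 100 entries)
No more space in the process table: deadlock, since every parent still waits
to create 3 more children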
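The arithmetic of this example can be checked with a short simulation. It is only a sketch under stated assumptions: parents fork in round-robin order, no process exits before its parent has created all its children, and the function name fork_until_stuck is made up for illustration.

```python
def fork_until_stuck(parents=10, wanted=12, table_size=100):
    """Fork round-robin until every fork fails; return (children per parent, entries used)."""
    used = parents                 # the parents themselves occupy table entries
    children = [0] * parents
    progress = True
    while progress:
        progress = False
        for i in range(parents):
            # A fork succeeds only while the process table has a free entry.
            if children[i] < wanted and used < table_size:
                children[i] += 1
                used += 1
                progress = True
    return children, used
```

With the default values every parent ends up with 9 children and the table holds 100 entries, so all ten parents are stuck waiting for entries that only their own completion could free.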
Deadlock problem characterization:
Deadlock Detection
Process resource graphs
Deadlock Recovery
“Best” ways of recovering from a deadlock
Deadlock Prevention
Not allowing a deadlock to happen
A Systems Model:
Finite number of resources in the system to be distributed among a
number of competing processes.
Partition the resources into several classes.
Identical resources assigned to the same class (CPU cycles, memory
space, files, tape drives, printers).
Allocation of any instance of resource from a class will satisfy the
request.
State of the OS: the allocation status of the various resources; it can be
changed only by process actions.
Process actions
– Request a resource
– Acquire/use a resource
– Release a resource
Resources acquired and used only through system calls
Deadlock Characterization:
Necessary and sufficient conditions for deadlocks – Four conditions to hold
simultaneously
1) Mutual exclusion
– Only one process may use a resource at a time
– At least one resource must be held in a non-sharable mode
2) Hold and wait
– Existence of a process holding at least one resource and waiting to
acquire additional resources currently held by other processes
3) No preemption
– Resources cannot be preempted by the system
4) Circular wait
– Processes waiting for resources held by other waiting processes
Deadlock Detection:
Do not restrict process actions or limit resource access (if resources
are available to satisfy requests)
Periodically detect the circular wait condition using a deadlock
detection algorithm
Simulate the most favored execution of each unblocked process
– An unblocked process may acquire all the needed resources
– Run and then release all the acquired resources
– Remain dormant thereafter
– Released resources may wake up some previously blocked process
– Continue the above steps as long as possible
– Any processes still blocked at the end are deadlocked
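The simulation of the most favored execution can be sketched as a reduction over allocation and request vectors. The function name and the matrix layout below are assumptions made for this example; the notes themselves describe the method in terms of process resource graphs.

```python
def find_deadlocked(available, allocation, request):
    """Return the indices of processes that stay blocked after reduction.

    available[j]      free units of resource class j
    allocation[i][j]  units of class j currently held by process i
    request[i][j]     units of class j that process i is waiting for
    """
    work = list(available)
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i, (alloc, req) in enumerate(zip(allocation, request)):
            if not finished[i] and all(r <= w for r, w in zip(req, work)):
                # Most favored execution: process i gets what it asks for,
                # runs to completion, and releases everything it holds.
                work = [w + a for w, a in zip(work, alloc)]
                finished[i] = True
                progress = True
    return [i for i, done in enumerate(finished) if not done]
```

For the file-sharing example with resource classes (F, T), find_deadlocked([0, 0], [[1, 0], [0, 1]], [[0, 1], [1, 0]]) reports both processes as deadlocked, since neither request can be satisfied from the free units.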
Recovery from Deadlock:
Recovery by process termination
– Abort all deadlocked processes
– Back up each deadlocked process to some previously defined checkpoint
and restart all of them
Needs rollback and restart mechanisms built into the system
– Terminate deadlocked processes in a systematic way
When enough processes have been terminated to recover from deadlock,
stop terminations
Perform deadlock detection at each process termination
Processes should be terminated based on some criterion/policy
Priority of a process
CPU time used and expected usage before completion
Number of resources needed for completion
Number of processes needed to be terminated
Are the processes interactive or batch?
Minimum cost recovery based on
Cost of destroying a process
Cost of recovery from the next process state
Recovery by resource preemption
Enough resources are preempted from processes and made
available to deadlocked processes to resolve the deadlock
Selecting a victim
Rollback
Deadlock Prevention:
Uses a conservative resource allocation policy; undercommits resources
Each process can request and acquire all the needed resources at the same
time
– Works well for processes that perform a single burst of activity
– No preemption necessary
– Grossly inefficient
– May delay process initialization
– Processes must identify all future resource requirements in advance
Deny one of the required conditions for a deadlock
Mutual Exclusion
Cannot be done for non-sharable resources (like printers)
Sharable resources (read-only files) do not require mutually
exclusive access and so cannot be involved in deadlock
Cannot deny mutual exclusion in general, as some resources are
inherently non-sharable
Hold and Wait
Processes can request and acquire all the resources at one
time
Request resources only if the process is holding none
Disadvantages
Low resource utilization – resources may get allocated but
not used for a long time.
Possibility of starvation – on popular resources.
No Preemption
If a process holding resources requests another resource
that cannot be immediately allocated, all currently held
resources are preempted
Process restarted only when it regains all the resources
Suitable for resources whose state can be easily saved -
CPU registers, memory
Circular Wait
Impose a total ordering on all resource types
Each process requests resources in an increasing order of
enumeration
If several instances of a resource required, a single request
must be issued for all of them
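Denying circular wait through a total ordering can be illustrated with ordinary locks. This is a minimal sketch: RankedLock, acquire_in_order and run_demo are hypothetical helpers, and the ranks 1 and 2 assigned to F and T are arbitrary. Only threading.Lock and contextlib.ExitStack come from the Python standard library.

```python
import threading
from contextlib import ExitStack


class RankedLock:
    """A lock with a fixed position (rank) in the global resource ordering."""

    def __init__(self, rank, name):
        self.rank = rank
        self.name = name
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def acquire_in_order(stack, *locks):
    """Acquire locks by increasing rank, whatever order the caller lists them."""
    ordered = sorted(locks, key=lambda lk: lk.rank)
    for lk in ordered:
        stack.enter_context(lk)      # released in reverse order on exit
    return [lk.name for lk in ordered]


def run_demo(rounds=1000):
    """Two threads name F and T in opposite orders but cannot deadlock."""
    F, T = RankedLock(1, "F"), RankedLock(2, "T")
    updates = []

    def worker(first, second):
        for _ in range(rounds):
            with ExitStack() as stack:
                acquire_in_order(stack, first, second)
                updates.append(1)    # critical section: both locks are held

    threads = [threading.Thread(target=worker, args=(F, T)),
               threading.Thread(target=worker, args=(T, F))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(updates)
```

Because both threads always take F before T, neither can hold T while waiting for F, so the cycle from the file-sharing example cannot form.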
Deadlock Prevention based on Maximum Claims:
Also called Deadlock Avoidance
A priori knowledge of maximum possible claims for each process
Dynamically examine the resource allocation status to ensure that no
circular wait condition can exist
Resource allocation state
– Defined by the number of available and allocated resources, and the
maximum demands of the processes
– Safe, if the system can allocate resources to each process (up to its
maximum) in some order and still avoid a deadlock
Not all unsafe states are deadlock states
An unsafe state may lead to a deadlock
Deadlock Avoidance:
– Requires a process to declare the maximum instances of each resource type
needed
– Upon request, the system must determine whether the allocation will leave
the system in a safe state
– Number of processes in the system: n
– Number of resource classes: m
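A safety check based on maximum claims can be sketched as follows, for n processes and m resource classes. This is the well-known banker's-style check written under assumed names (is_safe, grant); the notes do not prescribe a particular implementation.

```python
def is_safe(available, max_claim, allocation):
    """Return an order in which all processes can finish, or None if unsafe."""
    n = len(allocation)
    work = list(available)
    need = [[m - a for m, a in zip(max_claim[i], allocation[i])] for i in range(n)]
    finished = [False] * n
    order = []
    while len(order) < n:
        for i in range(n):
            if not finished[i] and all(x <= w for x, w in zip(need[i], work)):
                # Process i can obtain its maximum claim, finish, and release.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                order.append(i)
                break
        else:
            return None              # no remaining process can finish: unsafe
    return order


def grant(available, max_claim, allocation, proc, req):
    """Return True only if granting req to proc leaves the system in a safe state."""
    need = [m - a for m, a in zip(max_claim[proc], allocation[proc])]
    if any(r > nd for r, nd in zip(req, need)):
        raise ValueError("request exceeds the declared maximum claim")
    if any(r > av for r, av in zip(req, available)):
        return False                 # not enough free units: process must wait
    new_available = [av - r for av, r in zip(available, req)]
    new_allocation = [row[:] for row in allocation]
    new_allocation[proc] = [a + r for a, r in zip(allocation[proc], req)]
    return is_safe(new_available, max_claim, new_allocation) is not None
```

As a usage example with m = 1: with 3 free units, maximum claims of 10, 4 and 9, and current holdings of 5, 2 and 2, the state is safe. Granting one more unit to the third process would make it unsafe, so grant refuses it even though a unit is free.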
UNIX Memory Management
Memory management is responsible for allocating portions of memory to new
processes, keeping track of which parts of memory are in use, deallocating
parts of memory when they are no longer used, and managing swapping between
main memory and disk, and demand paging, when main memory is not large enough
to hold all the processes.
Evolution of Memory Management
Because only one process can run at a time in a single-process operating
system, just one program shares the memory with the operating system. The
operating system may be located in the lower-addressed part of memory and the
user program in the rest. Memory management is therefore quite simple: it only
has to load the program from the disk into the user memory space when the user
types in a program name, and it leaves the execution of the program to process
management. When the user types in a new program name after the first program
finishes, memory management loads the new program into the same space,
overwriting the first one.
In multiprogramming operating systems, many processes that represent
different programs execute simultaneously and must be placed in different
areas of memory. Multiprogramming increases CPU utilization, but it needs
complex schemes to divide and manage the memory space among several processes,
so that the processes do not interfere with each other when executing and each
one executes as if it were the only process in the system.
The simplest scheme may be to divide the physical memory
into several fixed areas of different sizes. When a task arrives, memory
management allocates it the smallest area that is large enough to hold it
and marks this area as used. When the task finishes, memory management
deallocates the area and marks it as free for later tasks. A data structure is
needed to hold the information for each fixed-size area, including its size,
location and use state. Since a program has a starting address when it is
executed and the initial addresses of the various areas differ, memory
management also needs an address transformation mechanism. The
two biggest disadvantages of this scheme are that fixed sizes cannot keep up
with the growing number of tasks brought into the system simultaneously or
with the growing size of application programs. The former problem
can be handled by swapping and the latter by paging. Sometimes some
processes sit in memory waiting for I/O devices without doing anything, while
other processes are ready to run but there is not enough memory to hold
them. Operating system developers therefore considered whether memory
management could swap the processes waiting for I/O out of memory and put them
on the disk temporarily, freeing memory space for other processes that are
ready to run. When memory becomes available for the swapped-out processes,
the system checks which of them are ready to run and swaps the
ready ones into memory again. This memory management strategy is called
swapping. Memory is allocated to processes dynamically. As long as swapping is
fast enough that the user does not notice the delay, and the
system can handle more processes, the performance of the whole system
improves. Since a program is kept in a contiguous memory space,
swapping is done on a whole process.
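The fixed-area scheme at the start of this paragraph can be sketched in a few lines. The Partition class and the allocate/deallocate helpers are hypothetical names; the per-area record holds exactly the size, location and use state mentioned above.

```python
class Partition:
    """Bookkeeping record for one fixed-size memory area."""

    def __init__(self, start, size):
        self.start = start       # location (base address) of the area
        self.size = size         # size of the area
        self.task = None         # use state: None means the area is free


def allocate(partitions, task, size):
    """Give the task the smallest free area that can hold it; None if none fits."""
    candidates = [p for p in partitions if p.task is None and p.size >= size]
    if not candidates:
        return None              # nothing fits; swapping would be needed here
    best = min(candidates, key=lambda p: p.size)
    best.task = task
    return best.start            # base address used for address transformation


def deallocate(partitions, task):
    """Mark the area used by the task as free for later tasks."""
    for p in partitions:
        if p.task == task:
            p.task = None
```

For example, with areas of 8, 16 and 32 units, a task of size 10 receives the 16-unit area, and a second task of size 10 has to take the 32-unit area, which illustrates how fixed sizes waste space.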
When an application program becomes too big to be loaded into
memory as a whole, paging is needed. Paging
divides the main memory into small portions of the same
size, called pages, whose size can be 512 or 1024 bytes. When a long program is
executed, the addresses accessed over any short period of time lie within a
small region (locality of reference). That is, only some of the pages of the
process need to be loaded in main memory over a short period of time. When a
page of the program is needed and is not yet in memory, it is acceptable for
that page to be loaded into main memory on demand, provided this is fast
enough; this is called demand paging. In this scheme, data is moved in and out
of memory in units of pages rather than whole processes. Demand paging
combined with page replacement and swapping implements virtual memory
management in 4.2BSD and UNIX System V.
At the very beginning of UNIX development in the early 1970s, UNIX
System versions adopted swapping as their memory management strategy. It was
designed to transfer entire processes between primary memory and the disk.
Swapping suited the hardware of the time, which had
small memories, such as the PDP-11 (whose total physical main memory was about
256 Kbytes). With swapping as the memory management scheme, the size of the
physical memory restricts the size of the processes that can run in the
system. However, swapping is easy to implement and its system overhead is
quite small.
In the second half of the 1970s, with the advent of the VAX, which had
512-byte pages and several gigabytes of virtual address space, the BSD
variants of UNIX were the first to implement demand paging.
Demand paging transfers pages instead of whole processes between main
memory and the disk. To start executing, a process does not have to be loaded
into memory as a whole; only several pages of its initial segment are needed.
During execution, when the process references pages that are not in memory, the
kernel loads them on demand. Demand paging allows a large
process to run in a small physical memory space and allows more processes to
execute simultaneously than swapping alone. Even though demand
paging is more flexible than swapping alone, its implementation still needs
the swapping technique to replace pages.
Starting with UNIX System V, UNIX System versions also support demand
paging. Another technique used in memory management is segmentation, which
divides the user program into logical segments that correspond to the
natural units of the program and differ in size from one segment to
another.
Memory Allocation Algorithms in Swapping
With swapping, memory is assigned to processes dynamically when a new
process is created or an existing process has to be swapped in from the disk,
and the memory management system must keep track of it. A common way to keep
track of memory usage is with linked lists. The system can maintain the
allocated and free memory segments in one linked list or in two separate lists.
The allocated memory segments hold processes that currently reside in
memory, and the free memory segments are empty holes between
allocated segments. With one list, the segment list can be sorted by
address and each entry can be marked as allocated or free with a
marker bit. With two separate lists for allocated and free memory segments,
each entry in a list holds the address and size of the
segment, and two pointers: one points to the next segment in the list and
the other to the previous segment. With the lists of memory
segments, the following algorithms can be used to allocate memory for a new
process or an existing process that has to be swapped in. With one mixed list,
the algorithms search that list; with two separate lists, the algorithms scan
the list of free memory segments (the free list), and after allocation the
chosen segment is transferred from the free list to the allocated list.
• First-fit algorithm. It scans the list of memory segments from the beginning
until it finds the first free segment that is big enough to hold the process.
It then tries to split the chosen free segment into two pieces, one of which is
just large enough for the process. If the remaining piece is greater than or
equal to the minimum size of free segments, the first piece is assigned to the
process and the other remains free. If the remaining piece is smaller than the
minimum size of free segments, the whole segment is assigned to the process
without being divided. The list entries are updated.
• Next-fit algorithm. It works almost the same way as first fit, except that
the next time it is called to look for a free segment, it starts scanning from
the place where it stopped last time. Thus, it has to record the place where it
finds the free segment every time. When the search reaches the end of
the list, it goes back to the beginning of the list and
continues.
• Best-fit algorithm. It searches the whole list, takes the smallest free
segment that is large enough to hold the process, assigns it to the process,
and updates the list entries.
• Quick-fit algorithm. It keeps separate lists of memory segments and puts
the segments of the same size class into one list. For example, it may have a
table with several entries, in which the first entry is a pointer to a list of
8 Kbyte segments, which are the segments whose sizes are less than or equal to
8 Kbytes; the second entry is a pointer to a list of 16 Kbyte segments, which
are the segments whose sizes are less than or equal to 16 Kbytes and greater
than 8 Kbytes; and so on. When called, it just searches the list of segments
whose size class is closest to the requested size. After allocation, the list
entries are updated.
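First fit and best fit on a single mixed list can be sketched as below. The list layout ([address, size, owner], with owner None meaning free), the MIN_FREE threshold of 4 units and the function names are all assumptions made for this illustration.

```python
MIN_FREE = 4   # assumed minimum size of a free segment worth keeping


def first_fit(segments, size):
    """Index of the first free segment that is large enough, or None."""
    for i, (_, seg_size, owner) in enumerate(segments):
        if owner is None and seg_size >= size:
            return i
    return None


def best_fit(segments, size):
    """Index of the smallest free segment that is large enough, or None."""
    fits = [i for i, (_, s, o) in enumerate(segments) if o is None and s >= size]
    return min(fits, key=lambda i: segments[i][1]) if fits else None


def allocate(segments, size, owner, choose=first_fit):
    """Allocate size units to owner; segments is sorted by address."""
    i = choose(segments, size)
    if i is None:
        return None
    addr, seg_size, _ = segments[i]
    rest = seg_size - size
    if rest >= MIN_FREE:
        # Split: one piece for the process, the remainder stays free.
        segments[i] = [addr, size, owner]
        segments.insert(i + 1, [addr + size, rest, None])
    else:
        # Remainder too small to be useful: hand over the whole segment.
        segments[i] = [addr, seg_size, owner]
    return addr
```

Passing choose=best_fit switches the selection policy while the splitting rule stays the same, which makes the trade-off in the text easy to try on a small list.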
The first-fit algorithm is simple and fast. When the processes do not occupy
all the physical memory, the next-fit algorithm is faster than first fit
because first fit always starts searching from the
beginning of the list. When the processes fill up physical memory and some of
them are swapped out to disk, next fit does not necessarily outperform
first fit. The best-fit algorithm is slower than first fit and next fit
because it must search the entire list every time. The quick-fit algorithm can
find a suitable free segment faster than the other algorithms.
When deallocating memory, the memory management system has
to merge free segments to keep memory from splitting into a large number of
small fragments. That is, if the neighbors of the newly deallocated segment are
also free, they are merged into one bigger segment by revising the list entries.
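Merging on deallocation can be sketched for a single address-sorted list. The entry layout [address, size, owner], with owner None meaning free, and the name free_segment are assumptions for this example; the function expects addr to be the start of an existing segment.

```python
def free_segment(segments, addr):
    """Mark the segment starting at addr as free and merge it with free neighbors."""
    i = next(k for k, seg in enumerate(segments) if seg[0] == addr)
    segments[i][2] = None
    # Merge with the following segment first so that index i stays valid.
    if i + 1 < len(segments) and segments[i + 1][2] is None:
        segments[i][1] += segments[i + 1][1]
        del segments[i + 1]
    # Then fold the (possibly enlarged) segment into a free predecessor.
    if i > 0 and segments[i - 1][2] is None:
        segments[i - 1][1] += segments[i][1]
        del segments[i]
    return segments
```

Freeing a segment that sits between two holes therefore collapses three list entries into one.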
Two separate lists for allocated and free segments can speed up all
the algorithms, but they make the algorithms more complicated and make merging
free segments on deallocation more costly, especially for the quick-fit
algorithm.
The early UNIX System versions adopted the first-fit algorithm to carry
out allocation of both main memory and swap space.
Virtual memory management
Page Replacement Algorithms in Demand Paging
Several important page replacement algorithms are described below. When
judging how good a page replacement algorithm is, we can examine how frequently
thrashing happens when the algorithm is used. Thrashing is the phenomenon
in which a page that has just been removed from memory is referenced and has
to be brought into memory again.
• The optimal page replacement algorithm. It is an ideal algorithm and of no
use in real systems, but traditionally it is used as a reference point for
other, realistic algorithms. It removes the page that will be referenced
furthest in the future among the pages currently in memory. It is difficult and
costly to find this page, because that requires knowing future references.
• The first-in-first-out (FIFO) page replacement algorithm. It removes the page
that has been in memory for the longest time among all the processes currently
residing in memory. The memory management system can use a list to
maintain all pages currently in memory, and put the page that arrived most
recently at the end of the list. When a page fault happens, the first page in
the list, which is the first comer, is removed. The new page is the most recent
comer, so it is put at the end of the list.
• The least recently used (LRU) page replacement algorithm. It assumes that
pages that have not been used for a long time in the past will remain
unused for a long time in the future. Thus, when a page fault occurs, the page
unused for the longest time in the past is removed. To implement LRU
paging, it is necessary to have a linked list of all pages in memory. When a
page is referenced, it is moved to the end of the list. Thus, the head of
the list is always the least recently used page, which is chosen as the one to
be removed when a page fault happens. The implementation can also be
done in hardware. One way is to equip each page in memory with a
shift register.
• The clock page replacement algorithm. It puts all the pages in memory in a
circular list and scans the list in the order in which a clock hand moves.
Usually, the virtual memory hardware has a status bit associated with each
page, for example R, which is set when the page is referenced. When a page fault
occurs, the system examines the page that the searching pointer (the clock
hand) points to. If its R bit is zero, the page is chosen for removal and
replaced with the new page; the pointer then moves to the next
page in the list and stays there until the next page fault. If R is one, the
system clears the R bit and moves the pointer to the next page in the list.
This is repeated until a page with a zero R bit is found, and then the
system does the page replacement.
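The three realistic policies above can be compared by counting page faults on a reference string. The function names are made up for this sketch, and one detail is an assumption not stated in the text: the clock version sets R to 1 when a page is first loaded, since loading is triggered by a reference.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement."""
    memory, queue, faults = set(), deque(), 0
    for page in refs:
        if page in memory:
            continue
        faults += 1
        if len(memory) == frames:
            memory.discard(queue.popleft())    # evict the first comer
        memory.add(page)
        queue.append(page)                     # newest page goes to the end
    return faults


def lru_faults(refs, frames):
    """Count page faults under LRU replacement."""
    memory, faults = OrderedDict(), 0          # ordered least to most recent
    for page in refs:
        if page in memory:
            memory.move_to_end(page)           # referenced: move to the end
            continue
        faults += 1
        if len(memory) == frames:
            memory.popitem(last=False)         # evict the head of the list
        memory[page] = True
    return faults


def clock_faults(refs, frames):
    """Count page faults under the clock algorithm."""
    pages = [None] * frames                    # circular list of frames
    rbit = [0] * frames
    hand, faults = 0, 0
    for page in refs:
        if page in pages:
            rbit[pages.index(page)] = 1        # hardware sets R on reference
            continue
        faults += 1
        while rbit[hand] == 1:                 # R is one: clear it, move on
            rbit[hand] = 0
            hand = (hand + 1) % frames
        pages[hand] = page                     # R is zero: replace this page
        rbit[hand] = 1                         # assumption: loading counts as a reference
        hand = (hand + 1) % frames
    return faults
```

On the reference string 1, 2, 3, 4, 2, 5, 2 with three frames, FIFO evicts page 2 shortly before it is needed again, while the clock algorithm spares it because its R bit was set by the intervening reference.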
Process Swapping in UNIX
As UNIX memory management started with swapping whole
processes out of and into memory, this section first discusses how to
swap a process as a whole.
Swapped Content
We know that swapping moves a whole process between memory
and the swap space on the disk. What does the swapped content of a process
consist of?
In UNIX, since processes can execute in user mode or kernel mode,
the major data associated with a process typically consist of the instruction
segment, the user data segment, and the system data segment. Besides the
private code, the instruction segment may include shared code, which can
be used by several processes. The user data segment includes user data and
stack. The system data segment is composed of kernel data and stack. Both
data and stack, in both the user and system segments, can grow while the
process is executing.
Sharable code does not need to be swapped, because it is read-only
and there is no need to read in a copy of shared code for each process if the
kernel has already brought it into memory for some process. Moreover,
letting multiple processes use the same code saves memory space. In UNIX,
shared code segments are handled by a separate mechanism.
Apart from the shared code, all the other segments, including the private
code of the instruction segment, the user data segment, and the system data
segment, can be swapped if necessary. To make swapping easy and fast, all the
segments have to be kept in a contiguous area of memory. Contiguous placement
of a process can cause serious external fragmentation of memory. In
demand paging, however, it is not necessary to put a process in a contiguous
area of memory.
Timing of Swapping
In the UNIX kernel, the swapper is responsible for swapping processes
between memory and the swap area on the disk. The swapper is awakened
at least once in a set time slice (for example, every 4 seconds) to check
whether there are processes to be swapped in or out. The swapper examines
the process control table to find a process that has been swapped out and is
ready to run. If there is free memory available, the kernel allocates main
memory for the process, copies its segments into memory, changes
its state from “ready, swapped” to “ready in memory”, and puts it in the proper
priority queue to compete for the CPU with other processes that are also ready
in memory.
If the kernel finds that the system does not have enough memory to
swap the process in, the swapper examines the process
control table to find a process that is sleeping in memory, waiting for some
event to happen, and puts it in the swap space on the disk. The swapper then
goes back to searching for a process to swap in, and the freed memory is
allocated to that process. Besides the case where there is not enough room in
memory for all the existing processes, swapping out can also happen in two
other cases: one is when some segments of a process grow and its current area
can no longer accommodate the process; the other is when a parent process
creates a child process with the fork system call. As noted, both the user and
system data segments may grow beyond their original bounds during process
execution. If there is enough memory to allocate a new memory area for the
process, the allocation is done directly through the brk system call, which
sets the highest address of a process’s data segment, and the old area is
freed by the kernel.
If not, the kernel does an expansion swap of the process, which
consists of: allocating swap space on the disk for the new size of the
process, modifying the address mapping in the process control table according
to the new size, swapping the process out to the swap space, initializing the
newly expanded space on the disk, and changing its state to “ready,
swapped”. When the swapper is invoked again, the process will be swapped into
memory and finally resume execution in a new, larger memory space.
When a child process is created via the fork system call, the kernel
should allocate main memory for it (see Figure 4.4). However, if there is
no room available in memory, the kernel swaps the child process out onto
the swap space, without freeing any memory because none has
been allocated yet, and sets the child's state to “ready, swapped”. Later on,
the swapper will swap it into memory.
Allocation Algorithm
As mentioned before, the first-fit algorithm was adopted in UNIX to allocate
memory or swap space on the disk to processes.
In UNIX, swap space on the disk is allocated to a process in
contiguous blocks. There are several reasons for this: first, the use of the
swap space is temporary; second, the speed of transferring processes
between memory and the swap space is crucial; third, for I/O
operations, transferring several contiguous blocks of data is faster than
transferring several separate blocks.
The kernel has a mapping array to manage the swap space. Each
entry of the mapping array holds the address and number of blocks of a free
area. At first, the array has only one entry, which covers the total number of
blocks in the swap space. After processes have been swapped in and out several
times, the mapping array can have many entries. The kernel's malloc
routine is used to allocate swap space to the process to be swapped
out. With the first-fit algorithm, the kernel scans the mapping array for the
first entry that is large enough for the process. If the process needs
all the blocks of the entry, all the blocks are allocated to the process and
the entry is removed from the array. If the process does not need all the
blocks, the kernel splits the blocks into two contiguous groups: one, just
large enough for the process, is allocated to the process; the other becomes a
new entry with a modified address and number of blocks in the mapping array.
To free swap space, the kernel merges entries just as it
does when deallocating memory. If one or both of the neighbors in front of and
behind the freed blocks are free, the newly freed blocks are merged with them.
The address or the number of blocks of the resulting entry is modified, and an
entry may be deleted, depending on the situation. If the newly freed blocks
have no free neighbor, the kernel adds an entry at the appropriate position in
the mapping array and fills in its address and number of blocks.
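The mapping array and its two operations can be sketched as follows. The names swap_alloc and swap_free and the list-of-pairs layout are assumptions for illustration; the starting map of 10000 blocks at address 1 is an arbitrary example value, not a figure from this text.

```python
def swap_alloc(swap_map, nblocks):
    """First-fit allocation from a map of [address, count] free entries.

    Returns the starting block address, or None if no entry is large enough.
    """
    for i, (addr, count) in enumerate(swap_map):
        if count >= nblocks:
            if count == nblocks:
                del swap_map[i]          # entry fully used: remove it
            else:
                # Keep the tail of the entry as a smaller free area.
                swap_map[i] = [addr + nblocks, count - nblocks]
            return addr
    return None


def swap_free(swap_map, addr, nblocks):
    """Return blocks to the map, merging them with adjacent free entries."""
    i = 0
    while i < len(swap_map) and swap_map[i][0] < addr:
        i += 1                           # find the position sorted by address
    swap_map.insert(i, [addr, nblocks])
    # Merge with the entry behind the freed blocks.
    if i + 1 < len(swap_map) and addr + nblocks == swap_map[i + 1][0]:
        swap_map[i][1] += swap_map[i + 1][1]
        del swap_map[i + 1]
    # Merge with the entry in front of the freed blocks.
    if i > 0 and swap_map[i - 1][0] + swap_map[i - 1][1] == addr:
        swap_map[i - 1][1] += swap_map[i][1]
        del swap_map[i]
    return swap_map
```

Starting from [[1, 10000]], three allocations of 100, 50 and 100 blocks leave [[251, 9750]]; freeing the first two areas back in any order ends with a single merged entry [1, 150] in front of it.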
Selection Principle of Swapped Processes
The selection principles for the processes to be swapped out or in are slightly
different. To swap in processes, the swapper has to decide which
one should be swapped in before the others. Two rules are used.
• It examines processes that have been swapped out and are ready to run.
• It checks how long a process has stayed on the disk. The longer a process has
stayed there, the earlier it will be swapped in.
To swap out processes, the swapper chooses a process according to these rules:
• It examines the processes that are sleeping in memory waiting for some event
to occur.
• It checks how long a process has stayed in memory. The process that has been
there the longest will be swapped out first.
Swapper
The swapper, or Process 0, is a kernel process that enters an infinite
loop after the system is booted. It tries to swap processes into memory from
the swap space on the disk, or swap processes out of memory onto the swap
space if necessary, and it goes to sleep if there is no process suitable or
necessary to swap. The kernel schedules it periodically like other processes in
the system. When the swapper is scheduled, it examines all processes whose
state is “ready, swapped” and chooses the one that has been out on the disk for
the longest time. If there is enough free memory for the
chosen process, the swapper swaps it in. If
successful, it continues to look for other processes in the
“ready, swapped” state and swaps them in one by one, until no process on the
swap space is in “ready, swapped” or there is no room in memory for
the process to be swapped in. If there is no room in memory for the process to
be swapped in, the swapper starts searching for a process to swap out: it
checks the processes that are sleeping in memory and chooses the one that
has been in memory for the longest time to swap out to the swap space on the
disk. Then the infinite loop goes back to looking for processes to swap
in. If no process is chosen, the swapper goes to sleep.
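One wake-up of the swapper loop can be sketched as below. Everything here is illustrative: the Proc class, the state names ("ready_swapped", "sleeping", "sleep_swapped", "ready") and the single free_memory counter are assumptions, and real memory would also have to be contiguous.

```python
class Proc:
    """Minimal process record for the sketch."""

    def __init__(self, pid, size, state, since):
        self.pid = pid
        self.size = size
        self.state = state    # "ready_swapped", "sleeping", "sleep_swapped" or "ready"
        self.since = since    # time the process entered memory or the swap space


def swapper_pass(procs, free_memory, now):
    """Run the swapper until it would go to sleep; return (log, free memory)."""
    log = []
    while True:
        waiting = [p for p in procs if p.state == "ready_swapped"]
        if not waiting:
            break                                        # nothing to swap in
        incoming = min(waiting, key=lambda p: p.since)   # longest on the disk
        if incoming.size <= free_memory:
            free_memory -= incoming.size
            incoming.state, incoming.since = "ready", now
            log.append(("in", incoming.pid))
            continue
        sleepers = [p for p in procs if p.state == "sleeping"]
        if not sleepers:
            break                                        # no process to swap out
        victim = min(sleepers, key=lambda p: p.since)    # longest in memory
        free_memory += victim.size
        victim.state, victim.since = "sleep_swapped", now
        log.append(("out", victim.pid))
    return log, free_memory
```

The log makes the policy visible: a small process that has waited longest on the disk comes in first, and sleeping processes are swapped out oldest first until the next candidate fits.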
The swapper procedure also shows that swapping has to
handle three tasks: swap device allocation, swapping processes into
memory, and swapping processes out of memory.
Swapping Effect
As noted, swapping is simple and appropriate for systems with a
small main memory. But it has some flaws. As the swap space is on the
disk, swapping can seriously intensify file system traffic and increase
disk usage. When the kernel wants to swap in a process but there is no
free memory for it, it has to swap out a process. If the process
to be swapped out is much smaller than the process to be swapped in, a single
swap-out cannot make the swap-in succeed. This may make
swapping take longer, especially considering that it involves I/O operations.
Furthermore, there is an extreme case in swapping. If
some of the swapped-out processes are ready to run, the swapper is invoked
to try to swap them in. But if there is not enough room in memory at that
moment, the swapper has to swap out some process in memory. However, if there
is no room in the swap space on the disk either, and at the same time some new
process is created, a stalemate can happen. If the processes in memory are
time-consuming, the stalemate can last much longer.
Demand Paging in UNIX
Since the advent of computer systems with virtual memory,
such as the VAX, in the late 1970s, operating systems have been developed to
manage memory in a new way, based on a virtual memory address space
that is distinct from the physical memory space and, better still, expands
the memory space to a larger virtual range and allows more processes to
execute in the system simultaneously. Demand paging in virtual memory further
enhances system throughput and makes the concurrent execution of
multiple processes on a uniprocessor system work well.
The locality principle was first addressed by Peter J. Denning
in 1968 (Denning 1983), who uncovered the fact that the working set of
pages that a process references in a short period of time is limited to a few
pages. Because the working set is a small, dynamic part of a process, if
memory management tracks these changes promptly, the system
can potentially increase its throughput and allow more processes to execute
concurrently. We also know that swapping needs to transfer whole
processes and may aggravate disk I/O traffic. Demand paging transfers
only some pages of the processes between memory and the disk,
and has some mechanisms to reduce the I/O traffic, which are discussed
in this section.
Demand paging usually cooperates with page replacement in virtual
memory management. The former tackles how and when to bring one or more
pages of a process into memory; the latter handles how to swap out some pages
of a process periodically in order to allow new pages to enter
memory. When a process references a page that is not in memory, it causes a
page fault, which invokes demand paging. The kernel puts the process to sleep
until the needed page has been read into memory and is accessible to the
process. When the page is loaded, the process resumes the execution
interrupted by the page fault. As there is always some new page to be brought
into memory, page replacement must dynamically swap out some pages
onto the swap space.
Demand Paging
In the virtual memory address space, the pages of a process are
indexed by the logical page number, which indicates their logical order in the
process. The system also has a physical memory address space, which
is likewise divided into pages. To avoid confusing the two kinds of
pages, the pages in the virtual memory space are still called pages, while
the pages in physical memory are usually called frames. When a page of a
process is put in a frame, the page is actually allocated in physical memory.
Three data structures in the UNIX kernel are needed to support demand
paging: the page table, the frame table, and the swap table. Two handlers are
used to accomplish demand paging in different situations.
Page Table
Entries of the page table are indexed by the page number within the process,
one entry per page. Each entry of the page table has several fields: the
physical address of the page, protection bits, valid bit, reference bit, modify
bit, copy-on-write bit, age bits, the address on the disk, and disk type.
The fields are explained as follows:
• Physical address is the address of the page in physical memory, that is,
the address of the frame that the page occupies.
• Protection bits are the access privileges for processes to read, write or
execute the page.
• Valid bit indicates whether or not the content of a page is valid.
• Reference bit indicates whether a process has recently referenced the page.
• Modify bit shows whether a process has recently modified the page.
• Copy-on-write bit is used by the fork system call when a child process is
created.
• Age bits show how long the page has been in memory and are used for page
replacement.
• Disk address shows the address of the page on the disk, including the logical
device number and block number, no matter whether it is in the file system or
the swap space on the disk.
• Disk type is one of four kinds: file, swap, demand zero, and demand fill. If
the page is in an executable file, its disk type is file and its disk address
is the logical device number and block number of the page in that file on the
file system. If the page is on the swap space, its disk type is swap and its
disk address is the logical device number and block number of the page in the
swap space on the disk. A page marked “demand zero” belongs to the bss segment
(block started by symbol segment), which contains data not initialized at
compile time; the kernel clears the page when it assigns the page to the
process. A page marked “demand fill” contains data initialized at compile
time; the kernel does not clear it, because the page will be overwritten with
the content of the process when the frame is allocated to the process.
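The fields listed above can be pictured as a record. The following is a
minimal sketch under stated assumptions: the names PageTableEntry, DiskType,
and needs_clearing are hypothetical, each flag is modelled as a plain Python
value rather than as packed bits, and the reference field is modelled as a
single flag. Real kernels use architecture-specific layouts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DiskType(Enum):
    FILE = "file"
    SWAP = "swap"
    DEMAND_ZERO = "demand zero"
    DEMAND_FILL = "demand fill"

@dataclass
class PageTableEntry:
    frame_address: Optional[int] = None  # physical address; None if not resident
    protection: str = "r--"              # read/write/execute privileges
    valid: bool = False
    referenced: bool = False
    modified: bool = False
    copy_on_write: bool = False
    age: int = 0                         # used by page replacement
    disk_device: Optional[int] = None    # logical device number
    disk_block: Optional[int] = None     # block number on that device
    disk_type: DiskType = DiskType.DEMAND_ZERO

def needs_clearing(entry):
    """Only demand-zero pages are cleared when a frame is assigned."""
    return entry.disk_type is DiskType.DEMAND_ZERO

# A two-page table: one bss page and one page backed by an executable file.
page_table = [
    PageTableEntry(disk_type=DiskType.DEMAND_ZERO),
    PageTableEntry(disk_type=DiskType.FILE, disk_device=1, disk_block=42),
]
```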
Frame Table
The frame table is used for physical memory management. Each frame of
physical memory has an entry in the frame table, and the frame table is
indexed by the frame number in physical memory. Entries in the frame table can
be placed on one of two lists: the free frame list or a hash frame queue.
The frames on the free frame list are reclaimable. When a frame is put at the
end of the free frame list, it will be allocated to a new page of a process
if no process references it again within a period of time. However, if a
process faults on a page whose frame is still on the free frame list, the
kernel can reclaim that frame and save one I/O operation of reading from the
swap space on the disk.
The hash frame queues are indexed by a key that is the disk address (including
the logical device number and block number). Each hash queue corresponds to
one unique key value. With the key value, the kernel can search for a page on
the hash frame queues quickly. When the kernel allocates a free frame to a
page of a process, it removes the entry at the front of the free frame list,
modifies its disk address, and inserts the frame into a hash frame queue
according to the disk address.
To support the frame allocation and deallocation, each entry in the frame
table has several fields:
• Frame state can be one of several values, for example, reclaimable, on the
swap space, in an executable file, being read into memory, or accessible.
• The number of referencing processes shows how many processes access the
page.
• Disk address, where the page is stored in the file system or the swap space
on the disk, includes the logical device number and block number.
• Pointers to the forward and backward neighbor frames on the free frame list
or a hash frame queue.
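The free frame list and the hash frame queues can be sketched as follows.
This is an illustration only: the class FrameTable, its method names, and the
choice of eight hash queues are assumptions, and an OrderedDict stands in for
the doubly linked free list described above.

```python
from collections import OrderedDict, defaultdict

class FrameTable:
    def __init__(self, num_frames):
        # Free frame list: insertion order stands in for the linked list.
        self.free = OrderedDict((n, None) for n in range(num_frames))
        self.disk_addr = {}                    # frame number -> (device, block)
        self.hash_queues = defaultdict(list)   # queue index -> frame numbers

    def _bucket(self, addr):
        return hash(addr) % 8                  # 8 queues: an arbitrary choice

    def lookup(self, addr):
        """Search the hash queue for a frame holding this disk address."""
        for frame in self.hash_queues[self._bucket(addr)]:
            if self.disk_addr[frame] == addr:
                return frame
        return None

    def allocate(self, addr):
        """Take the frame at the front of the free list for a new page."""
        frame, _ = self.free.popitem(last=False)
        old = self.disk_addr.get(frame)
        if old is not None:                    # unhook from its old hash queue
            self.hash_queues[self._bucket(old)].remove(frame)
        self.disk_addr[frame] = addr
        self.hash_queues[self._bucket(addr)].append(frame)
        return frame

    def release(self, frame):
        """Put a frame at the end of the free list; its page stays findable."""
        self.free[frame] = None

    def reclaim(self, addr):
        """On a page fault, reuse a still-free frame without disk I/O."""
        frame = self.lookup(addr)
        if frame is not None and frame in self.free:
            del self.free[frame]
            return frame
        return None

table = FrameTable(num_frames=2)
first = table.allocate((1, 100))       # page at device 1, block 100
table.release(first)                   # frame becomes reclaimable
reclaimed = table.reclaim((1, 100))    # fault on the same page: no read needed
```

If another page had been allocated the frame in the meantime, reclaim would
return None and the page would have to be read from the disk again.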
Swap Table
The swap table is used by page replacement and swapping. Each page on the
swap space has an entry in the swap table. The entry holds a reference field
that indicates how many page table entries point to the page.
Page Fault
As noted above, when a process references a page that is not in memory, it
causes a page fault that invokes demand paging. In fact, demand paging is a
handler that is similar to general interrupt handlers, except that the demand
paging handler can go to sleep but interrupt handlers cannot. Because the
demand paging handler is invoked in a running process and returns to that
process, it executes in the context of the running process. Thus, the demand
paging handler can sleep while the I/O operation that reads or swaps the page
into memory is in progress.
Page faults can occur in different situations during process execution, and
UNIX virtual memory management distinguishes two kinds of page faults:
protection page faults and validity page faults. Accordingly, there are two
demand paging handlers, the protection handler and the validity handler, which
handle protection page faults and validity page faults, respectively.
Protection page faults are often caused by the fork system call. Validity page
faults can result from several situations, depending on the stage of execution
of a process, and are mostly related to the execve system call. The two kinds
are discussed separately.
Protection page fault
In UNIX System V, the kernel manages each process with a per process region
table, which is usually part of the process control block. Each of its entries
represents a region of the process and holds a pointer to the starting virtual
address of the region. The region contains the page table of the region and a
reference field that indicates how many processes reference the region. The
per process region table covers both shared regions and private regions of the
process. The former hold the part of the process that can be shared by several
processes; the latter hold the part that is protected from references by other
processes.
During the fork system call, the kernel increments the region reference field
of each shared region for the child process. For each private region of the
child process, the kernel allocates a new region table entry and page table.
The kernel then examines each entry in the page table of the parent process.
If a page is valid, the kernel increments the process reference count in its
frame table entry, indicating the number of processes that share the page via
different regions rather than through a shared region, so that the parent and
child processes can go their separate ways after the execve system call.
Similarly, if the page exists on the swap space, the kernel increments the
reference field of the swap table entry for this page. Now the page can be
referenced through both regions, which share the page until either the parent
or the child process writes to it. At that point the kernel copies the page so
that each region has a private version.
To make this possible, the kernel turns on the copy-on-write bit for each page
table entry in the private regions of the parent and child processes during
the fork system call. If either process writes to the page, it causes a
protection page fault that invokes the protection handler. The copy-on-write
bit in a page table entry thus separates the creation of a child process from
its physical memory allocation: via the protection page fault, memory
allocation can be postponed until it is needed.
A protection page fault can arise in two situations. One is when a process
references a valid page but its permission bits do not allow the access; the
other is when a process tries to write a page whose copy-on-write bit was set
by the fork system call. The kernel first checks whether permission is denied
in order to decide what to do next: signal an error, or invoke the protection
handler.
When the protection handler is invoked, the kernel searches for the
appropriate region and page table entry, and locks the region so that the page
cannot be swapped out while the protection handler operates on it. If the page is
shared with other processes, the kernel allocates a new frame and copies the
contents of the old page to it; the other processes still reference the old page.
After copying the page and updating the page table entry with the new frame
number, the kernel decrements the process reference number of the old frame
table entry.
If the copy-on-write bit of the page is set but the page is not shared with
other processes, the kernel lets the process retain the old frame. The kernel
then separates the page from its disk copy, because the process will write the
page in memory while other processes may use the disk copy. It decrements the
reference field of the swap table entry for the page and, if the count becomes
0, frees the swap space. It clears the copy-on-write bit and updates the page
table entry. Then it recalculates the process priority, because the process
was raised to a kernel-level priority when it invoked the demand paging
handler so that demand paging would proceed smoothly. Finally, before
returning to user mode, it checks for signals that arrived while the fault was
being handled. In this way, copying a page for the child process is deferred
until a process actually writes to it and causes a protection page fault,
rather than being done when the child is created.
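The two cases handled by the protection handler can be traced in a toy model.
This is a sketch under stated assumptions, not kernel code: the names Frame,
PageEntry, fork_page, and write_page are hypothetical, a bytearray stands in
for a physical frame, and region locking, the swap table, and priorities are
left out.

```python
class Frame:
    def __init__(self, data):
        self.data = bytearray(data)
        self.ref_count = 1          # number of processes sharing the frame

class PageEntry:
    def __init__(self, frame):
        self.frame = frame
        self.copy_on_write = False

def fork_page(parent_entry):
    """Share the frame with the child and mark both entries copy-on-write."""
    parent_entry.frame.ref_count += 1
    parent_entry.copy_on_write = True
    child_entry = PageEntry(parent_entry.frame)
    child_entry.copy_on_write = True
    return child_entry

def write_page(entry, offset, value):
    """Write one byte, running the protection handler first if needed."""
    if entry.copy_on_write:
        if entry.frame.ref_count > 1:
            # Shared: copy into a new frame; the others keep the old one.
            entry.frame.ref_count -= 1
            entry.frame = Frame(entry.frame.data)
        # Not shared (or now private): keep the frame and clear the bit.
        entry.copy_on_write = False
    entry.frame.data[offset] = value

parent = PageEntry(Frame(b"abc"))
child = fork_page(parent)
write_page(child, 0, ord("X"))    # shared page: child gets a private copy
write_page(parent, 1, ord("Y"))   # parent is now sole owner: no copy needed
```

After the two writes, the child and the parent hold different frames, and the
parent kept its original frame without any copying.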
BSD systems used demand paging before System V and had their own solution for
separating memory allocation from child process creation. BSD provides two
versions of the fork system call: the regular fork system call, and the vfork
system call, which does not allocate physical memory for the child process.
The BSD fork system call makes a physical copy of the pages of the parent
process, which is wasteful if it is closely followed by an execve system call.
The vfork system call assumes that the child process will invoke the execve
system call immediately after returning from vfork; it does not copy page
tables, so it is faster than the fork system call of System V. The risk of
vfork is that incorrect use can corrupt the parent process. After the vfork
system call, the child process uses the physical memory address space of the
parent process until execve or exit is called, so it can accidentally
overwrite the parent's data and stack and leave the parent unable to return to
its working context.
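The fork-then-execve pattern that motivates both copy-on-write and vfork looks
like this from user space. The sketch below uses Python's os.fork and
os.execvp, which wrap the corresponding system calls; Python does not expose
vfork, so only the regular fork is shown. The helper name spawn_and_wait is
hypothetical.

```python
import os
import sys

def spawn_and_wait(argv):
    """Fork a child, replace its image with argv, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: the address space inherited from the parent is discarded
        # as soon as exec succeeds, so copying it would be wasted work.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)          # reached only if exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

# Run a second Python interpreter that exits with status 3.
code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
```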
Page Replacement
Demand paging cooperates with page replacement to implement virtual memory
management. As a process executes, its working pages change dynamically. Some
of its pages in memory must be swapped out and replaced by new pages so that
the process can keep executing until its work finishes. Page replacement is
similar to swapping, except that it swaps out pages of a process rather than
a whole process. UNIX has two solutions to page replacement: the page stealer
of System V and the page daemon of 4.2BSD.
UNIX security
UNIX systems are designed to encourage user interaction, which can make them
more difficult to secure. UNIX systems are intended to be open systems; their
specifications and source code are widely available.
Passwords in the UNIX password file are stored in encrypted form. When a user
enters a password, it is encrypted and compared with the encrypted password
stored in the password file. Thus, passwords are unrecoverable even by the
system administrator. UNIX systems use salting when encrypting passwords. The
salt is a two-character string randomly selected via a function of the time
and the process ID. Twelve bits of the salt then modify the encryption
algorithm. Thus, users who choose the same password (by coincidence or
intentionally) will have different encrypted passwords (with high likelihood).
Some installations modify the password program to prevent users from choosing
weak passwords.
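The effect of salting can be shown with a short sketch. It is not the
historical UNIX algorithm: SHA-256 from Python's hashlib is swapped in for the
salt-modified encryption, the salt comes from the secrets module rather than
from the time and process ID, and the function names are hypothetical. Only
the idea is the same: a two-character salt (12 bits) is stored with the result
and makes equal passwords produce different entries.

```python
import hashlib
import secrets
import string

SALT_ALPHABET = string.ascii_letters + string.digits + "./"   # 64 symbols

def new_salt():
    """Two characters from a 64-symbol alphabet: 2 * 6 = 12 bits."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(2))

def hash_password(password, salt):
    """Return salt + digest; SHA-256 stands in for the historical algorithm."""
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return salt + digest

def check_password(attempt, stored):
    """Re-hash the attempt with the stored salt and compare the results."""
    salt = stored[:2]
    return secrets.compare_digest(hash_password(attempt, salt), stored)

# Two users pick the same password but receive different salts.
alice_entry = hash_password("secret", "ab")
bob_entry = hash_password("secret", "cd")
```

Because the salt is stored in the clear at the front of the entry, the login
program can repeat the computation without ever storing the password itself.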
The password file must be readable by any user because it contains
other crucial information (i.e., usernames, user IDs, and the like) that is
required by many UNIX tools. For example, because directories employ user
IDs to record file ownership, ls (the tool that lists directory contents and file
ownership) needs to read the password file to determine usernames from user
IDs. If crackers obtain the password file, they could potentially break the
password encryption. To address this issue, UNIX protects the password file
from crackers by storing information other than the encrypted passwords in
the normal password file and storing the encrypted passwords in a shadow
password file that can be accessed only by users with root privileges.
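The lookup that a tool such as ls performs, from a numeric user ID to a
username, can be sketched with Python's pwd module, which reads the password
database. The helper name owner_name is hypothetical, and the fallback to the
numeric ID is an assumption about how to handle IDs with no entry.

```python
import os
import pwd

def owner_name(path):
    """Map a file's owner user ID to a username via the password database."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)        # no entry for this ID: fall back to the number

root_dir_owner = owner_name("/")
```

This works for unprivileged users precisely because the password file itself
remains world-readable; only the encrypted passwords move to the shadow file.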
With the UNIX setuid permission feature, a program may be executed by
one user using the privileges of another user. This powerful feature has security
flaws, particularly when the resulting privilege is that of the superuser (who has
access to all files in a UNIX system). For example, if a regular user is able to
execute a shell belonging to the superuser, and for which the setuid bit has been
set, then the regular user essentially becomes the superuser. Clearly, setuid
should be employed carefully. Users, including those with superuser privileges,
should periodically examine their directories to confirm the presence of setuid
files and detect any that should not be setuid.
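Such a periodic check can be automated. The following is a minimal sketch: the
function name find_setuid_files is hypothetical, and the demonstration creates
its own temporary directory so that it does not depend on any real system
path.

```python
import os
import stat
import tempfile

def find_setuid_files(top):
    """Return paths of regular files under top that have the setuid bit set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue               # vanished or unreadable: skip it
            if stat.S_ISREG(mode) and mode & stat.S_ISUID:
                found.append(path)
    return sorted(found)

# Demonstration on a scratch directory with one ordinary and one setuid file.
with tempfile.TemporaryDirectory() as top:
    plain = os.path.join(top, "notes.txt")
    risky = os.path.join(top, "helper")
    for path in (plain, risky):
        open(path, "w").close()
    os.chmod(risky, 0o4755)            # rwsr-xr-x: setuid bit on
    report = find_setuid_files(top)
    only_risky_reported = report == [risky]
```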
A relatively simple means of compromising security in UNIX systems
(and other operating systems) is to install a program that prints out the login
prompt, copies what the user then types, fakes an invalid login and lets the user
try again. The user has unwittingly given away his or her password! One defense is
that if you are confident you typed the password correctly the first time, you
should log into a different terminal and choose a new password immediately.
UNIX systems include the crypt command, which allows a user to
enter a key and plaintext; ciphertext is output. The transformation can be
reversed trivially with the same key. One problem with this is that users tend to
use the same key repeatedly; once the key is discovered, all other files encrypted
with this key can be read. Users sometimes forget to delete their plaintext files
after producing encrypted versions. This makes discovering the key much easier.
Often, too many people are given superuser privileges. Restricting
superuser privileges can reduce the risk of attackers gaining control of a system
due to errors made by inexperienced users. UNIX systems provide a substitute
user identity (su) command to enable users to execute shells with a different
user's credentials. All su activity should be logged; this command lets any user
who types a correct password of another user assume that user's identity, possibly
even acquiring superuser privileges.
A popular Trojan horse technique is to install a fake su program,
which obtains the user's password, e-mails it to the attacker and restores the
regular su program. Never allow others to have write permission for your files,
especially for your directories; if you do, you're inviting someone to install a
Trojan horse.
UNIX systems contain a feature called password aging, in which the
administrator determines how long passwords are valid; when a password
expires, the user receives a message and is asked to enter a new password. There
are several problems with this feature:
1. Users often supply easy-to-crack passwords.
2. The system often prevents a user from resetting to the old (or any other)
password for a week, so the user cannot strengthen a weak password.
3. Users often switch between only two passwords.
Passwords should be changed frequently. A user can keep track of all
login dates and times to determine whether an unauthorized user has logged
in (which means that his or her password has been learned). Logs of
unsuccessful login attempts often store passwords, because users sometimes
accidentally type their password when they mean to type their username.
Some systems disable accounts after a small number of unsuccessful
login attempts. This is a defense against the intruder who tries all possible
passwords. An intruder who has penetrated the system can use this feature to
disable the account or accounts of users, including the system administrator,
who might attempt to detect the intrusion.
The attacker who temporarily gains superuser privileges can install a
trap-door program with undocumented features. For example, someone with
access to source code could rewrite the login program to accept a particular
login name and grant this user super user privileges without even typing a
password.
It is possible for individual users to "grab" the system, thus preventing
other users from gaining access. A user could accomplish this by spawning
thousands of processes, each of which opens hundreds of files, thus filling all
the slots in the open-file table. Installations can guard against this by setting
reasonable limits on the number of processes a parent can spawn and the
number of files that a process can open at once, but this in turn could hinder
legitimate users who need the additional resources.
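On systems that support them, such limits are visible to a process through
resource limits. The sketch below reads two of them with Python's resource
module; the helper name describe_limit is hypothetical. RLIMIT_NPROC is
counted per user rather than per parent process on the systems that provide
it, so it is only an approximation of the limit described above.

```python
import resource

def describe_limit(name, which):
    """Format the soft and hard values of one resource limit."""
    soft, hard = resource.getrlimit(which)

    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={show(soft)} hard={show(hard)}"

open_files = describe_limit("open files per process", resource.RLIMIT_NOFILE)
processes = describe_limit("processes per user", resource.RLIMIT_NPROC)
print(open_files)
print(processes)
```

The soft value is the limit currently enforced; a process may raise it up to
the hard value, which only a privileged process can increase.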
Kernel of UNIX
Overview of Operating Systems and Kernels
Because of the ever-growing feature set and poor design of some modern commercial
operating systems, the notion of what precisely defines an operating system is
vague. Many users consider whatever they see on the screen to be the operating
system. Technically speaking, and in this book, the operating system is considered
the parts of the system responsible for basic use and administration. This includes
the kernel and device drivers, boot loader, command shell or other user interface,
and basic file and system utilities. It is the stuff you need, not a web browser or
music players. The term system, in turn, refers to the operating system and all the
applications running on top of it.
Of course, the topic of this book is the kernel. Whereas the user interface is the
outermost portion of the operating system, the kernel is the innermost. It is the
core internals: the software that provides basic services for all other parts of
the system, manages hardware, and distributes system resources. The kernel is
sometimes referred to as the supervisor, core, or internals of the operating
system. Typical components of a kernel are interrupt handlers to service interrupt
requests, a scheduler to share processor time among multiple processes, a memory
management system to manage process address spaces, and system services such as
networking and inter-process communication. On modern systems with protected
memory management units, the kernel typically resides in an elevated system state
compared to normal user applications. This includes a protected memory space and
full access to the hardware. This system state and memory space is collectively
referred to as kernel-space. Conversely, user applications execute in user-space.
They see a subset of the machine's available resources and are unable to perform
certain system functions, directly access hardware, or otherwise misbehave
(without consequences, such as their death, anyhow). When executing the kernel,
the system is in kernel-space executing in kernel mode, as opposed to normal user
execution in user-space executing in user mode.
Applications running on the system communicate with the kernel via system calls.
An application typically calls functions in a library, for example the C library,
that in turn rely on the system call interface to instruct the kernel to carry out
tasks on their behalf. Some library calls provide many features not found in the
system call, and thus calling into the kernel is just one step in an otherwise
large function. For example, consider the familiar printf() function. It provides
formatting and buffering of the data and only eventually calls write() to write
the data to the console. Conversely, some library calls have a one-to-one
relationship with the kernel. For example, the open() library function does
nothing except call the open() system call. Still other C library functions, such
as strcpy(), should (you hope) make no use of the kernel at all. When an
application executes a system call, it is said that the kernel is executing on
behalf of the application. Furthermore, the application is said to be executing a
system call in kernel-space, and the kernel is running in process context. This
relationship, in which applications call into the kernel via the system call
interface, is the fundamental manner in which applications get work done.
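The difference between a buffering library call and a direct system call can
be observed with a pipe. In this sketch, Python's buffered file object plays
the role of the C library's printf(), and os.write() is the thin wrapper
around the write() system call. The helper name pipe_contents and the 4096
byte buffer size are assumptions made for the demonstration.

```python
import os

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)    # an empty pipe raises instead of hanging

def pipe_contents():
    """Return whatever the kernel currently holds in the pipe."""
    try:
        return os.read(read_fd, 1024)
    except BlockingIOError:
        return b""                 # nothing has reached the kernel yet

# Library layer: formatting and buffering happen in user space.
writer = os.fdopen(write_fd, "w", buffering=4096)
writer.write("value=%d\n" % 42)
before_flush = pipe_contents()

writer.flush()                     # the library now issues the write call
after_flush = pipe_contents()

# System call layer: os.write() hands the bytes to the kernel immediately.
os.write(write_fd, b"direct\n")
direct = pipe_contents()

writer.close()                     # also closes write_fd
os.close(read_fd)
```

Before the flush the pipe is empty, which shows that the formatted text was
still sitting in the library's buffer and no system call had been made.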
Figure 1.1 Relationship between applications, the kernel, and hardware
The kernel also manages the system's hardware. Nearly all architectures, including
all systems that Linux or UNIX supports, provide the concept of interrupts. When
hardware wants to communicate with the system, it issues an interrupt that
asynchronously interrupts the kernel. Interrupts are identified by a number. The
kernel uses the number to execute a specific interrupt handler to process and
respond to the interrupt. For example, as you type, the keyboard controller issues
an interrupt to let the system know that there is new data in the keyboard buffer.
The kernel notes the interrupt number being issued and executes the correct
interrupt handler. The interrupt handler processes the keyboard data and lets the
keyboard controller know it is ready for more data. To provide synchronization,
the kernel can usually disable interrupts, either all interrupts or just one specific
interrupt number. In many operating systems, including Linux or UNIX, the
interrupt handlers do not run in a process context. Instead, they run in a
special interrupt context that is not associated with any process. This special
context exists solely to let an interrupt handler quickly respond to an interrupt, and
then exit.
These contexts represent the breadth of the kernel's activities. In fact, in UNIX, we
can generalize that each processor is doing one of three things at any given
moment:
• In kernel-space, in process context, executing on behalf of a specific process
• In kernel-space, in interrupt context, not associated with a process, handling
an interrupt
• In user-space, executing user code in a process
This list is inclusive. Even corner cases fit into one of these three activities: For
example, when idle, it turns out that the kernel is executing an idle process in
process context in the kernel.
Linux versus Classic UNIX Kernels
Owing to their common ancestry and same API, modern UNIX kernels share
various design traits. With few exceptions, a UNIX kernel is typically a monolithic
static binary. That is, it exists as a large single-executable image that runs in a
single address space. UNIX systems typically require a system with a paged
memory-management unit; this hardware enables the system to enforce memory
protection and to provide a unique virtual address space to each process.
Monolithic Kernel versus Microkernel Designs
Operating system kernels can be divided into two main design camps: the monolithic
kernel and the microkernel. (A third camp, the exokernel, is found primarily in
research systems but is gaining ground in real-world use.)
Monolithic kernels are the simpler design of the two, and all kernels were
designed in this manner until the 1980s. A monolithic kernel is implemented
entirely as a single large process running entirely in a single address space.
Consequently, such kernels typically exist on disk as single static binaries. All
kernel services exist and execute in the large kernel address space. Communication
within the kernel is trivial because everything runs in kernel mode in the same
address space: The kernel can invoke functions directly, as a user-space application
might. Proponents of this model cite the simplicity and performance of the
monolithic approach. Most UNIX systems are monolithic in design.
Microkernels, on the other hand, are not implemented as single large processes.
Instead, the functionality of the kernel is broken down into separate processes,
usually called servers. Ideally, only the servers absolutely requiring such
capabilities run in a privileged execution mode. The rest of the servers run in
user-space. All the servers, though, are kept separate and run in different
address spaces. Therefore, direct function invocation as in monolithic kernels is
not possible. Instead, communication in microkernels is handled via message
passing: An inter-process communication (IPC) mechanism is built into the system,
and the various servers communicate and invoke "services" from each other by
sending messages over the IPC mechanism. The separation of the various servers
prevents a failure in one server from bringing down another.
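The contrast between direct invocation and message passing can be sketched in
a few lines. This is an analogy only: threads and queues from Python's
standard library stand in for separate server processes and the kernel's IPC
mechanism, threads share one address space whereas real servers do not, and
the names file_service and server_loop are hypothetical.

```python
import queue
import threading

def file_service(request):
    """The service itself: identical in both designs."""
    return request.upper()

# Monolithic style: the caller invokes the function directly.
direct_result = file_service("read block 7")

# Microkernel style: the service runs as a separate server and is reached
# only by messages.
requests = queue.Queue()

def server_loop():
    while True:
        message, reply_box = requests.get()
        if message is None:            # shutdown message
            break
        reply_box.put(file_service(message))

server = threading.Thread(target=server_loop)
server.start()

reply_box = queue.Queue()
requests.put(("read block 7", reply_box))   # send the request message
message_result = reply_box.get()            # wait for the reply message

requests.put((None, None))                  # ask the server to stop
server.join()
```

Both paths produce the same answer; the message-passing path simply takes more
steps to get there, which is the overhead the text goes on to describe.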
Likewise, the modularity of the system allows one server to be swapped out for
another. Because the IPC mechanism involves quite a bit more overhead than a
trivial function call, however, and because a context switch from kernel-space to
user-space or vice versa may be involved, message passing includes a latency and
throughput hit not seen on monolithic kernels with simple function invocation.
Consequently, all practical microkernel-based systems now place most or all the
servers in kernel-space, to remove the overhead of frequent context switches and
potentially allow for direct function invocation. The Windows NT kernel and
Mach (on which part of Mac OS X is based) are examples of microkernels. Neither
Windows NT nor Mac OS X runs any microkernel servers in user-space in their
latest versions, defeating the primary purpose of microkernel designs altogether.
Bibliography:
http://www.tutorialspoint.com/unix/index.htm
http://gmarik.info/blog/2012/08/15/orphan-vs-zombie-vs-daemon-processes
www.cs.kent.edu/~farrell/osf03/oldnotes/L06.pdf
http://ocamlunix.forge.ocamlcore.org/threads.html
https://kb.iu.edu/d/aiau
This version of events is captured by the History & Timeline that can be found
at http://www.UNIX-systems.org/what_is_unix/history_timeline.html
For more information about the *BSD family, see the FAQ at
http://www.faqs.org/faqs/386bsd-faq/part1/
http://www.linuxmall.com/Allann/lxtm.001.html
See http://www.gnu.org/gnu/linux-and-gnu.html
See The Halloween Documents at
http://www.opensource.org/halloween.html
UNIX OS by Yukun Liu and Yong Yue pdf book