operating system lecture notes

Transcript

  • 1. A.V.C. COLLEGE OF ENGINEERING, MANNAMPANDAL, MAYILADUTHURAI-609 305
      COURSE MATERIAL FOR THE SUBJECT OF OPERATING SYSTEMS
      Subject Code: CS 2254
      Semester: IV Semester
      Department: B.E. CSE
      Academic Year: 2012-2013
      Name of the Faculty: M. PARVATHI
      Designation and Dept: Asst. Prof. / CSE
  • 2. ANNA UNIVERSITY TIRUCHIRAPPALLI, Tiruchirappalli – 620 024
      Regulations 2008 Curriculum, B.E. COMPUTER SCIENCE AND ENGINEERING, SEM IV
      CS1253 – OPERATING SYSTEMS (Common to CSE and IT)
      UNIT I PROCESSES AND THREADS (9 hours)
      Introduction to operating systems – Review of computer organization – Operating system structures – System calls – System programs – System structure – Virtual machines – Processes – Process concept – Process scheduling – Operations on processes – Cooperating processes – Interprocess communication – Communication in client-server systems – Case study – IPC in Linux – Threads – Multi-threading models – Threading issues – Case study – Pthreads library.
      UNIT II PROCESS SCHEDULING AND SYNCHRONIZATION (10 hours)
      CPU scheduling – Scheduling criteria – Scheduling algorithms – Multiple-processor scheduling – Real-time scheduling – Algorithm evaluation – Case study – Process scheduling in Linux – Process synchronization – The critical-section problem – Synchronization hardware – Semaphores – Classic problems of synchronization – Critical regions – Monitors – Deadlock system model – Deadlock characterization – Methods for handling deadlocks – Deadlock prevention – Deadlock avoidance – Deadlock detection – Recovery from deadlock.
      UNIT III STORAGE MANAGEMENT (9 hours)
      Memory management – Background – Swapping – Contiguous memory allocation – Paging – Segmentation – Segmentation with paging – Virtual memory – Background – Demand paging – Process creation – Page replacement – Allocation of frames – Thrashing – Case study – Memory management in Linux.
      UNIT IV FILE SYSTEMS (9 hours)
      File system interface – File concept – Access methods – Directory structure – File system mounting – Protection – File system implementation – Directory implementation – Allocation methods – Free space management – Efficiency and performance – Recovery – Log-structured file systems – Case studies – File system in Linux – File system in Windows XP.
      UNIT V I/O SYSTEMS (8 hours)
      I/O systems – I/O hardware – Application I/O interface – Kernel I/O subsystem – Streams – Performance – Mass-storage structure – Disk scheduling – Disk
  • 3. management – Swap-space management – RAID – Disk attachment – Stable storage – Tertiary storage – Case study – I/O in Linux.
      Total: 45 hours
      TEXT BOOK
      1. Silberschatz, Galvin and Gagne, “Operating System Concepts”, 6th Edition, Wiley India Pvt. Ltd., 2003.
      REFERENCES
      1. Tanenbaum, A.S., “Modern Operating Systems”, 2nd Edition, Pearson Education, 2004.
      2. Gary Nutt, “Operating Systems”, 3rd Edition, Pearson Education, 2004.
      3. William Stallings, “Operating Systems”, 4th Edition, Prentice Hall of India, 2003.
  • 4. 1. Introduction
      1.1 Introduction
      An operating system acts as an intermediary between the user of a computer and the computer hardware. The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient and efficient manner. An operating system is software that manages the computer hardware. The hardware must provide appropriate mechanisms to ensure the correct operation of the computer system and to prevent user programs from interfering with the proper operation of the system.
      1.2 Operating System
      1.2.1 Definition of Operating System
      An operating system is a program that controls the execution of application programs and acts as an interface between the user of a computer and the computer hardware.
  • 5. A more common definition is that the operating system is the one program running at all times on the computer (usually called the kernel), with all else being application programs. An operating system is concerned with the allocation of resources and services, such as memory, processors, devices and information. The operating system correspondingly includes programs to manage these resources, such as a traffic controller, a scheduler, a memory management module, I/O programs, and a file system.
      1.2.2 Functions of Operating System
      An operating system serves three functions:
      1. Convenience: An OS makes a computer more convenient to use.
      2. Efficiency: An OS allows the computer system resources to be used in an efficient manner.
      3. Ability to evolve: An OS should be constructed in such a way as to permit the effective development, testing and introduction of new system functions without interfering with service.
      1.2.3 Operating System as User Interface
      Every general-purpose computer consists of hardware, an operating system, system programs and application programs. The hardware consists of memory, CPU, ALU, I/O devices, peripheral devices and storage devices. System programs include compilers, loaders, editors, the OS, etc. Application programs include business programs and database programs.
  • 6. Fig. 1.1 shows the conceptual view of a computer system.
  • 7. Fig. 1.1 Conceptual view of a computer system
      Every computer must have an operating system to run other programs. The operating system controls and coordinates the use of the hardware among the various system programs and application programs for the various users. It simply provides an environment within which other programs can do useful work. The operating system is a set of special programs that run on a computer system and allow it to work properly. It performs basic tasks such as recognizing input from the keyboard, keeping track of files and directories
  • 8. on the disk, sending output to the display screen and controlling peripheral devices.
      An OS is designed to serve two basic purposes:
      1. It controls the allocation and use of the computing system's resources among the various users and tasks.
      2. It provides an interface between the computer hardware and the programmer that simplifies and makes feasible the coding, creation and debugging of application programs.
      The operating system must support the following tasks:
      1. Provide facilities to create and modify program and data files using an editor.
      2. Provide access to the compiler for translating the user program from a high-level language to machine language.
      3. Provide a loader program to move the compiled program code into the computer's memory for execution.
      4. Provide routines that handle the details of I/O programming.
      1.3 I/O System Management
      The module that keeps track of the status of devices is called the I/O traffic controller. Each I/O device has a device handler that resides in a separate process associated with that device. The I/O subsystem consists of
  • 9. 1. A memory management component that includes buffering, caching and spooling.
      2. A general device driver interface.
      3. Drivers for specific hardware devices.
      1.4 Assembler
      The input to an assembler is an assembly language program. The output is an object program plus information that enables the loader to prepare the object program for execution. At one time, the computer programmer had at his disposal a basic machine that interpreted, through hardware, certain fundamental instructions. He would program this computer by writing a series of ones and zeros (machine language) and placing them into the memory of the machine.
      1.5 Compiler
      The high-level languages (examples are FORTRAN, COBOL, ALGOL and PL/I) are processed by compilers and interpreters. A compiler is a program that accepts a source program in a "high-level language" and produces a corresponding object program. An interpreter is a program that appears to execute a source program as if it were machine language. The same name (FORTRAN, COBOL, etc.) is often used to designate both a compiler and its associated language.
      1.6 Loader
  • 10. A loader is a routine that loads an object program and prepares it for execution. There are various loading schemes: absolute, relocating and direct-linking. In general, the loader must load, relocate, and link the object program. In a simple loading scheme, the assembler outputs the machine language translation of a program on a secondary device and a loader is placed in core. The loader places the machine language version of the user's program into memory and transfers control to it. Since the loader program is much smaller than the assembler, this makes more core available to the user's program.
      1.7 History of Operating System
      Operating systems have been evolving through the years. The following table shows the history of OS.
      Generation   Year          Electronic devices used    Types of OS and devices
      First        1945 – 1955   Vacuum tubes               Plug boards
      Second       1955 – 1965   Transistors                Batch system
      Third        1965 – 1980   Integrated circuits (IC)   Multiprogramming
      Fourth       Since 1980    Large-scale integration    PC
      The 1960s definition of an operating system is “the software that controls the hardware”. However, today, due to microcode, we need a better definition. We see an operating system as the programs that make the hardware usable. In brief, an operating system is the set of programs that controls a computer. Some examples of operating systems are UNIX, Mach, MS-DOS, MS-Windows, Windows/NT, Chicago, OS/2, MacOS, VMS, MVS, and VM.
      Controlling the computer involves software at several levels. We will differentiate kernel services, library services, and application-level services, all of which are part of the operating system. Processes run applications, which are linked together with libraries that perform standard services. The kernel supports the processes by providing a path to the peripheral devices. The kernel responds to service calls from the processes and interrupts from the devices.
The core of the operating system is the kernel, a control program that functions in privileged state (an execution context that allows all hardware instructions to be executed), reacting to interrupts from external devices and to service requests and traps from processes. Generally, the kernel is a permanent resident of the computer. It creates
  • 11. and terminates processes and responds to their requests for service.
      Batch Systems
      A batch operating system is one where programs and data are collected together in a batch before processing starts. A job is a predefined sequence of commands, programs and data combined into a single unit.
      Fig. 2.1 shows the memory layout for a simple batch system. Memory management in a batch system is very simple. Memory is usually divided into two areas: the operating system and the user program area.
      Scheduling is also simple in a batch system. Jobs are processed in the order of submission, i.e. in first-come, first-served fashion. When a job completes execution, its memory is released and the output for the job is copied into an output spool for later printing.
      A batch system often provides simple forms of file management. Access to files is serial. Batch systems do not require any time-critical device management. Batch systems are inconvenient for users because users cannot interact with their jobs to fix problems. There may also be long turnaround times. An example of this kind of system is generating monthly bank statements.
      Advantages of Batch System
      Much of the work of the operator is moved to the computer.
      Performance increases, since a job can start as soon as the previous job has finished.
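The first-come, first-served processing described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course material: the `Job` class, the `run_batch` function and the job names and run times are all hypothetical, and every job is assumed to be submitted at time 0.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    run_time: int  # CPU time units the job needs (made-up values)


def run_batch(jobs):
    """Run jobs strictly in submission order; return (name, start, finish)."""
    clock = 0
    schedule = []
    for job in jobs:
        start = clock
        clock += job.run_time  # the job runs to completion, no interaction
        schedule.append((job.name, start, clock))
    return schedule


batch = [Job("payroll", 5), Job("statements", 3), Job("report", 1)]
schedule = run_batch(batch)
for name, start, finish in schedule:
    print(f"{name}: starts at {start}, finishes at {finish}")
```

Because all jobs are assumed to arrive at time 0, each finish time is also that job's turnaround time. The short "report" job waits behind the two longer jobs, which is the long-turnaround problem noted above.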
  • 12. Disadvantages of Batch System
      Turnaround time can be large from the user's standpoint.
      Programs are difficult to debug.
      A job could enter an infinite loop.
      Due to the lack of a protection scheme, one batch job can corrupt the monitor and thus affect pending jobs.
      2.5 Time Sharing Systems
      Multiprogrammed batch systems provide an environment where the various system resources (for example, CPU, memory, peripheral devices) are utilized effectively. Time sharing, or multitasking, is a logical extension of multiprogramming. Multiple jobs are executed by the CPU switching between them, but the
  • 13. switches occur so frequently that the users may interact with each program while it is running. An interactive, or hands-on, computer system provides on-line communication between the user and the system. The user gives instructions to the operating system or to a program directly, and receives an immediate response. Usually, a keyboard is used to provide input, and a display screen (such as a cathode-ray tube (CRT) or monitor) is used to provide output.
      If users are to be able to access both data and code conveniently, an on-line file system must be available. A file is a collection of related information defined by its creator. Batch systems are appropriate for executing large jobs that need little interaction.
      Time-sharing systems were developed to provide interactive use of a computer system at a reasonable cost. A time-shared operating system uses CPU scheduling and multiprogramming to provide each user with a small portion of a time-shared computer. Each user has at least one separate program in memory. A program that is loaded into memory and is executing is commonly referred to as a process. When a process executes, it typically executes for only a short time before it either finishes or needs to perform I/O. I/O may be interactive; that is, output is to a display for the user and input is from a user keyboard. Since interactive I/O typically runs at people speeds, it may take a long time to complete.
      A time-shared operating system allows many users to share the computer simultaneously. Since each action or command in a time-shared system tends to be short, only a little CPU time is needed for each user. As the system switches rapidly from one user to the next, each user is given the impression that she has her own computer, whereas actually one computer is being shared among many users.
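The rapid switching between users can be illustrated with a small simulation. The notes do not name a scheduling policy, so the sketch below assumes round-robin scheduling with a fixed time quantum, one common choice for time sharing. The function name, user names and CPU demands are hypothetical.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Simulate time slicing.

    bursts maps a user name to the CPU time that user's program needs.
    Returns the sequence of (name, time_run) slices the CPU executes.
    """
    ready = deque(bursts.items())  # queue of (name, remaining time)
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)  # run for at most one quantum
        timeline.append((name, ran))
        if remaining > ran:
            # Not finished yet: go to the back of the queue.
            ready.append((name, remaining - ran))
    return timeline


timeline = round_robin({"alice": 3, "bob": 5, "carol": 2}, quantum=2)
for name, ran in timeline:
    print(f"{name} runs for {ran}")
```

With a quantum of 2, every user gets a turn within the first three slices. That short wait is what gives each user the impression of having the computer to herself.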
  • 14. Time-sharing operating systems are even more complex than multiprogrammed operating systems. As in multiprogramming, several jobs must be kept simultaneously in memory, which requires some form of memory management and protection.
      2.6 Multiprogramming
      When two or more programs are in memory at the same time and share the processor, this is referred to as multiprogramming. Multiprogramming assumes a single processor that is being shared. It increases CPU utilization by organizing jobs so that the CPU always has one to execute.
      Fig. 2.2 shows the memory layout for a multiprogramming system. The operating system keeps several jobs in memory at a time. This set of jobs is a subset of the jobs kept in the job pool. The operating system picks one of the jobs in memory and begins to execute it.
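How much multiprogramming raises CPU utilization can be estimated with a simple probabilistic model that is not part of these notes. Assume each job spends a fraction p of its time waiting for I/O, independently of the other jobs. The CPU is then idle only when all n jobs in memory are waiting at once, which happens with probability p to the power n. The function name and the sample values below are illustrative assumptions.

```python
def cpu_utilization(io_wait_fraction, jobs_in_memory):
    """Estimate CPU utilization as 1 - p**n.

    Simplifying assumption: jobs wait for I/O independently, so the CPU
    is idle only when every job in memory is waiting at the same time.
    """
    return 1.0 - io_wait_fraction ** jobs_in_memory


for n in (1, 2, 4, 8):
    print(f"{n} job(s) in memory: {cpu_utilization(0.8, n):.1%} utilization")
```

Under this model, jobs that wait for I/O 80% of the time keep the CPU busy only about 20% of the time when run one at a time, and about 59% of the time when four are kept in memory. Real jobs are not independent, so the model is only a rough guide.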
  • 15. Multiprogrammed systems provide an environment in which the various system resources are utilized effectively, but they do not provide for user interaction with the computer system. Jobs entering the system are kept in memory. The operating system picks one of the jobs in memory and begins to execute it. Having
  • 16. several programs in memory at the same time requires some form of memory management. A multiprogramming operating system monitors the state of all active programs and system resources. This ensures that the CPU is never idle unless there are no jobs.
      Advantages
      1. High CPU utilization.
      2. It appears that many programs are allotted the CPU almost simultaneously.
      Disadvantages
      1. CPU scheduling is required.
      2. To accommodate many jobs in memory, memory management is required.
      2.7 Spooling
      Spooling is an acronym for simultaneous peripheral operations on line. Spooling refers to putting jobs in a buffer, a special area in memory or on a disk where a device can access them when it is ready. Spooling is useful because devices access data at different rates. The buffer provides a waiting station where data can rest while the slower device catches up. Fig. 2.3 shows spooling.
      System Components
      Even though not all systems have the same structure, many modern operating systems share the same goal of supporting the following types of system components.
      Process Management
      The operating system manages many kinds of activities, ranging from user programs to system programs such as the printer spooler, name servers, file servers, etc. Each of
  • 17. these activities is encapsulated in a process. A process includes the complete execution context (code, data, PC, registers, OS resources in use, etc.). It is important to note that a process is not a program. A process is only ONE instance of a program in execution. Many processes can be running the same program.
      The five major activities of an operating system in regard to process management are:
      1. Creation and deletion of user and system processes.
      2. Suspension and resumption of processes.
      3. A mechanism for process synchronization.
      4. A mechanism for process communication.
      5. A mechanism for deadlock handling.
      Main-Memory Management
      Primary memory, or main memory, is a large array of words or bytes. Each word or byte has its own address. Main memory provides storage that can be accessed directly by the CPU. That is to say, for a program to be executed, it must be in main memory.
      The major activities of an operating system in regard to memory management are:
      1. Keep track of which parts of memory are currently being used and by whom.
      2. Decide which process is loaded into memory when memory space becomes available.
      3. Allocate and deallocate memory space as needed.
      File Management
      A file is a collection of related information defined by its creator. A computer can store files on disk (secondary storage), which provides long-term storage. Some examples of storage media are magnetic tape, magnetic disk and optical disk. Each of
  • 18. these media has its own properties, such as speed, capacity, data transfer rate and access methods. File systems are normally organized into directories to ease their use. These directories may contain files and other directories.
      The five major activities of an operating system in regard to file management are:
      1. The creation and deletion of files.
      2. The creation and deletion of directories.
      3. The support of primitives for manipulating files and directories.
      4. The mapping of files onto secondary storage.
      5. The backup of files on stable storage media.
      I/O System Management
      The I/O subsystem hides the peculiarities of specific hardware devices from the user. Only the device driver knows the peculiarities of the specific device to which it is assigned.
      Secondary-Storage Management
      Generally speaking, systems have several levels of storage, including primary storage, secondary storage and cache storage. Instructions and data must be placed in primary storage or cache to be referenced by a running program. Because main memory is too small to accommodate all data and programs, and its data are lost when power is lost, the computer system must provide secondary storage to back up main memory. Secondary storage consists of tapes, disks, and other media designed to hold information that will eventually be accessed in primary storage. Storage (primary, secondary, cache) is ordinarily divided into bytes or words consisting of a fixed number of bytes. Each location in storage has an address; the set of all addresses available to a program is called an address space.
  • 19. The three major activities of an operating system in regard to secondary-storage management are:
      1. Managing the free space available on the secondary-storage device.
      2. Allocation of storage space when new files have to be written.
      3. Scheduling the requests for disk access.
      Networking
      A distributed system is a collection of processors that do not share memory, peripheral devices, or a clock. The processors communicate with one another through communication lines, called a network. The communication-network design must consider routing and connection strategies, and the problems of contention and security.
      Protection System
      If a computer system has multiple users and allows the concurrent execution of multiple processes, then the various processes must be protected from one another's activities. Protection refers to a mechanism for controlling the access of programs, processes, or users to the resources defined by the computer system.
      Command Interpreter System
      A command interpreter is an interface between the operating system and the user. The user gives commands, which are executed by the operating system (usually by turning them into system calls). The main function of a command interpreter is to get and execute the next user-specified command. The command interpreter is usually not part of the kernel, since multiple command interpreters (shells, in UNIX terminology) may be supported by an operating system, and they do not really need to run in kernel mode. There are two main advantages to separating the command interpreter from the kernel.
      If we want to change the way the command interpreter looks, i.e. change its interface, we can do so only if the command interpreter is separate from the kernel. A user cannot change the code of the kernel, so an interface built into the kernel cannot be modified.
  • 20. If the command interpreter is a part of the kernel, it is possible for a malicious process to gain access to parts of the kernel that it should not have. To avoid this scenario, it is advantageous to keep the command interpreter separate from the kernel.
      Operating Systems Services
      Following are the five services provided by an operating system for the convenience of the users.
      Program Execution
      The purpose of a computer system is to allow the user to execute programs. So the operating system provides an environment where the user can conveniently run programs. The user does not have to worry about memory allocation, multitasking or anything else. These things are taken care of by the operating system. Running a program involves allocating and deallocating memory, and CPU scheduling in the case of multiple processes. These functions cannot be given to user-level programs. So user-level programs cannot help the user to run programs independently without help from the operating system.
      I/O Operations
      Each program requires input and produces output. This involves the use of I/O. The operating system hides the details of the underlying I/O hardware from the user. All the user sees is that the I/O has been performed, without any details. So the operating system, by providing I/O, makes it convenient for users to run programs. For efficiency and protection, users cannot control I/O directly, so this service cannot be provided by user-level programs.
      File System Manipulation
      The output of a program may need to be written into new files, or input taken from some files. The operating system provides this service. The user does not have to worry about secondary storage management. The user gives a command for reading or writing to a
  • 21. file and sees the task accomplished. Thus the operating system makes it easier for user programs to accomplish their task. This service involves secondary storage management. The speed of I/O that depends on secondary storage management is critical to the speed of many programs, and hence it is best left to the operating system to manage it rather than giving individual users control of it. It is not difficult for user-level programs to provide these services, but for the reasons mentioned above it is best if this service is left with the operating system.
      Communications
      There are instances where processes need to communicate with each other to exchange information. The processes may be running on the same computer or on different computers. By providing this service, the operating system relieves the user of the worry of passing messages between processes. Where messages need to be passed to processes on other computers through a network, this can be done by user programs. The user program may be customized to the specifics of the hardware through which the message transits, and it provides the service interface to the operating system.
      Error Detection
      An error in one part of the system may cause malfunctioning of the complete system. To avoid such a situation, the operating system constantly monitors the system to detect errors. This relieves the user of the worry of errors propagating to various parts of the system and causing malfunctions. This service cannot be handled by user programs, because it involves monitoring and, in some cases, altering areas of memory, deallocating the memory of a faulty process, or taking the CPU away from a process that goes into an infinite loop. These tasks are too critical to be handed over to user programs. A user program given these privileges could interfere with the correct (normal) operation of the operating system.
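The I/O and file system services described above can be seen from the user program's side in a short Python sketch. Python's `os.write`, `os.lseek`, `os.read` and `os.close` are thin wrappers around the corresponding operating system calls, and `tempfile.mkstemp` asks the operating system to create and open a temporary file. The function name `write_then_read` and the message are made up for illustration.

```python
import os
import tempfile


def write_then_read(message: bytes) -> bytes:
    """Write bytes to a temporary file and read them back via OS services."""
    fd, path = tempfile.mkstemp()  # the OS creates and opens the file
    try:
        os.write(fd, message)          # request: write to the file
        os.lseek(fd, 0, os.SEEK_SET)   # request: move back to the start
        data = os.read(fd, len(message))  # request: read the bytes back
    finally:
        os.close(fd)      # release the file descriptor
        os.remove(path)   # ask the OS to delete the file
    return data


result = write_then_read(b"hello from user space")
print(result)
```

The program never deals with disk blocks, free-space management or device drivers. It only names the file operation it wants, and the operating system carries it out.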
  • 22. System Calls and System Programs
      System calls provide an interface between the process and the operating system. System calls allow user-level processes to request services from the operating system that the process itself is not allowed to perform. A system call typically causes a trap. In handling the trap, the operating system enters kernel mode, where it has access to privileged instructions, and can perform the desired service on behalf of the user-level process. It is because of the critical nature of these operations that the operating system itself performs them every time they are needed. For example, for I/O a process invokes a system call telling the operating system to read or write a particular area, and this request is satisfied by the operating system.
      System programs provide basic functionality to users so that they do not need to write their own environment for program development (editors, compilers) and program execution (shells). In some sense, they are bundles of useful system calls.
      Layered Approach Design
      In the layered approach, the system is easier to debug and modify, because changes affect only limited portions of the code, and the programmer does not have to know the details of the other layers. Information is also kept only where it is needed and is accessible only in certain ways, so bugs affecting that data are limited to a specific module or layer.
      Mechanisms and Policies
      A policy specifies what is to be done, while a mechanism specifies how it is to be done. For instance, the timer construct for ensuring CPU protection is a mechanism. On the other hand, the decision of how long the timer is set for a particular user is a policy decision. The separation of mechanism and policy is important to provide flexibility to a system. If the interface between mechanism and policy is well defined, the change of
  • 23. policy may affect only a few parameters. On the other hand, if the interface between these two is vague or not well defined, a change of policy might involve much deeper changes to the system. Once the policy has been decided, the programmer is free to choose his or her own implementation. Also, the underlying implementation may be changed for a more efficient one without much trouble if the mechanism and policy are well defined. Specifically, separating these two provides flexibility in a variety of ways. First, the same mechanism can be used to implement a variety of policies, so changing the policy might not require the development of a new mechanism, but just a change in parameters for that mechanism from a library of mechanisms. Second, the mechanism can be changed, for example to increase its efficiency or to move to a new platform, without changing the overall policy.
      Definition of Process
      The notion of process is central to the understanding of operating systems. There are quite a few definitions presented in the literature, but no "perfect" definition has yet appeared.
      Definition
      The term "process" was first used by the designers of MULTICS in the 1960s. Since then, the term process has been used somewhat interchangeably with 'task' or 'job'. The process has been given many definitions, for instance: A program in execution.
  • 24. An asynchronous activity. The 'animated spirit' of a procedure in execution. The entity to which processors are assigned. The 'dispatchable' unit. Many more definitions have been given.
      As we can see from the above, there is no universally agreed upon definition, but the definition "program in execution" seems to be the most frequently used. This is the concept we will use in the present study of operating systems.
      Now that we have agreed upon the definition of process, the question is what the relation between process and program is. Is it the same beast with a different name, called a program when it is sleeping (not executing) and a process when it is executing? To be precise, a process is not the same as a program. In the following discussion we point out some of the differences between process and program.
      A process is more than the program code. A process is an 'active' entity, as opposed to a program, which is considered to be a 'passive' entity. A program is an algorithm expressed in some suitable notation (e.g., a programming language). Being passive, a program is only a part of a process. A process, on the other hand, includes:
      1. The current value of the program counter (PC).
      2. The contents of the processor's registers.
      3. The values of the variables.
      4. The process stack (SP), which typically contains temporary data such as subroutine parameters, return addresses, and temporary variables.
      5. A data section that contains global variables.
      A process is the unit of work in a system.
      In the process model, all software on the computer is organized into a number of sequential processes. A process includes PC, registers, and variables. Conceptually, each process has its own virtual CPU. In reality, the CPU switches back and forth among processes. (The rapid switching back and forth is called multiprogramming.)
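The difference between a passive program and an active process can be made concrete with a toy model. In the sketch below, `PROGRAM` is a fixed list of instructions in a made-up mini-language, and each `Process` object carries its own program counter and an accumulator that stands in for registers and variables. All names here are hypothetical, and switching after every single instruction is a simplifying assumption.

```python
from dataclasses import dataclass

# The program: passive, shared, and unchanged by execution.
PROGRAM = [("add", 1), ("add", 2), ("add", 3)]


@dataclass
class Process:
    pid: int
    pc: int = 0   # program counter: index of the next instruction
    acc: int = 0  # stands in for registers and variables

    def finished(self):
        return self.pc >= len(PROGRAM)

    def step(self):
        """Execute one instruction and advance the program counter."""
        op, arg = PROGRAM[self.pc]
        if op == "add":
            self.acc += arg
        self.pc += 1


def run_interleaved(processes):
    """Let one CPU switch among the processes after every instruction."""
    trace = []
    while any(not p.finished() for p in processes):
        for p in processes:
            if not p.finished():
                p.step()
                trace.append((p.pid, p.pc, p.acc))
    return trace


a, b = Process(pid=1), Process(pid=2, acc=10)
trace = run_interleaved([a, b])
print(a.acc, b.acc)
```

Both processes run the same program, yet each keeps its own PC and data, so switching between them does not mix up their results. This is the sense in which each process conceptually has its own virtual CPU.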
  • 25. Process State
      The process state consists of everything necessary to resume the process execution if it is somehow put aside temporarily. The process state consists of at least the following:
      1. Code for the program.
      2. The program's static data.
      3. The program's dynamic data.
      4. The program's procedure call stack.
      5. Contents of the general-purpose registers.
      6. Contents of the program counter (PC).
      7. Contents of the program status word (PSW).
      8. Operating system resources in use.
      Process Operations
      Process Creation
      In general-purpose systems, some way is needed to create processes as needed during operation. Four principal events lead to process creation:
      1. System initialization.
      2. Execution of a process-creation system call by a running process.
      3. A user request to create a new process.
      4. Initiation of a batch job.
      Foreground processes interact with users. Background processes stay in the background sleeping, but spring to life to handle activity such as email, web pages, printing, and so on. Background processes are called daemons.
      A process may create a new process through a process-creation system call such as 'fork'. This call creates an exact clone of the calling process. If it chooses to do so, the creating process is called the parent process and the created one is called the child process. Only one parent is needed to create a child process. Note that unlike plants and
  • 26. animals that use sexual reproduction, a process has only one parent. This creation of processes yields a hierarchical structure of processes like the one in the figure. Notice that each child has only one parent but each parent may have many children. After the fork, the two processes, the parent and the child, have the same memory image, the same environment strings and the same open files. Once created, however, the parent and the child each have their own distinct address space: if either process changes a word in its address space, the change is not visible to the other process. Following are some reasons for the creation of a process: • A user logs on. • A user starts a program. • The operating system creates a process to provide a service, e.g., to manage a printer. • Some program starts another process, e.g., Netscape calls xv to display a picture. Process Termination A process terminates when it finishes executing its last statement. Its resources are returned to the system, it is purged from any system lists or tables, and its process control block (PCB) is erased, i.e., the PCB's memory space is returned to a free memory pool. A process terminates, usually for one of the following reasons: Normal Exit Most processes terminate because they have done their job. In UNIX, this call is exit. Error Exit The process discovers a fatal error. For example, a user tries to compile a program that does not exist. Fatal Error An error caused by the process, due to a bug in the program, for example executing an illegal instruction, referencing non-existent memory, or dividing by zero. Killed by another Process
  • 27. A process executes a system call telling the operating system to terminate some other process. In UNIX, this call is kill. In some systems, when a process is killed, all the processes it created are killed as well (UNIX does not work this way). Process States A process goes through a series of discrete process states. New State The process is being created. Terminated State The process has finished execution. Blocked (waiting) State When a process blocks, it does so because logically it cannot continue, typically because it is waiting for input that is not yet available. Formally, a process is said to be blocked if it is waiting for some event to happen (such as an I/O completion) before it can proceed. In this state a process is unable to run until some external event happens. Running State A process is said to be running if it currently has the CPU, that is, it is actually using the CPU at that particular instant. Ready State A process is said to be ready if it could use a CPU if one were available. It is runnable but temporarily stopped to let another process run. Logically, the 'Running' and 'Ready' states are similar. In both cases the process is willing to run, only in the case of the 'Ready' state there is temporarily no CPU available for it. The
  • 28. 'Blocked' state is different from the 'Running' and 'Ready' states in that the process cannot run, even if the CPU is available. Process State Transitions Following are the six (6) possible transitions among the five (5) states mentioned above. Transition 1 occurs when a process discovers that it cannot continue. If a running process initiates an I/O operation before its allotted time expires, it voluntarily relinquishes the CPU. This state transition is: Block (process-name): Running → Blocked. Transition 2 occurs when the scheduler decides that the running process has run long enough and it is time to let another process have CPU time. This state transition is: Time-Run-Out (process-name): Running → Ready. Transition 3 occurs when all other processes have had their share and it is time for the first process to run again.
  • 29. This state transition is: Dispatch (process-name): Ready → Running. Transition 4
  • 30. occurs when the external event for which a process was waiting (such as the arrival of input) happens. This state transition is: Wakeup (process-name): Blocked → Ready. Transition 5 occurs when the process is created. This state transition is: Admitted (process-name): New → Ready. Transition 6 occurs when the process has finished execution. This state transition is: Exit (process-name): Running → Terminated. Process Control Block A process in an operating system is represented by a data structure known as a process control block (PCB) or process descriptor. The PCB contains important information about the specific process, including: • The current state of the process, i.e., whether it is ready, running, waiting, and so on. • A unique identification of the process, in order to track "which is which". • A pointer to the parent process. • Similarly, a pointer to the child process (if it exists).
  • 31. • The priority of the process (a part of the CPU scheduling information). • Pointers to locate the memory of the process. • A register save area. • The processor it is running on. The PCB is a central store that allows the operating system to locate key information about a process. Thus, the PCB is the data structure that defines a process to the operating system. Threads Despite the fact that a thread must execute within a process, the process and its associated threads are different concepts. Processes are used to group resources together, and threads are the entities scheduled for execution on the CPU. A thread is a single sequential stream of execution within a process. Because threads have some of the properties of processes, they are sometimes called lightweight processes. Threads allow multiple streams of execution within one process. In many respects, threads are a popular way to improve application performance through parallelism. The CPU switches rapidly back and forth among the threads, giving the illusion that the threads are running in parallel. Like a traditional process, i.e., a process with one thread, a thread can be in any of several states (Running, Blocked, Ready or Terminated). Each thread has its own stack: since threads will generally call different procedures, each has a different execution history, which is why each thread needs its own stack. In an operating system that has a thread facility, the basic unit of CPU utilization is a thread. A thread consists of a program counter (PC), a register set, and a stack space. Unlike processes, threads are not independent of one another: a thread shares with its peer threads the code section, the data section, and OS resources such as open files and signals (collectively also known as the task).
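Stepping back to the process model, the five states, the six transitions and a few of the PCB fields listed above can be sketched in Python. This is a minimal illustration under assumptions: the class name `PCB`, its field names and the `move_to` helper are hypothetical and are not taken from any real kernel.

```python
from dataclasses import dataclass, field
from typing import Optional

# The six legal transitions among the five states: (from, to) -> event name.
TRANSITIONS = {
    ("New", "Ready"): "Admitted",
    ("Ready", "Running"): "Dispatch",
    ("Running", "Ready"): "Time-Run-Out",
    ("Running", "Blocked"): "Block",
    ("Blocked", "Ready"): "Wakeup",
    ("Running", "Terminated"): "Exit",
}


@dataclass
class PCB:
    """A toy process control block holding a subset of the fields above."""
    pid: int                               # unique identification
    parent: Optional[int] = None           # "pointer" to the parent process
    priority: int = 0                      # CPU scheduling information
    state: str = "New"                     # current process state
    program_counter: int = 0
    registers: dict = field(default_factory=dict)  # register save area
    history: list = field(default_factory=list)    # events seen so far

    def move_to(self, new_state):
        """Change state, refusing any transition the model does not allow."""
        event = TRANSITIONS.get((self.state, new_state))
        if event is None:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append(event)
        self.state = new_state


# One possible life cycle: the process blocks once for I/O, then finishes.
pcb = PCB(pid=1)
for state in ["Ready", "Running", "Blocked", "Ready", "Running", "Terminated"]:
    pcb.move_to(state)
print(pcb.history)
```

A blocked process cannot be dispatched directly: calling `move_to("Running")` on a PCB in the Blocked state raises `ValueError`, because the process must first be woken up into the Ready state.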
  • 32. Processes Vs Threads As mentioned earlier, in many respects threads operate in the same way as processes. Some of the similarities and differences are: Similarities • Like processes, threads share the CPU, and on a single CPU only one thread is active (running) at a time. • Like processes, threads within a process execute sequentially. • Like processes, threads can create children. • And like processes, if one thread is blocked, another thread can run. Differences • Unlike processes, threads are not independent of one another. • Unlike processes, all threads can access every address in the task. • Unlike processes, threads are designed to assist one another. Note that processes might or might not assist one another because processes may originate from different users. Why Threads? Following are some reasons why we use threads in designing operating systems. • A process with multiple threads makes a great server, for example a printer server. • Because threads can share common data, they do not need to use interprocess communication. • Because of their very nature, threads can take advantage of multiprocessors. Threads are cheap in the sense that: • They only need a stack and storage for registers; therefore, threads are cheap to create.
  • 33. • Threads use very few resources of the operating system in which they are working. That is, threads do not need a new address space, global data, program code or operating system resources. • Context switches are fast when working with threads, because only the PC, SP and registers have to be saved and/or restored. But this cheapness does not come free: the biggest drawback is that there is no protection between threads. User-Level Threads User-level threads are implemented in user-level libraries rather than via system calls, so thread switching does not need to call the operating system or cause an interrupt to the kernel. In fact, the kernel knows nothing about user-level threads and manages their processes as if they were single-threaded. Advantages: The most obvious advantage of this technique is that a user-level threads package can be implemented on an operating system that does not support threads. Some other advantages are: • User-level threads do not require modification of the operating system. • Simple Representation: Each thread is represented simply by a PC, registers, a stack and a small control block, all stored in the user process's address space. • Simple Management: Creating a thread, switching between threads and synchronizing between threads can all be done without intervention of the kernel. • Fast and Efficient:
  • 34. Thread switching is not much more expensive than a procedure call. Disadvantages: • There is a lack of coordination between the threads and the operating system kernel. Therefore, the process as a whole gets one time slice, irrespective of whether it has one thread or 1000 threads within it. It is up to each thread to relinquish control to other threads. • User-level threads require non-blocking system calls, i.e., a multithreaded kernel. Otherwise, the entire process will block in the kernel, even if there are runnable threads left in the process. For example, if one thread causes a page fault, the whole process blocks. Kernel-Level Threads In this method, the kernel knows about and manages the threads. No runtime system is needed in this case. Instead of a thread table in each process, the kernel has a thread table that keeps track of all threads in the system. In addition, the kernel also maintains the traditional process table to keep track of processes. The operating system kernel provides system calls to create and manage threads. Advantages: • Because the kernel has full knowledge of all threads, the scheduler may decide to give more time to a process having a large number of threads than to a process having a small number of threads. • Kernel-level threads are especially good for applications that frequently block. Disadvantages: • Kernel-level threads are slow and inefficient. For instance, thread operations can be hundreds of times slower than those of user-level threads.
  • 35. • Since the kernel must manage and schedule threads as well as processes, it requires a full thread control block (TCB) for each thread to maintain information about it. As a result, there is significant overhead and increased kernel complexity. Advantages of Threads over Multiple Processes Context Switching Threads are very inexpensive to create and destroy, and they are inexpensive to represent. For example, they require space to store the PC, the SP, and the general-purpose registers, but they do not require space for memory-sharing information, information about open files or I/O devices in use, etc. With so little context, it is much faster to switch between threads. In other words, a context switch between threads is relatively cheap. Sharing Threads allow the sharing of many resources that cannot be shared between processes, for example the code section, the data section, and operating system resources such as open files. Disadvantages of Threads over Multiple Processes Blocking The major disadvantage is that if the kernel is single-threaded, a system call by one thread will block the whole process, and the CPU may be idle during the blocking period. Security Since there is extensive sharing among threads, there is a potential security problem. It is quite possible for one thread to overwrite the stack of another thread (or damage shared data), although this is unlikely since threads are meant to cooperate on a single task. Application that Benefits from Threads
  • 36. A proxy server satisfying the requests of a number of computers on a LAN would benefit from a multi-threaded process. In general, any program that has to do more than one task at a time could benefit from multithreading. For example, a program that reads input, processes it, and writes output could have three threads, one for each task. Application that cannot Benefit from Threads Any sequential process that cannot be divided into parallel tasks will not benefit from threads, as each task would block until the previous one completes. For example, a program that displays the time of day would not benefit from multiple threads. Resources used in Thread Creation and Process Creation When a new thread is created, it shares its code section, data section and operating system resources such as open files with the other threads of the process, but it is allocated its own stack, register set and program counter. The creation of a new process differs from that of a thread mainly in that all the resources a thread shares must be provided explicitly for each process. So even though two processes may be running the same piece of code, in this simple model each needs its own copy of the code in main memory to be able to run. Two processes also do not share other resources with each other. This makes the creation of a new process very costly compared to that of a new thread. Context Switch To give each process on a multiprogrammed machine a fair share of the CPU, a hardware clock generates interrupts periodically. This allows the operating system to schedule all processes in main memory (using a scheduling algorithm) to run on the CPU at
  • 37. equal intervals. Each time a clock interrupt occurs, the interrupt handler checks how much time the currently running process has used. If it has used up its entire time slice, the CPU scheduling algorithm (in the kernel) picks a different process to run. Each switch of the CPU from one process to another is called a context switch. Major Steps of Context Switching • The values of the CPU registers are saved in the process table entry of the process that was running just before the clock interrupt occurred. • The registers are loaded from the process table entry of the process picked by the CPU scheduler to run next. In a multiprogrammed uniprocessor computing system, context switches occur frequently enough that all processes appear to be running concurrently. If a process has more than one thread, the operating system can use the context switching technique to schedule the threads so they appear to execute in parallel. This is the case if threads are implemented at the kernel level. Threads can also be implemented entirely at the user level in run-time libraries. Since in this case no thread scheduling is provided by the operating system, it is the responsibility of the programmer to yield the CPU frequently enough in each thread so that all threads in the process can make progress. Action of Kernel to Context Switch Among Threads Threads share many resources with the other peer threads belonging to the same process, so a context switch among threads of the same process is easy. It involves switching the register set, the program counter and the stack, which is relatively easy for the kernel to accomplish. Action of Kernel to Context Switch Among Processes Context switches among processes are expensive. Before a process can be switched out, its process control block (PCB) must be saved by the operating system. The PCB consists of the following information: • The process state. • The program counter, PC. • The values of the different registers.
  • 38. • The CPU scheduling information for the process. • Memory management information regarding the process. • Possible accounting information for this process. • I/O status information of the process. When the PCB of the currently executing process has been saved, the operating system loads the PCB of the next process that is to run on the CPU. This is a heavy task and it takes a lot of time. Solaris-2 Operating Systems Introduction The Solaris-2 operating system supports: • threads at the user level, • threads at the kernel level, • symmetric multiprocessing, and • real-time scheduling. The entire thread system in Solaris is depicted in the following figure.
  • 39. At user-level The user-level threads are supported by a library for their creation and scheduling, and the kernel knows nothing of these threads. These user-level threads are supported by lightweight processes (LWPs). Each LWP is connected to exactly one kernel-level thread, whereas each user-level thread is independent of the kernel. Many user-level threads may perform one task. These threads may be scheduled and switched among LWPs without intervention of the kernel. User-level threads are extremely efficient because no context switch is needed to block one thread and start another running. Resource needs of User-level Threads A user-level thread needs a stack and a program counter. Absolutely no kernel resources are required. Since the kernel is not involved in scheduling these user-level threads, switching among them is fast and efficient. At Intermediate-level
  • 40. The lightweight processes (LWPs) are located between the user-level threads and the kernel-level threads. These LWPs serve as "virtual CPUs" on which user-level threads can run. Each task contains at least one LWP. The user-level threads are multiplexed on the LWPs of the process. Resource needs of LWP An LWP contains a process control block (PCB) with register data, accounting information and memory information. Therefore, switching between LWPs requires quite a bit of work, and LWPs are relatively slow compared to user-level threads. At kernel-level The standard kernel-level threads execute all operations within the kernel. There is a kernel-level thread for each LWP, and there are some threads that run only on the kernel's behalf and have no associated LWP, for example a thread to service disk requests. By request, a kernel-level thread can be pinned to a processor (CPU); see the rightmost thread in the figure. The kernel-level threads are scheduled by the kernel's scheduler; if a kernel-level thread blocks, its LWP blocks, and so does the user-level thread attached to that LWP (see the diagram in the notes). In modern Solaris-2, a task no longer must block just because one kernel-level thread blocks; the processor (CPU) is free to run another thread. Resource needs of Kernel-level Thread A kernel thread has only a small data structure and a stack. Switching between kernel threads does not require changing memory access information; therefore, kernel-level threads are relatively fast and efficient.
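To round off the discussion of threads, here is a small hedged sketch using Python's standard `threading` module. It illustrates that the threads of one process share the data section (the module-level `shared_counter`) while each keeps private data on its own stack (the local `local_count`). The lock is an addition of this sketch: since there is no protection between threads, concurrent updates to shared data need explicit synchronization. The function name `worker` and the thread and iteration counts are arbitrary choices for illustration.

```python
import threading

shared_counter = 0               # in the data section, visible to all threads
counter_lock = threading.Lock()  # guards updates to the shared variable


def worker(increments, results, index):
    """Add to the shared counter while counting privately on our own stack."""
    global shared_counter
    local_count = 0              # local variable: private to this thread
    for _ in range(increments):
        with counter_lock:       # only one thread updates at a time
            shared_counter += 1
        local_count += 1
    results[index] = local_count


results = [0] * 4
threads = [
    threading.Thread(target=worker, args=(1000, results, i)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait for every thread to finish

print(shared_counter)            # 4000: all threads updated the same variable
print(results)                   # [1000, 1000, 1000, 1000]: private counts
```

No interprocess communication is needed for the four threads to cooperate on `shared_counter`; the price is that the program itself must protect the shared data.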
  • 41. Unit 2 CPU/Process Scheduling The assignment of physical processors to processes allows processors to accomplish work. The problem of determining when processors should be assigned, and to which processes, is called processor scheduling or CPU scheduling. When more than one process is runnable, the operating system must decide which one to run first. The part of the operating system concerned with this decision is called the scheduler, and the algorithm it uses is called the scheduling algorithm. Goals of Scheduling (objectives) In this section we try to answer the following question: what does the scheduler try to achieve? Many objectives must be considered in the design of a scheduling discipline. In particular, a scheduler should consider fairness, efficiency, response time, turnaround time, throughput, etc. Some of these goals depend on the kind of system in use (for example batch, interactive or real-time), but there are also some goals that are desirable in all systems. General Goals Fairness Fairness is important under all circumstances. A scheduler makes sure that each process gets its fair share of the CPU and that no process suffers indefinite postponement. Note that giving every process equal time is not necessarily fair. Think of safety control and payroll at a nuclear plant. Policy Enforcement The scheduler has to make sure that the system's policy is enforced. For example, if the local policy is safety, then the safety control processes must be able to run whenever they want to, even if it means a delay in payroll processes.
  • 42. Efficiency The scheduler should keep the system (or in particular the CPU) busy 100 percent of the time when possible. If the CPU and all the input/output devices can be kept running all the time, more work gets done per second than if some components are idle. Response Time A scheduler should minimize the response time for interactive users. Turnaround A scheduler should minimize the time batch users must wait for output. Throughput A scheduler should maximize the number of jobs processed per unit time. A little thought will show that some of these goals are contradictory. It can be shown that any scheduling algorithm that favors some class of jobs hurts another class of jobs. The amount of CPU time available is finite, after all. Preemptive Vs Nonpreemptive Scheduling Scheduling algorithms can be divided into two categories with respect to how they deal with clock interrupts. Nonpreemptive Scheduling A scheduling discipline is nonpreemptive if, once a process has been given the CPU, the CPU cannot be taken away from that process. Following are some characteristics of nonpreemptive scheduling: • In a nonpreemptive system, short jobs are made to wait by longer jobs, but the overall treatment of all processes is fair.
  • 43. • In a nonpreemptive system, response times are more predictable because incoming high-priority jobs cannot displace waiting jobs. • In nonpreemptive scheduling, the scheduler dispatches a new job in the following two situations: when a process switches from the running state to the waiting state, and when a process terminates. Preemptive Scheduling A scheduling discipline is preemptive if the CPU can be taken away from a process after it has been given to it. The strategy of allowing processes that are logically runnable to be temporarily suspended is called preemptive scheduling, in contrast to the "run to completion" method. First-Come-First-Served (FCFS) Scheduling Other names of this algorithm are: • First-In-First-Out (FIFO) • Run-to-Completion • Run-Until-Done
  • 44. First-Come-First-Served is perhaps the simplest scheduling algorithm. Processes are dispatched according to their arrival time on the ready queue. Being a nonpreemptive discipline, once a process has the CPU, it runs to completion. FCFS scheduling is fair in the formal sense or human sense of fairness, but it is unfair in the sense that long jobs make short jobs wait and unimportant jobs make important jobs wait. FCFS is more predictable than most other schemes, since processes are served strictly in order of arrival. The FCFS scheme is not useful for scheduling interactive users because it cannot guarantee good response time. The code for FCFS scheduling is simple to write and understand. One of the major drawbacks of this scheme is that the average waiting time is often quite long. The First-Come-First-Served algorithm is rarely used as a master scheme in modern operating systems, but it is often embedded within other schemes. Round Robin Scheduling One of the oldest, simplest, fairest and most widely used algorithms is round robin (RR). In round robin scheduling, processes are dispatched in a FIFO manner but are given a limited amount of CPU time called a time-slice or a quantum. If a process does not complete before its CPU time expires, the CPU is preempted and given to the next process waiting in the queue. The preempted process is then placed at the back of the ready list. Round robin scheduling is preemptive (at the end of the time-slice); therefore it is effective in time-sharing environments in which the system needs to guarantee reasonable response times for interactive users. The only interesting issue with the round robin scheme is the length of the quantum. Setting the quantum too short causes too many context switches and lowers CPU efficiency. On the other hand, setting the quantum too long may cause poor response time and approximates FCFS. In any event, the average waiting time under round robin scheduling is often quite long.
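The two disciplines above can be compared with a small simulation. This is a sketch under stated assumptions rather than a real scheduler: all processes are assumed to arrive at time 0 in list order, context-switch overhead is ignored, and the function names and the burst times (24, 3 and 3 time units) are illustrative choices.

```python
from collections import deque


def fcfs_waiting_times(bursts):
    """Waiting time of each process when served strictly in arrival order."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)  # waits for everything queued ahead of it
        clock += burst       # then runs to completion
    return waits


def round_robin_waiting_times(bursts, quantum):
    """Waiting time of each process under round robin with a fixed quantum."""
    remaining = list(bursts)
    finish = [0] * len(bursts)
    queue = deque(range(len(bursts)))  # ready list, FIFO order
    clock = 0
    while queue:
        pid = queue.popleft()
        run = min(quantum, remaining[pid])
        clock += run
        remaining[pid] -= run
        if remaining[pid] > 0:
            queue.append(pid)          # preempted: back of the ready list
        else:
            finish[pid] = clock
    # Waiting time = completion time minus time actually spent running.
    return [finish[i] - bursts[i] for i in range(len(bursts))]


bursts = [24, 3, 3]
fcfs = fcfs_waiting_times(bursts)
rr = round_robin_waiting_times(bursts, quantum=4)
print(fcfs, sum(fcfs) / len(fcfs))  # [0, 24, 27] with average 17.0
print(rr, sum(rr) / len(rr))        # [6, 4, 7] with average about 5.67
```

With a very long quantum, for example `round_robin_waiting_times(bursts, quantum=100)`, every process finishes within its first time-slice and the result equals the FCFS waiting times, which is the sense in which a long quantum approximates FCFS.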
  • 45. Shortest-Job-First (SJF) Scheduling Another name for this algorithm is Shortest-Process-Next (SPN). Shortest-Job-First (SJF) is a non-preemptive discipline in which the waiting job (or process) with the smallest estimated run-time-to-completion is run next. In other words, when the CPU is available, it is assigned to the process that has the smallest next CPU burst. SJF scheduling is especially appropriate for batch jobs for which the run times are known in advance. Since the SJF scheduling algorithm gives the minimum average waiting time for a given set of processes, it is provably optimal in that respect. The SJF algorithm favors short jobs (or processes) at the expense of longer ones. The obvious problem with the SJF scheme is that it requires precise knowledge of how long a job or process will run, and this information is not usually available. The best the SJF algorithm can do is to rely on user estimates of run times. In a production environment where the same jobs run regularly, it may be possible to provide a reasonable estimate of run time based on the past performance of the process. But in a development environment users rarely know how their programs will execute. Like FCFS, SJF is non-preemptive; therefore, it is not useful in a timesharing environment in which reasonable response time must be guaranteed.
  • 46. Shortest-Remaining-Time (SRT) Scheduling SRT is the preemptive counterpart of SJF and is useful in time-sharing environments. In SRT scheduling, the process with the smallest estimated run-time to completion is run next, including new arrivals. In the SJF scheme, once a job begins executing, it runs to completion. In the SRT scheme, a running process may be preempted by a newly arrived process with a shorter estimated run-time. The SRT algorithm has higher overhead than its counterpart SJF: it must keep track of the elapsed time of the running process and must handle occasional preemptions. In this scheme, small processes run almost immediately on arrival. However, longer jobs have an even longer mean waiting time.
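The difference between SJF and SRT can be made concrete with a unit-step simulation. This is a sketch under assumptions: run times are known exactly (which, as noted above, is rarely true in practice), times are positive integers, and the function name, the `preemptive` flag and the four example jobs are hypothetical choices for illustration.

```python
def shortest_first_waiting_times(jobs, preemptive):
    """Waiting time per job for SJF (preemptive=False) or SRT (preemptive=True).

    jobs is a list of (arrival_time, burst_time) pairs with positive bursts.
    """
    remaining = [burst for _, burst in jobs]
    finish = [None] * len(jobs)
    clock, running = 0, None
    while any(r > 0 for r in remaining):
        ready = [i for i, (arrival, _) in enumerate(jobs)
                 if arrival <= clock and remaining[i] > 0]
        if not ready:
            clock += 1               # CPU idle until the next arrival
            continue
        if preemptive or running is None:
            # Choose the least remaining time; ties go to the lower index.
            running = min(ready, key=lambda i: (remaining[i], i))
        remaining[running] -= 1      # run the chosen job for one time unit
        clock += 1
        if remaining[running] == 0:
            finish[running] = clock
            running = None           # CPU is free: pick again next step
    return [finish[i] - arrival - burst
            for i, (arrival, burst) in enumerate(jobs)]


jobs = [(0, 8), (1, 4), (2, 9), (3, 5)]   # (arrival, burst) for P1..P4
sjf = shortest_first_waiting_times(jobs, preemptive=False)
srt = shortest_first_waiting_times(jobs, preemptive=True)
print(sjf, sum(sjf) / len(sjf))  # [0, 7, 15, 9] with average 7.75
print(srt, sum(srt) / len(srt))  # [9, 0, 15, 2] with average 6.5
```

Under SRT the short job P2 runs as soon as it arrives and never waits, while the first job P1, which SJF would have run to completion, now waits 9 time units, matching the remark that preemption helps small processes at the expense of longer ones.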
  • 47. Priority Scheduling The basic idea is straightforward: each process is assigned a priority, and the runnable process with the highest priority is allowed to run. Equal-priority processes are scheduled in FCFS order. The Shortest-Job-First (SJF) algorithm is a special case of the general priority scheduling algorithm: an SJF algorithm is simply a priority algorithm where the priority is the inverse of the (predicted) next CPU burst. That is, the longer the CPU burst, the lower the priority, and vice versa. Priority can be defined either internally or externally. Internally defined priorities use some measurable quantities or qualities to compute the priority of a process. Examples of internal priorities are: • Time limits. • Memory requirements. • File requirements, for example, number of open files. • CPU Vs I/O requirements. Externally defined priorities are set by criteria that are external to the operating system, such as: • The importance of the process. • The type or amount of funds being paid for computer use. • The department sponsoring the work. • Politics. Priority scheduling can be either preemptive or non-preemptive. A preemptive priority algorithm will preempt the CPU if the priority of the newly arrived process is higher than the priority of the currently running process. A non-preemptive priority algorithm will simply put the new process at the head of the ready queue. A major problem with priority scheduling is indefinite blocking or starvation. A solution to the problem of indefinite blockage of low-priority processes is aging. Aging
  • 48. is a technique of gradually increasing the priority of processes that wait in the system for a long period of time. Multilevel Queue Scheduling A multilevel queue scheduling algorithm partitions the ready queue into several separate queues. In multilevel queue scheduling, processes are permanently assigned to one queue, based on some property of the process, such as: • Memory size • Process priority • Process type The algorithm chooses the process from the occupied queue that has the highest priority, and runs that process either • preemptively or • non-preemptively. Each queue has its own scheduling algorithm or policy. Possibility I If each queue has absolute priority over lower-priority queues, then no process in a queue can run unless the queues for all higher-priority processes are empty. For example, in the above figure no process in the batch queue can run unless the queues for system processes, interactive processes, and interactive editing processes are all empty. Possibility II If the CPU is time-sliced between the queues, then each queue gets a certain share of the CPU time, which it can then schedule among the processes in its queue. For instance;
  • 49. • 80% of the CPU time to the foreground queue using RR. • 20% of the CPU time to the background queue using FCFS. Since processes do not move between queues, this policy has the advantage of low scheduling overhead, but it is inflexible. Multilevel Feedback Queue Scheduling The multilevel feedback queue scheduling algorithm allows a process to move between queues. It uses many ready queues and associates a different priority with each queue. The algorithm chooses the process with the highest priority from the occupied queues and runs that process either preemptively or non-preemptively. If a process uses too much CPU time, it is moved to a lower-priority queue. Similarly, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue. Note that this form of aging prevents starvation. For example, a process entering the ready queue is placed in queue 0. If it does not finish within 8 milliseconds, it is moved to the tail of queue 1. If it does not complete within its quantum in queue 1, it is preempted and placed into queue 2. Processes in queue 2 run on an FCFS basis, and only when queue 0 and queue 1 are empty. Deadlock A set of processes is in a deadlock state if each process in the set is waiting for an event that can be caused only by another process in the set. In other words, each member of the set of deadlocked processes is waiting for a resource that can be released only by another deadlocked process. None of the processes can run, none of them can release any resources, and none of them can be awakened. It is important to note that the number of processes and the number and kind of resources possessed and requested are unimportant.
  • 50. The resources may be either physical or logical. Examples of physical resources are printers, tape drives, memory space, and CPU cycles. Examples of logical resources are files, semaphores, and monitors. The simplest example of deadlock is where process 1 has been allocated non-shareable resource A, say, a tape drive, and process 2 has been allocated non-shareable resource B, say, a printer. Now, if it turns out that process 1 needs resource B (the printer) to proceed and process 2 needs resource A (the tape drive) to proceed, and these are the only two processes in the system, each blocks the other and all useful work in the system stops. This situation is termed deadlock. The system is in a deadlock state because each process holds a resource being requested by the other process, and neither process is willing to release the resource it holds. Preemptable and Nonpreemptable Resources Resources come in two flavors: preemptable and nonpreemptable. A preemptable resource is one that can be taken away from the process with no ill effects. Memory is an example of a preemptable resource. On the other hand, a nonpreemptable resource is one that cannot be taken away from a process without causing ill effects. For example, CD recorders are not preemptable at an arbitrary moment. Reallocating resources can resolve deadlocks that involve preemptable resources. Deadlocks that involve nonpreemptable resources are difficult to deal with. Necessary and Sufficient Deadlock Conditions 1. Mutual Exclusion Condition The resources involved are non-shareable. Explanation: At least one resource must be held in a non-shareable mode, that is, only one process at a time claims exclusive control of the resource. If another process requests that resource, the requesting process must be delayed until the resource has been released.
  • 51. 2. Hold and Wait Condition A requesting process already holds resources while waiting for the requested resources. Explanation: There must exist a process that is holding a resource already allocated to it while waiting for additional resources that are currently being held by other processes. 3. No-Preemption Condition Resources already allocated to a process cannot be preempted. Explanation: Resources cannot be removed from the processes holding them until they are used to completion or released voluntarily by the process holding them. 4. Circular Wait Condition The processes in the system form a circular list or chain where each process in the list is waiting for a resource held by the next process in the list. As an example, consider the traffic deadlock in the following figure
  • 52. Consider each section of the street as a resource. The mutual exclusion condition applies, since only one vehicle can be on a section of the street at a time. The hold-and-wait condition applies, since each vehicle is occupying a section of the street and waiting to move on to the next section of the street. The no-preemption condition applies, since a section of the street that is occupied by a vehicle cannot be taken away from it. The circular wait condition applies, since each vehicle is waiting on the next vehicle to move. That is, each vehicle in the traffic is waiting for a section of street held by the next vehicle in the traffic. The simple rule to avoid traffic deadlock is that a vehicle should only enter an intersection if it is assured that it will not have to stop inside the intersection. It is not possible to have a deadlock involving only a single process. The deadlock involves a circular “hold-and-wait” condition between two or more processes,
  • 53. so “one” process cannot hold a resource, yet be waiting for another resource that it is holding. In addition, deadlock over process-held resources is not possible between two threads in a process, because it is the process that holds resources, not the thread; that is, each thread has access to the resources held by the process. Deadlock Prevention Elimination of “Mutual Exclusion” Condition The mutual exclusion condition must hold for non-shareable resources. That is, several processes cannot simultaneously share a single resource. This condition is difficult to eliminate because some resources, such as the tape drive and printer, are inherently non-shareable. Note that shareable resources like read-only files do not require mutually exclusive access and thus cannot be involved in deadlock. Elimination of “Hold and Wait” Condition There are two possibilities for elimination of the second condition. The first alternative is that a process be granted all of the resources it needs at once, prior to execution. The second alternative is to disallow a process from requesting resources whenever it already holds allocated resources. This strategy requires that all of the resources a process will need must be requested at once. The system must grant resources on an “all or none” basis. If the complete set of resources needed by a process is not currently available, then the process must wait until the complete set is available. While the process waits, however, it may not hold any resources. Thus the “wait for” condition is denied and deadlocks simply cannot occur. This strategy can lead to serious waste of resources. For example, a program requiring ten tape drives must request and receive all ten drives before it begins executing. If the program needs only one tape drive to begin execution and then does not need the remaining tape drives for several hours, then substantial computer resources (9 tape drives) will sit idle for several hours. This strategy
  • 54. can cause indefinite postponement (starvation), since not all the required resources may become available at once. Elimination of “No-preemption” Condition The no-preemption condition can be alleviated by forcing a process waiting for a resource that cannot immediately be allocated to relinquish all of its currently held resources, so that other processes may use them to finish. Suppose a system does allow processes to hold resources while requesting additional resources. Consider what happens when a request cannot be satisfied. A process holds resources a second process may need in order to proceed, while the second process may hold the resources needed by the first process. This is a deadlock. This strategy requires that when a process that is holding some resources is denied a request for additional resources, the process must release its held resources and, if necessary, request them again together with the additional resources. Implementation of this strategy effectively denies the “no-preemption” condition. High Cost When a process releases resources, the process may lose all its work to that point. One serious consequence of this strategy is the possibility of indefinite postponement (starvation). A process might be held off indefinitely as it repeatedly requests and releases the same resources. Elimination of “Circular Wait” Condition The last condition, the circular wait, can be denied by imposing a total ordering on all of the resource types and then forcing all processes to request the resources in order (increasing or decreasing). That is, every resource type is given a number, and each process must request resources in numerical order of that enumeration. With this rule, the resource allocation graph can never have a cycle. For example, provide a global numbering of all the resources, as shown
  • 55. Now the rule is this: processes can request resources whenever they want to, but all requests must be made in numerical order. A process may request first a printer and then a tape drive (order: 2, 4), but it may not request first a plotter and then a printer (order: 3, 2). The problem with this strategy is that it may be impossible to find an ordering that satisfies everyone. Deadlock Avoidance This approach to the deadlock problem anticipates deadlock before it actually occurs. It employs an algorithm to assess the possibility that deadlock could occur and acts accordingly. This method differs from deadlock prevention, which guarantees that deadlock cannot occur by denying one of the necessary conditions of deadlock. If the necessary conditions for a deadlock are in place, it is still possible to avoid deadlock by being careful when resources are allocated. Perhaps the most famous deadlock avoidance algorithm, due to Dijkstra [1965], is the Banker’s algorithm, so named because the process is analogous to that used by a banker in deciding if a loan can be safely made. In this analogy, the customers are processes, the credit units are resources, and the banker is the operating system. Banker’s Algorithm
      Customer   Used   Max
      A          0      6
      B          0      5
      C          0      4
      D          0      7
      Available Units = 10
      Fig. 1
  • 56. In the above figure, we see four customers, each of whom has been granted a number of credit units. The banker reserved only 10 units rather than 22 units to service them. At a certain moment, the situation becomes
      Customer   Used   Max
      A          1      6
      B          1      5
      C          2      4
      D          4      7
      Available Units = 2
      Fig. 2
    Safe State The key to a state being safe is that there is at least one way for all users to finish. In terms of the analogy, the state of Fig. 2 is safe because with 2 units left, the banker can delay any request except C's, thus letting C finish and release all four resources. With four units in hand, the banker can let either D or B have the necessary units, and so on. Unsafe State Consider what would happen if a request from B for one more unit were granted in Fig. 2. We would have the following situation
      Customer   Used   Max
      A          1      6
      B          2      5
      C          2      4
      D          4      7
      Available Units = 1
    This is an unsafe state.
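The safe and unsafe states above can be checked mechanically. The following is a minimal sketch of the safety test for a single resource type, assuming the customer order A, B, C, D; the class and method names are hypothetical and not part of the notes. A state is reported safe when the customers can be finished one after another, each returning its units to the banker.

```java
class BankerSketch {
    /** True if some order lets every customer reach its maximum and finish. */
    static boolean isSafe(int[] used, int[] max, int available) {
        int n = used.length;
        boolean[] done = new boolean[n];
        int free = available;
        int finished = 0;
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < n; i++) {
                // Customer i can finish if its remaining need fits in the free units.
                if (!done[i] && max[i] - used[i] <= free) {
                    free += used[i];      // it finishes and repays everything it held
                    done[i] = true;
                    finished++;
                    progress = true;
                }
            }
        }
        return finished == n;
    }

    public static void main(String[] args) {
        int[] max = {6, 5, 4, 7};
        System.out.println(isSafe(new int[]{1, 1, 2, 4}, max, 2)); // prints true
        System.out.println(isSafe(new int[]{1, 2, 2, 4}, max, 1)); // prints false
    }
}
```

In the first call C finishes first (it needs 2 units), which frees enough for D, then A and B. In the second call no customer's remaining need fits in the single free unit, so no one can be guaranteed to finish.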
  • 57. If all the customers, namely A, B, C, and D, asked for their maximum loans, then the banker could not satisfy any of them and we would have a deadlock. Important Note: An unsafe state does not imply the existence or even the eventual existence of a deadlock. What an unsafe state does imply is simply that some unfortunate sequence of events might lead to a deadlock. The Banker's algorithm is thus to consider each request as it occurs, and see if granting it leads to a safe state; if it does, the request is granted, otherwise it is postponed. Deadlock Detection Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock. The basic idea is to check allocation against resource availability for all possible allocation sequences to determine if the system is in a deadlocked state. Of course, the deadlock detection algorithm is only half of this strategy. Once a deadlock is detected, there needs to be a way to recover. Several alternatives exist: Temporarily preempt resources from deadlocked processes. Back off a process to some checkpoint, allowing preemption of a needed resource and restarting the process at the checkpoint later. Successively kill processes until the system is deadlock free. These methods are expensive in the sense that each iteration calls the detection algorithm until the system proves to be deadlock free. The complexity of the algorithm is O(N^2), where N is the number of processes. Another potential problem is starvation: the same process may be killed repeatedly. File System Implementation  File-System Structure  File-System Implementation  Directory Implementation
  • 58.  Allocation Methods  Free-Space Management  Efficiency and Performance  Recovery  Log-Structured File Systems  NFS  Example: WAFL File System Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block allocation and free-block algorithms and trade-offs File-System Structure File structure Logical storage unit Collection of related information File system resides on secondary storage (disks) File system organized into layers File control block – storage structure consisting of information about a file Layered File System
  • 59. A Typical File Control Block
  • 60. The following figure illustrates the necessary file system structures provided by the operating systems. Virtual File Systems Virtual File Systems (VFS) provide an object-oriented way of implementing file systems. VFS allows the same system call interface (the API) to be used for different types of file systems. The API is to the VFS interface, rather than any specific type of file system. Schematic View of Virtual File System
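The idea that one call interface serves several file-system types can be illustrated with a small object-oriented sketch. Everything here is hypothetical: the interface `FileSystemOps`, the two toy implementations, and the mount table are illustrative stand-ins, not the API of any real kernel. The caller uses only `VfsSketch.read`, and the VFS layer picks the implementation from the longest matching mount prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** The operations every concrete file system must provide (hypothetical). */
interface FileSystemOps {
    String read(String path);
}

class LocalFs implements FileSystemOps {
    public String read(String path) { return "local:" + path; }
}

class RemoteFs implements FileSystemOps {
    public String read(String path) { return "remote:" + path; }
}

class VfsSketch {
    private final Map<String, FileSystemOps> mounts = new LinkedHashMap<>();

    void mount(String prefix, FileSystemOps fs) { mounts.put(prefix, fs); }

    /** Same call for every file-system type; dispatch is by longest mount prefix. */
    String read(String path) {
        String best = null;
        for (String prefix : mounts.keySet()) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) throw new IllegalArgumentException("no file system mounted for " + path);
        return mounts.get(best).read(path.substring(best.length()));
    }

    public static void main(String[] args) {
        VfsSketch vfs = new VfsSketch();
        vfs.mount("/", new LocalFs());
        vfs.mount("/net/", new RemoteFs());
        System.out.println(vfs.read("/etc/hosts"));   // prints local:etc/hosts
        System.out.println(vfs.read("/net/a.txt"));   // prints remote:a.txt
    }
}
```

Adding another file-system type means adding another implementation of the interface; callers of `read` do not change.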
  • 61. Directory Implementation  Linear list of file names with pointer to the data blocks. simple to program time-consuming to execute  Hash Table – linear list with hash data structure. decreases directory search time collisions – situations where two file names hash to the same location fixed size Allocation Methods An allocation method refers to how disk blocks are allocated for files:
  • 62. Contiguous allocation Linked allocation
  • 63. Indexed allocation Contiguous Allocation  Each file occupies a set of contiguous blocks on the disk  Simple – only starting location (block #) and length (number of blocks) are required Random access Wasteful of space (dynamic storage-allocation problem) Files cannot grow  Mapping from logical to physical Contiguous Allocation of Disk Space Extent-Based Systems Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation scheme
  • 64. Extent-based file systems allocate disk blocks in extents An extent is a contiguous set of disk blocks Extents are allocated for file allocation A file consists of one or more extents. Linked Allocation Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.  Simple – need only starting address  Free-space management system – no waste of space
  • 65.  No random access  Mapping Indexed Allocation  Brings all pointers together into the index block.  Logical view.  Need index table  Random access  Dynamic access without external fragmentation, but have overhead of index block. Mapping from logical to physical in a file of maximum size of 256K words and block size of 512 words. We need only 1 block for index table. Mapping from logical to physical in a file of unbounded length (block size of 512 words). Free-Space Management  Bit map requires extra space Example: block size = 2^12 bytes, disk size = 2^30 bytes (1 gigabyte), n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)  Easy to get contiguous files  Linked list (free list)  Cannot get contiguous space easily  No waste of space  Grouping  Counting Free-Space Management  Need to protect:  Pointer to free list  Bit map  Must be kept on disk
  • 66.  Copy in memory and disk may differ  Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk Solution: Set bit[i] = 1 in disk Allocate block[i] Set bit[i] = 1 in memory Linked Free Space List on Disk Efficiency and Performance Efficiency dependent on: disk allocation and directory algorithms types of data kept in file’s directory entry Performance disk cache – separate section of main memory for frequently used blocks free-behind and read-ahead – techniques to optimize sequential access improve PC performance by dedicating section of memory as virtual disk, or RAM disk
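The bit-map arithmetic and allocation described for free-space management can be sketched as follows. This is an illustrative in-memory model only, assuming that a set bit means the block is allocated (the convention implied by the bit[i] = 1 rule above); the class and method names are hypothetical, and the on-disk copy of the map is not modelled.

```java
import java.util.BitSet;

class BitmapFreeSpace {
    private final BitSet used;   // bit i set => block i is allocated
    private final int blocks;

    BitmapFreeSpace(long diskBytes, int blockBytes) {
        this.blocks = (int) (diskBytes / blockBytes);
        this.used = new BitSet(blocks);
    }

    int blockCount() { return blocks; }

    /** Space the map itself needs: one bit per block, rounded up to whole bytes. */
    int bitmapBytes() { return (blocks + 7) / 8; }

    /** Allocates the lowest-numbered free block, or returns -1 if none is free. */
    int allocate() {
        int i = used.nextClearBit(0);
        if (i >= blocks) return -1;
        used.set(i);
        return i;
    }

    void free(int block) { used.clear(block); }

    public static void main(String[] args) {
        // 2^30-byte disk with 2^12-byte blocks, as in the example above
        BitmapFreeSpace map = new BitmapFreeSpace(1L << 30, 1 << 12);
        System.out.println(map.blockCount());   // prints 262144
        System.out.println(map.bitmapBytes());  // prints 32768
    }
}
```

The two printed values are the 2^18 bits and 32K bytes of the worked example.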
  • 67. Page Cache A page cache caches pages rather than disk blocks using virtual memory techniques Memory-mapped I/O uses a page cache Routine I/O through the file system uses the buffer (disk) cache This leads to the following figure I/O Without a Unified Buffer Cache
  • 68. Unified Buffer Cache A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O I/O Using a Unified Buffer Cache Recovery Consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical) Recover lost file or disk by restoring data from backup Log Structured File Systems Log structured (or journaling) file systems record each update to the file system as a transaction All transactions are written to a log A transaction is considered committed once it is written to the log However, the file system may not yet be updated The transactions in the log are asynchronously written to the file system When the file system is modified, the transaction is removed from the log If the file system crashes, all remaining transactions in the log must still be performed
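The commit, apply and replay steps of a journaling file system can be sketched in a few lines. This is a toy in-memory model under loud assumptions: a queue stands in for the on-disk log, a map stands in for the file system proper, and a transaction is a single block write. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class JournalSketch {
    /** One logged update: write data to the given block number. */
    static final class Txn {
        final int block;
        final String data;
        Txn(int block, String data) { this.block = block; this.data = data; }
    }

    final Deque<Txn> log = new ArrayDeque<>();          // stands in for the on-disk log
    final Map<Integer, String> disk = new HashMap<>();  // stands in for the file system

    /** An update counts as committed as soon as it has been written to the log. */
    void commit(int block, String data) { log.addLast(new Txn(block, data)); }

    /** Applies the oldest logged transaction to the file system, then removes it from the log. */
    boolean applyOne() {
        Txn t = log.peekFirst();
        if (t == null) return false;
        disk.put(t.block, t.data);
        log.removeFirst();
        return true;
    }

    /** After a crash, every transaction still in the log is performed in order. */
    void recover() {
        while (applyOne()) { /* keep replaying until the log is empty */ }
    }

    public static void main(String[] args) {
        JournalSketch fs = new JournalSketch();
        fs.commit(1, "a");
        fs.commit(2, "b");
        fs.applyOne();        // block 1 reaches the file system
        fs.recover();         // a crash here would leave block 2 in the log; replay applies it
        System.out.println(fs.disk.get(2) + " " + fs.log.isEmpty()); // prints b true
    }
}
```

The entry is removed from the log only after the write to the file system, so a crash between the two steps leaves the transaction in the log and it is simply applied again on recovery.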
  • 69. The Sun Network File System (NFS) An implementation and a specification of a software system for accessing remote files across LANs (or WANs). The implementation is part of the Solaris and SunOS operating systems running on Sun workstations using an unreliable datagram protocol (UDP/IP protocol and Ethernet). Interconnected workstations are viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner. A remote directory is mounted over a local file system directory. The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory. Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided. Files in the remote directory can then be accessed in a transparent manner. Subject to access-rights accreditation, potentially any file system (or directory within a file system) can be mounted remotely on top of any local directory. NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specification is independent of these media. This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces. The NFS specification distinguishes between the services provided by a mount mechanism and the actual remote-file-access services. NFS Mount Protocol Establishes initial logical connection between server and client. Mount operation includes name of remote directory to be mounted and name of server machine storing it
  • 70. Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them Following a mount request that conforms to its export list, the server returns a file handle—a key for further accesses File handle – a file-system identifier, and an inode number to identify the mounted directory within the exported file system The mount operation changes only the user’s view and does not affect the server side NFS Protocol Provides a set of remote procedure calls for remote file operations. The procedures support the following operations: searching for a file within a directory reading a set of directory entries manipulating links and directories accessing file attributes reading and writing files NFS servers are stateless; each request has to provide a full set of arguments (NFS V4 is just coming available – very different, stateful) Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching) The NFS protocol does not provide concurrency-control mechanisms
  • 71. Three Major Layers of NFS Architecture UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors) Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types The VFS activates file-system-specific operations to handle local requests according to their file-system types Calls the NFS protocol procedures for remote requests NFS service layer – bottom layer of the architecture Implements the NFS protocol NFS Path-Name Translation Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode To make lookup faster, a directory name lookup cache on the client’s side holds the vnodes for remote directory names NFS Remote Operations Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files) NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes Cached file blocks are used only if the corresponding cached attributes are up to date File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server
  • 72. Clients do not free delayed-write blocks until the server confirms that the data have been written to disk Example: WAFL File System Used on Network Appliance “Filers” – distributed file system appliances “Write-anywhere file layout” Serves up NFS, CIFS, http, ftp Random I/O optimized, write optimized NVRAM for write caching Similar to Berkeley Fast File System, with extensive modifications
  • 73. File-System Interface  File Concept  Access Methods  Directory Structure  File-System Mounting  File Sharing  Protection Objectives  To explain the function of file systems  To describe the interfaces to file systems  To discuss file-system design tradeoffs, including access methods, file sharing, file locking, and directory structures  To explore file-system protection File Concept Contiguous logical address space Types: Data numeric character binary Program File Structure None - sequence of words, bytes Simple record structure Lines Fixed length
  • 74. Variable length Complex Structures Formatted document Relocatable load file Can simulate last two with first method by inserting appropriate control characters Who decides: Operating system Program File Attributes  Name – only information kept in human-readable form  Identifier – unique tag (number) identifies file within file system  Type – needed for systems that support different types  Location – pointer to file location on device  Size – current file size  Protection – controls who can do reading, writing, executing  Time, date, and user identification – data for protection, security, and usage monitoring Information about files is kept in the directory structure, which is maintained on the disk File is an abstract data type  Create  Write  Read  Reposition within file  Delete  Truncate File Operations Open(Fi) – search the directory structure on disk for entry Fi, and move the content of entry to memory
  • 75. Close (Fi) – move the content of entry Fi in memory to directory structure on disk Open Files Several pieces of data are needed to manage open files: File pointer: pointer to last read/write location, per process that has the file open File-open count: counter of number of times a file is open – to allow removal of data from open-file table when the last process closes it Disk location of the file: cache of data access information Access rights: per-process access mode information Open File Locking Provided by some operating systems and file systems Mediates access to a file Mandatory or advisory: Mandatory – access is denied depending on locks held and requested Advisory – processes can find status of locks and decide what to do File Locking Example – Java API

    import java.io.*;
    import java.nio.channels.*;

    public class LockingExample {
        public static final boolean EXCLUSIVE = false;
        public static final boolean SHARED = true;

        public static void main(String[] args) throws IOException {
            FileLock sharedLock = null;
            FileLock exclusiveLock = null;
            try {
                RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
                // get the channel for the file
                FileChannel ch = raf.getChannel();
                long half = raf.length() / 2;
                // this locks the first half of the file - exclusive
                exclusiveLock = ch.lock(0, half, EXCLUSIVE);
                /** Now modify the data . . . */
                // release the lock
                exclusiveLock.release();
                // this locks the second half of the file - shared
                sharedLock = ch.lock(half, raf.length() - half, SHARED);
                /** Now read the data . . . */
                // release the lock
                sharedLock.release();
            } catch (java.io.IOException ioe) {
                System.err.println(ioe);
            } finally {
                // releasing a lock that was already released has no effect
                if (exclusiveLock != null) exclusiveLock.release();
                if (sharedLock != null) sharedLock.release();
            }
        }
    }

  • 76. File Types – Name, Extension Access Methods  Sequential Access read next write next reset
  • 77. no read after last write (rewrite)  Direct Access read n write n position to n read next write next rewrite n n = relative block number Sequential-access File Simulation of Sequential Access on a Direct-access File Example of Index and Relative Files Directory Structure A collection of nodes containing information about all files A Typical File-system Organization Operations Performed on Directory  Search for a file  Create a file  Delete a file  List a directory  Rename a file  Traverse the file system Organize the Directory (Logically) to Obtain  Efficiency – locating a file quickly  Naming – convenient to users  Two users can have same name for different files  The same file can have several different names
  • 78.  Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …) Single-Level Directory A single directory for all users Two-Level Directory Separate directory for each user Tree-Structured Directories
  • 79. Efficient searching Grouping Capability Current directory (working directory) Absolute or relative path name Creating a new file is done in current directory Delete a file rm <file-name> Creating a new subdirectory is done in current directory mkdir <dir-name> Example: if in current directory /mail mkdir count Acyclic-Graph Directories Have shared subdirectories and files
  • 80. Two different names (aliasing) If dict deletes list ⇒ dangling pointer Solutions: Backpointers, so we can delete all pointers Variable size records a problem Backpointers using a daisy chain organization Entry-hold-count solution New directory entry type  Link – another name (pointer) to an existing file  Resolve the link – follow pointer to locate the file
  • 81.  General Graph Directory  General Graph Directory (Cont.) How do we guarantee no cycles? Allow only links to file not subdirectories Garbage collection Every time a new link is added use a cycle detection algorithm to determine whether it is OK File System Mounting A file system must be mounted before it can be accessed An unmounted file system (i.e. Fig. 11-11(b)) is mounted at a mount point (a) Existing. (b) Unmounted Partition Mount Point File Sharing Sharing of files on multi-user systems is desirable Sharing may be done through a protection scheme On distributed systems, files may be shared across a network Network File System (NFS) is a common distributed file-sharing method File Sharing – Multiple Users  User IDs identify users, allowing permissions and protections to be per-user  Group IDs allow users to be in groups, permitting group access rights File Sharing – Remote File Systems Uses networking to allow file system access between systems
  • 82. Manually via programs like FTP Automatically, seamlessly using distributed file systems Semi automatically via the world wide web Client-server model allows clients to mount remote file systems from servers Server can serve multiple clients Client and user-on-client identification is insecure or complicated NFS is standard UNIX client-server file sharing protocol CIFS is standard Windows protocol Standard operating system file calls are translated into remote calls Distributed Information Systems (distributed naming services) such as LDAP, DNS, NIS, Active Directory implement unified access to information needed for remote computing File Sharing – Failure Modes Remote file systems add new failure modes, due to network failure, server failure Recovery from failure can involve state information about status of each remote request Stateless protocols such as NFS include all information in each request, allowing easy recovery but less security File Sharing – Consistency Semantics Consistency semantics specify how multiple users are to access a shared file simultaneously Similar to process synchronization algorithms. Tend to be less complex due to disk I/O and network latency (for remote file systems) Andrew File System (AFS) implemented complex remote file sharing semantics Unix file system (UFS) implements: Writes to an open file visible immediately to other users of the same open file Sharing file pointer to allow multiple users to read and write concurrently
  • 83. AFS has session semantics Writes only visible to sessions starting after the file is closed Protection File owner/creator should be able to control: Types of access  Read  Write  Execute  Append  Delete  List  Access Lists and Groups Mode of access: read, write, execute Three classes of users
      Class               Octal      R W X
      a) owner access     7    ⇒     1 1 1
      b) group access     6    ⇒     1 1 0
      c) public access    1    ⇒     0 0 1
    Ask manager to create a group (unique name), say G, and add some users to the group. For a particular file (say game) or subdirectory, define an appropriate access.
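The mapping from the octal digits 7, 6 and 1 to read, write and execute bits can be made concrete with a short sketch. The class and method names are hypothetical; the code only decodes the three digits shown above into the familiar rwx string.

```java
class ModeBits {
    /** Renders one octal digit (0-7) as an rwx triple, e.g. 6 -> "rw-". */
    static String triple(int digit) {
        return ((digit & 4) != 0 ? "r" : "-")
             + ((digit & 2) != 0 ? "w" : "-")
             + ((digit & 1) != 0 ? "x" : "-");
    }

    /** Renders the owner, group and public digits in that order. */
    static String render(int owner, int group, int pub) {
        return triple(owner) + triple(group) + triple(pub);
    }

    public static void main(String[] args) {
        System.out.println(render(7, 6, 1)); // prints rwxrw---x
    }
}
```

Each digit is simply the three permission bits read as a binary number: 7 is 111, 6 is 110 and 1 is 001.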
  • 84. Mass-Storage Systems  Overview of Mass Storage Structure  Disk Structure  Disk Attachment  Disk Scheduling  Disk Management  Swap-Space Management  RAID Structure  Disk Attachment  Stable-Storage Implementation  Tertiary Storage Devices  Operating System Issues  Performance Issues Objectives Describe the physical structure of secondary and tertiary storage devices and the resulting effects on the uses of the devices Explain the performance characteristics of mass-storage devices Discuss operating-system services provided for mass storage, including RAID and HSM Overview of Mass Storage Structure Magnetic disks provide bulk of secondary storage of modern computers Drives rotate at 60 to 200 times per second Transfer rate is rate at which data flow between drive and computer Positioning time (random-access time) is time to move disk arm to desired cylinder (seek time) and time for desired sector to rotate under the disk head (rotational latency) Head crash results from disk head making contact with the disk surface That’s bad Disks can be removable Drive attached to computer via I/O bus
  • 85. Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI Host controller in computer uses bus to talk to disk controller built into drive or storage array Moving-head Disk Mechanism Magnetic tape Was early secondary-storage medium Relatively permanent and holds large quantities of data Access time slow Random access ~1000 times slower than disk Mainly used for backup, storage of infrequently-used data, transfer medium between systems Kept in spool and wound or rewound past read-write head Once data under head, transfer rates comparable to disk 20-200GB typical storage Common technologies are 4mm, 8mm, 19mm, LTO-2 and SDLT
  • 86. Disk Structure Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer. The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially. Sector 0 is the first sector of the first track on the outermost cylinder. Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost. Disk Attachment Host-attached storage accessed through I/O ports talking to I/O busses SCSI itself is a bus, up to 16 devices on one cable, SCSI initiator requests operation and SCSI targets perform tasks Each target can have up to 8 logical units (disks attached to device controller) FC is high-speed serial architecture Can be switched fabric with 24-bit address space – the basis of storage area networks (SANs) in which many hosts attach to many storage units Can be arbitrated loop (FC-AL) of 126 devices Network-Attached Storage Network-attached storage (NAS) is storage made available over a network rather than over a local connection (such as a bus) NFS and CIFS are common protocols Implemented via remote procedure calls (RPCs) between host and storage New iSCSI protocol uses IP network to carry the SCSI protocol
  • 87. Storage Area Network Common in large storage environments (and becoming more common). Multiple hosts attached to multiple storage arrays: flexible. Disk Scheduling The operating system is responsible for using hardware efficiently; for the disk drives, this means having a fast access time and high disk bandwidth. Access time has two major components. Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector. Rotational latency is the additional time waiting for the disk to rotate the desired sector to the disk head. Minimize seek time. Seek time is roughly proportional to seek distance. Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer. Several algorithms exist to schedule the servicing of disk I/O requests. We illustrate them with a request queue (cylinders 0-199): 98, 183, 37, 122, 14, 124, 65, 67. Head pointer: 53.
  • 88. FCFS Services requests in the order they arrive; for the example queue this gives a total head movement of 640 cylinders. SSTF Selects the request with the minimum seek time from the current head position. SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests. Illustration shows total head movement of 236 cylinders.
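The head-movement totals for FCFS and SSTF can be checked with a short sketch. It uses the example queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at cylinder 53. The function names are illustrative, and seek cost is assumed to equal the cylinder distance.

```python
def total_movement(start, order):
    """Sum of cylinder distances when requests are served in the given order."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total


def sstf_order(start, requests):
    """Repeatedly pick the pending request closest to the current head position."""
    pending = list(requests)
    pos, order = start, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    head = 53
    print("FCFS:", total_movement(head, queue))                    # arrival order
    print("SSTF:", total_movement(head, sstf_order(head, queue)))
```

For this queue FCFS moves the head 640 cylinders and SSTF 236. SSTF is greedy, so it is not guaranteed to give the minimum total movement, and requests far from the head can wait indefinitely.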
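  • 89. SCAN The disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues. Sometimes called the elevator algorithm. With the head at cylinder 53 moving toward cylinder 0, the arm travels 53 cylinders down and then 183 cylinders up to the last request, a total head movement of 236 cylinders; the figure of 208 cylinders applies only if the arm reverses at the lowest request (cylinder 14) instead of travelling all the way to cylinder 0.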
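A minimal sketch of SCAN for the example queue, with the head at cylinder 53 and assumed to be moving toward cylinder 0. The helper names and the `go_to_end` flag are illustrative. With the flag off, the arm reverses at the last pending request, which is the LOOK variant.

```python
def walk(start, stops):
    """Total cylinders travelled when visiting the stops in order."""
    total, pos = 0, start
    for cyl in stops:
        total += abs(cyl - pos)
        pos = cyl
    return total


def scan_path(start, requests, lowest=0, go_to_end=True):
    """Stops for an arm that first sweeps toward lower cylinder numbers."""
    below = sorted((c for c in requests if c <= start), reverse=True)
    above = sorted(c for c in requests if c > start)
    path = list(below)
    if go_to_end and (not path or path[-1] != lowest):
        path.append(lowest)  # strict SCAN travels to the end of the disk
    return path + above


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print("SCAN:", walk(53, scan_path(53, queue)))
    print("LOOK:", walk(53, scan_path(53, queue, go_to_end=False)))
```

Under these assumptions strict SCAN travels 236 cylinders (53 down to cylinder 0, then 183 up to the last request), while reversing at cylinder 14 gives 208.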
  • 90. C-SCAN Provides a more uniform wait time than SCAN. The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip. Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
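The circular behaviour of C-SCAN can be sketched as follows, again for the queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at 53, here assumed to be moving toward higher cylinders on a 0-199 disk. Whether the return sweep counts toward head movement is a matter of convention, so the `count_return` flag is an assumption of this sketch rather than something the notes specify.

```python
def cscan_order(start, requests):
    """Serve requests ahead of the head in ascending order, then wrap to the lowest."""
    ahead = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    return ahead + wrapped


def cscan_movement(start, requests, lowest=0, highest=199, count_return=True):
    """Head movement for one C-SCAN pass starting at `start`, moving upward."""
    ahead = [c for c in requests if c >= start]
    wrapped = [c for c in requests if c < start]
    if not wrapped:
        # Simplification: nothing waits behind the head, so stop at the last request.
        return (max(ahead) - start) if ahead else 0
    outbound = highest - start                      # sweep to the last cylinder
    return_trip = (highest - lowest) if count_return else 0
    inbound = max(wrapped) - lowest                 # serve the wrapped requests
    return outbound + return_trip + inbound


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(cscan_order(53, queue))
    print(cscan_movement(53, queue), cscan_movement(53, queue, count_return=False))
```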
  • 91. C-LOOK Version of C-SCAN Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
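C-LOOK for the same example (queue 98, 183, 37, 122, 14, 124, 65, 67, head at 53, assumed to be moving toward higher cylinders) can be sketched as below. The function names are illustrative, and the jump back to the lowest pending request is counted as head movement here, which is one possible convention.

```python
def clook_order(start, requests):
    """Serve requests at or above the head in ascending order, then the rest ascending."""
    ahead = sorted(c for c in requests if c >= start)
    behind = sorted(c for c in requests if c < start)
    return ahead + behind


def clook_movement(start, requests):
    """Total cylinders travelled, including the jump back to the lowest request."""
    total, pos = 0, start
    for cyl in clook_order(start, requests):
        total += abs(cyl - pos)
        pos = cyl
    return total


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(clook_order(53, queue), clook_movement(53, queue))
```

The arm never visits cylinders 199 or 0 because no request lies there, which is the saving over C-SCAN.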
  • 92. Selecting a Disk-Scheduling Algorithm SSTF is common and has a natural appeal SCAN and C-SCAN perform better for systems that place a heavy load on the disk. Performance depends on the number and types of requests. Requests for disk service can be influenced by the file-allocation method. The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary. Either SSTF or LOOK is a reasonable choice for the default algorithm. Disk Management Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk controller can read and write. To use a disk to hold files, the operating system still needs to record its own data structures on the disk. Partition the disk into one or more groups of cylinders.
  • 93. Logical formatting or “making a file system”. Boot block initializes system. The bootstrap is stored in ROM. Bootstrap loader program. Methods such as sector sparing used to handle bad blocks. Booting from a Disk in Windows 2000 Swap-Space Management Swap-space: Virtual memory uses disk space as an extension of main memory. Swap-space can be carved out of the normal file system, or, more commonly, it can be in a separate disk partition. Swap-space management: 4.3BSD allocates swap space when process starts; holds text segment (the program) and data segment. Kernel uses swap maps to track swap-space use. Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual memory page is first created. Data Structures for Swapping on Linux Systems RAID Structure RAID: multiple disk drives provide reliability via redundancy. RAID is arranged into six different levels. RAID (cont.) Several improvements in disk-use techniques involve the use of multiple disks working cooperatively. Disk striping uses a group of disks as one storage unit.
  • 94. RAID schemes improve performance and improve the reliability of the storage system by storing redundant data. Mirroring or shadowing keeps duplicate of each disk. Block interleaved parity uses much less redundancy. RAID Levels
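To illustrate why block-interleaved parity needs much less redundancy than mirroring, the sketch below keeps a single XOR parity block for a stripe and uses it to rebuild one lost data block. The function names and the tiny two-byte blocks are hypothetical. Real RAID implementations work on whole sectors and handle many details omitted here.

```python
from functools import reduce


def parity_block(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """XOR of the parity with all surviving data blocks yields the lost block."""
    return parity_block(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # three data disks
    parity = parity_block(stripe)                       # one extra disk
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])
```

One parity block protects the whole stripe against a single disk failure, whereas mirroring would need one extra block per data block.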
  • 95. RAID (0 + 1) and (1 + 0)
  • 96. Stable-Storage Implementation Write-ahead log scheme requires stable storage. To implement stable storage: Replicate information on more than one nonvolatile storage media with independent failure modes. Update information in a controlled manner to ensure that we can recover the stable data after any failure during data transfer or recovery. Tertiary Storage Devices Low cost is the defining characteristic of tertiary storage. Generally, tertiary storage is built using removable media Common examples of removable media are floppy disks and CD-ROMs; other types are available. Removable Disks Floppy disk — thin flexible disk coated with magnetic material, enclosed in a protective plastic case. Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB. Removable magnetic disks can be nearly as fast as hard disks, but they are at a greater risk of damage from exposure. Removable Disks (Cont.) A magneto-optic disk records data on a rigid platter coated with magnetic material. Laser heat is used to amplify a large, weak magnetic field to record a bit. Laser light is also used to read data (Kerr effect).
  • 97. The magneto-optic head flies much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective layer of plastic or glass; resistant to head crashes. Optical disks do not use magnetism; they employ special materials that are altered by laser light. WORM Disks The data on read-write disks can be modified over and over. WORM (“Write Once, Read Many Times”) disks can be written only once. Thin aluminum film sandwiched between two glass or plastic platters. To write a bit, the drive uses a laser light to burn a small hole through the aluminum; information can be destroyed but not altered. Very durable and reliable. Read-only disks, such as CD-ROM and DVD, come from the factory with the data prerecorded. Tapes Compared to a disk, a tape is less expensive and holds more data, but random access is much slower. Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data, holding huge volumes of data. Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library. stacker – library that holds a few tapes silo – library that holds thousands of tapes A disk-resident file can be archived to tape for low cost storage; the computer can stage it back into disk storage for active use.
  • 98. Operating System Issues Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications. For hard disks, the OS provides two abstractions: Raw device – an array of data blocks. File system – the OS queues and schedules the interleaved requests from several applications. Application Interface Most OSs handle removable disks almost exactly like fixed disks: a new cartridge is formatted and an empty file system is generated on the disk. Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape, it opens the whole tape drive as a raw device. Usually the tape drive is reserved for the exclusive use of that application. Since the OS does not provide file system services, the application must decide how to use the array of blocks. Since every application makes up its own rules for how to organize a tape, a tape full of data can generally only be used by the program that created it. Tape Drives The basic operations for a tape drive differ from those of a disk drive. locate positions the tape to a specific logical block, not an entire track (corresponds to seek). The read position operation returns the logical block number where the tape head is. The space operation enables relative motion. Tape drives are “append-only” devices; updating a block in the middle of the tape also effectively erases everything beyond that block. An EOT mark is placed after a block that is written.
  • 99. File Naming The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer, and then use the cartridge in another computer. Contemporary OSs generally leave the name space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data. Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way. Hierarchical Storage Management (HSM) A hierarchical storage system extends the storage hierarchy beyond primary memory and secondary storage to incorporate tertiary storage — usually implemented as a jukebox of tapes or removable disks. Usually incorporate tertiary storage by extending the file system. Small and frequently used files remain on disk. Large, old, inactive files are archived to the jukebox. HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data. Speed Two aspects of speed in tertiary storage are bandwidth and latency. Bandwidth is measured in bytes per second. Sustained bandwidth – average data rate during a large transfer; # of bytes/transfer time. Data rate when the data stream is actually flowing. Effective bandwidth – average over the entire I/O time, including seek or locate, and cartridge switching. Drive’s overall data rate. Access latency – amount of time needed to locate data.
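The difference between sustained and effective bandwidth comes down to what is counted in the denominator. The sketch below is a worked example with made-up numbers for a tape transfer. The function names and figures are assumptions for illustration only.

```python
def sustained_bandwidth(bytes_transferred: float, transfer_seconds: float) -> float:
    """Data rate while the stream is actually flowing (bytes per second)."""
    return bytes_transferred / transfer_seconds


def effective_bandwidth(bytes_transferred: float, transfer_seconds: float,
                        overhead_seconds: float) -> float:
    """Data rate over the whole I/O time, including locate and cartridge switching."""
    return bytes_transferred / (transfer_seconds + overhead_seconds)


if __name__ == "__main__":
    size = 1_000_000_000            # hypothetical 1 GB file
    streaming = 100.0               # seconds spent transferring data
    overhead = 40.0 + 60.0          # cartridge switch + locate, in seconds
    print(sustained_bandwidth(size, streaming) / 1e6, "MB/s sustained")
    print(effective_bandwidth(size, streaming, overhead) / 1e6, "MB/s effective")
```

With these numbers the drive streams at 10 MB/s but delivers only 5 MB/s overall, which is why a removable library suits large, infrequent transfers.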
  • 100. Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; < 35 milliseconds. Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds. Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk. The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives. A removable library is best devoted to the storage of infrequently used data, because the library can only satisfy a relatively small number of I/O requests per hour. Reliability A fixed disk drive is likely to be more reliable than a removable disk or tape drive. An optical cartridge is likely to be more reliable than a magnetic disk or tape. A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed. Cost Main memory is much more expensive than disk storage The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive. The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years. Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives.
  • 101. Price per Megabyte of DRAM, From 1981 to 2004; Price per Megabyte of Magnetic Hard Disk, From 1981 to 2004; Price per Megabyte of a Tape Drive, From 1984-2000. I/O Systems: I/O Hardware; Application I/O Interface; Kernel I/O Subsystem; Transforming I/O Requests to Hardware Operations; Streams; Performance. Objectives: Explore the structure of an operating system’s I/O subsystem. Discuss the principles of I/O hardware and its complexity. Provide details of the performance aspects of I/O hardware and software. I/O Hardware A Typical PC Bus Structure Device I/O Port Locations on PCs (partial) Polling Determines state of device: command-ready, busy, error. Busy-wait cycle to wait for I/O from device. Interrupts CPU interrupt-request line triggered by I/O device.
  • 102. Interrupt handler receives interrupts Maskable to ignore or delay some interrupts Interrupt vector to dispatch interrupt to correct handler Based on priority Some nonmaskable Interrupt mechanism also used for exceptions Interrupt-Driven I/O Cycle
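The roles of the interrupt vector and of masking can be illustrated with a toy software model. It is a sketch only: real dispatch is done by hardware and the kernel. The class name, method names, and interrupt numbers are hypothetical, and priorities are left out to keep it short.

```python
class InterruptController:
    """Toy model: a vector table of handlers plus a mask for deferrable interrupts."""

    def __init__(self):
        self.vector = {}      # interrupt number -> (handler, maskable)
        self.masked = set()   # interrupt numbers currently masked
        self.deferred = []    # masked interrupts waiting to be delivered

    def register(self, number, handler, maskable=True):
        self.vector[number] = (handler, maskable)

    def mask(self, number):
        self.masked.add(number)

    def unmask(self, number):
        """Unmask a line and deliver anything that was deferred on it."""
        self.masked.discard(number)
        results, still_deferred = [], []
        for n in self.deferred:
            if n == number:
                results.append(self.vector[n][0](n))
            else:
                still_deferred.append(n)
        self.deferred = still_deferred
        return results

    def raise_interrupt(self, number):
        """Dispatch through the vector; masked maskable interrupts are delayed."""
        handler, maskable = self.vector[number]
        if maskable and number in self.masked:
            self.deferred.append(number)
            return None
        return handler(number)


if __name__ == "__main__":
    ic = InterruptController()
    ic.register(33, lambda n: "keyboard")                 # arbitrary device line
    ic.register(2, lambda n: "nmi", maskable=False)       # cannot be masked
    ic.mask(33)
    ic.mask(2)
    print(ic.raise_interrupt(33), ic.raise_interrupt(2), ic.unmask(33))
```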
  • 103. Intel Pentium Processor Event-Vector Table Direct Memory Access Used to avoid programmed I/O for large data movement Requires DMA controller Bypasses CPU to transfer data directly between I/O device and memory
  • 104. Six Step Process to Perform DMA Transfer Application I/O Interface I/O system calls encapsulate device behaviors in generic classes. Device-driver layer hides differences among I/O controllers from kernel. Devices vary in many dimensions: character-stream or block; sequential or random-access; sharable or dedicated; speed of operation; read-write, read only, or write only. A Kernel I/O Structure Characteristics of I/O Devices Block and Character Devices Block devices include disk drives. Commands include read, write, seek. Raw I/O or file-system access. Memory-mapped file access possible. Character devices include keyboards, mice, serial ports. Commands include get, put. Libraries layered on top allow line editing. Network Devices Varying enough from block and character to have own interface. Unix and Windows NT/9x/2000 include socket interface. Separates network protocol from network operation. Includes select functionality. Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes).
  • 105. Clocks and Timers Provide current time, elapsed time, timer Programmable interval timer used for timings, periodic interrupts ioctl (on UNIX) covers odd aspects of I/O such as clocks and timers Blocking and Nonblocking I/O Blocking - process suspended until I/O completed Easy to use and understand Insufficient for some needs Nonblocking - I/O call returns as much as available User interface, data copy (buffered I/O) Implemented via multi-threading Returns quickly with count of bytes read or written Asynchronous - process runs while I/O executes Difficult to use I/O subsystem signals process when I/O completed Two I/O Methods Kernel I/O Subsystem Scheduling Some I/O request ordering via per-device queue Some OSs try fairness Buffering - store data in memory while transferring between devices To cope with device speed mismatch To cope with device transfer size mismatch To maintain “copy semantics” Device-status Table Sun Enterprise 6000 Device-Transfer Rates
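The difference between a blocking and a nonblocking read can be seen with a pipe. The sketch below assumes a Unix-like system and uses Python's `os.pipe`, `os.set_blocking`, `os.read` and `os.write`. The helper names are illustrative.

```python
import os


def nonblocking_read(fd, max_bytes=4096):
    """Return whatever is available right now, or None if nothing is ready."""
    try:
        return os.read(fd, max_bytes)
    except BlockingIOError:
        return None   # a blocking read would have suspended the process here


def demo():
    read_end, write_end = os.pipe()
    os.set_blocking(read_end, False)        # switch the read end to nonblocking
    try:
        before = nonblocking_read(read_end)  # pipe is empty: returns at once
        os.write(write_end, b"hello")
        after = nonblocking_read(read_end)   # returns the bytes available
        return before, after
    finally:
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(demo())
```

A nonblocking call returns immediately with whatever data is there, so the caller has to poll or use a mechanism such as select. With asynchronous I/O the request runs in the background and the process is notified on completion.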
  • 106. Kernel I/O Subsystem Caching - fast memory holding copy of data. Always just a copy. Key to performance. Spooling - hold output for a device if the device can serve only one request at a time, e.g., printing. Device reservation - provides exclusive access to a device. System calls for allocation and deallocation. Watch out for deadlock. Error Handling OS can recover from disk read, device unavailable, transient write failures. Most return an error number or code when I/O request fails. System error logs hold problem reports. I/O Protection User process may accidentally or purposefully attempt to disrupt normal operation via illegal I/O instructions. All I/O instructions defined to be privileged. I/O must be performed via system calls. Memory-mapped and I/O port memory locations must be protected too. Use of a System Call to Perform I/O Kernel Data Structures
  • 107. Kernel keeps state info for I/O components, including open file tables, network connections, character device state Many, many complex data structures to track buffers, memory allocation, “dirty” blocks Some use object-oriented methods and message passing to implement I/O UNIX I/O Kernel Structure I/O Requests to Hardware Operations Consider reading a file from disk for a process: Determine device holding file Translate name to device representation Physically read data from disk into buffer Make data available to requesting process Return control to process Life Cycle of An I/O Request STREAMS STREAM – a full-duplex communication channel between a user-level process and a device in Unix System V and beyond A STREAM consists of: - STREAM head interfaces with the user process - driver end interfaces with the device - zero or more STREAM modules between them. Each module contains a read queue and a write queue
  • 108. Message passing is used to communicate between queues The STREAMS Structure Performance I/O a major factor in system performance: Demands CPU to execute device driver, kernel I/O code Context switches due to interrupts Data copying Network traffic especially stressful
  • 109. Intercomputer Communications Improving Performance Reduce number of context switches Reduce data copying Reduce interrupts by using large transfers, smart controllers, polling Use DMA Balance CPU, memory, bus, and I/O performance for highest throughput Device-Functionality Progression ***The End***