Cs illuminated ch.11: File Systems & Directories


Published in: Education, Technology

  1. 1. Chapter 11 File Systems and Directories The previous chapter examined some of the roles an operating system plays. In particular, it described the management of processes, the CPU, and main memory. Another key resource that the operating system manages is secondary memory, most importantly magnetic disks. The organization of files and directories on disk plays a pivotal role in everyday computing. Like a card file on a desktop, the file system provides a way to access particular information in a well-organized manner. The directory structure organizes files into categories and subcategories. File systems and directory structures are explored in detail in this chapter.
  2. 2. Goals

After studying this chapter, you should be able to:

  • describe the purpose of files, file systems, and directories.
  • distinguish between text and binary files.
  • identify various file types by their extensions.
  • explain how file types improve file usage.
  • define the basic operations on a file.
  • compare and contrast sequential and direct file access.
  • discuss the issues related to file protection.
  • describe a directory tree.
  • create absolute and relative paths for a directory tree.
  • describe several disk-scheduling algorithms.

11.1 File Systems

File: A named collection of data, used for organizing secondary memory
File system: The operating system’s logical view of the files it manages
Directory: A named group of files

In Chapter 5 we established the differences between main and secondary memory. Recall that main memory is where active programs and data are held while in use. Main memory is volatile, meaning that the information stored on it is lost if electric power is turned off. Secondary memory is nonvolatile: the information stored on it is maintained even when power is not on. Thus we use secondary memory for permanent storage of our information.

The most prevalent secondary storage device is the magnetic disk drive. This includes both hard drives in the computer’s main box and floppy disks that are portable and can be moved easily between computers. The basic concepts underlying both types of disks are the same. Other secondary memory devices, such as tape drives, are used primarily for archival purposes. Though many of the concepts that we explore in this chapter apply to all secondary storage devices, it’s perhaps easiest to think about a standard disk drive.

We store information on a disk in files, a mechanism for organizing data on an electronic medium. A file is a named collection of related data.
From the user’s point of view, a file is the smallest amount of information that can be written to secondary memory. Organizing everything into files presents a uniform view for information storage. A file system is the logical view that an operating system provides so that users can manage information as a collection of files. A file system is often organized by grouping files into directories.
  3. 3. A file is a generic concept. Different types of files are managed in different ways. A file, in general, contains a program (in some form) or data (of one type or another). Some files have a very rigid format; others are more flexible. A file is a sequence of bits, bytes, lines, or records, depending on how you look at it. Like any data in memory, you have to apply an interpretation to the bits stored in a file before they have meaning. The creator of a file decides how the data in a file is organized, and any users of the file must understand that organization.

Text and Binary Files

Broadly, all files can be classified as either a text file or a binary file. In a text file the bytes of data are organized as characters from the ASCII or Unicode character sets. (Character sets are described in Chapter 3.) A binary file requires a specific interpretation of the bits based on the information in the file.

The terms text file and binary file are somewhat misleading. They seem to imply that the information in a text file is not stored as binary data. Ultimately, all information on a computer is stored as binary digits. These terms refer to how those bits are formatted: as chunks of 8 or 16 bits, interpreted as characters, or in some other special format.

Some information lends itself to a character representation, which often makes it easier for a human to read and modify. Though text files contain nothing but characters, those characters can represent a variety of information. For example, an operating system may store much of its data as text files, such as information about user accounts. A program written in a high-level language is stored as a text file, which is sometimes referred to as a source file. Any text editor can be used to create, view, and change the contents of a text file, no matter what specific type of information it contains.
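To make the distinction concrete, the following is a minimal sketch (not taken from the chapter) that stores the same number once as characters and once in a fixed binary layout. The value 1234567 and the 4-byte little-endian layout are assumptions chosen only for illustration.

```python
import struct

value = 1234567  # arbitrary example value

# Text representation: each digit becomes one ASCII character (one byte each).
as_text = str(value).encode("ascii")

# Binary representation: an assumed fixed layout, a 4-byte little-endian unsigned integer.
as_binary = struct.pack("<I", value)

print(as_text, len(as_text))      # readable in any text editor
print(as_binary, len(as_binary))  # meaningless without knowing the "<I" layout

# Getting the value back requires applying the matching interpretation.
from_text = int(as_text.decode("ascii"))
from_binary = struct.unpack("<I", as_binary)[0]
```

Both byte sequences are binary data in the end; only the second needs a program that knows the format in order to recover the number.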
For other information types it is more logical and efficient to represent data by defining a specific binary format and interpretation. Only programs set up to interpret that type of data can be used to view or modify it. For example, there are many types of files that store image information: bitmap, GIF, JPEG, and TIFF, to name a few. As we discussed in Chapter 3, though they each store information about an image, they all store that information in different ways. Their internal formats are very specific. A program must be set up to view or modify a specific type of binary file. That’s why a program that can handle a GIF image may not be able to handle a TIFF image, or vice versa.

Some files you might assume to be text files actually are not. Consider, for instance, a report that you type in a word processor program and save to disk. The document is actually stored as a binary file because, in addition to the characters that are stored in the document, it also contains information about formatting, styles, borders, fonts, colors and “extras” such as graphics or clip art. Some of the data (the characters themselves) are stored as text, but the additional information requires that each word processing program has its own format for the data in its document files.

Text file: A file that contains characters
Binary file: A file that contains data in a specific format, requiring a special interpretation of its bits

  4. 4. File Types

File type: The specific kind of information contained in a file, such as a Java program or a Microsoft Word document
File extension: Part of a file name that indicates the file type

Most files, whether they are in text or binary format, contain a specific type of information. For example, a file may contain a Java program, or a JPEG image, or an MP3 audio clip. Some files contain data created by specific applications, such as a Microsoft Word document or a Visio drawing. The kind of information contained in a file is called the file type. Most operating systems recognize a list of specific file types.

A common mechanism for specifying a file type is to indicate the type as part of the name of the file. File names are often separated, usually by a period, into two parts: the main name and the file extension. The extension indicates the type of the file. For example, the .java extension in the file name MyProg.java indicates that it is a Java source code program file. The .jpg extension in the file name family.jpg indicates that it is a JPEG image file. Some common file extensions are listed in Figure 11.1.

File types allow the operating system to operate on the file in ways that make sense for that file. They also usually make life easier for the user. The operating system keeps a list of recognized file types and associates each type with a particular kind of application program.
In an operating system with a graphical user interface, a particular icon is often associated with a file type as well. When you see a file in a folder, it is shown with the appropriate icon. That makes it easier for the user to identify a file at a glance because now both the name of the file and its icon indicate what type of file it is. When you double-click on the icon to open the file, the operating system starts the program associated with that file type and loads the file.

Extensions        File type
txt               text data file
mp3, au, wav      audio file
gif, tiff, jpg    image file
doc, wp3          word processing document
java, c, cpp      program source files

Figure 11.1 Some common file types and their extensions
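As a rough illustration of how an extension can be mapped to a file type, the sketch below builds a lookup table from the entries in Figure 11.1. The table name FILE_TYPES and the function file_type are hypothetical names introduced here; a real operating system keeps a much larger registry and also records which application to launch.

```python
import os

# Extension-to-type table copied from Figure 11.1 (hypothetical in-memory registry).
FILE_TYPES = {
    "txt": "text data file",
    "mp3": "audio file", "au": "audio file", "wav": "audio file",
    "gif": "image file", "tiff": "image file", "jpg": "image file",
    "doc": "word processing document", "wp3": "word processing document",
    "java": "program source file", "c": "program source file", "cpp": "program source file",
}

def file_type(file_name):
    """Return the type suggested by the extension, or None if it is not recognized."""
    _, extension = os.path.splitext(file_name)  # "MyProg.java" -> ("MyProg", ".java")
    return FILE_TYPES.get(extension.lstrip(".").lower())

print(file_type("MyProg.java"))  # program source file
print(file_type("family.jpg"))   # image file
print(file_type("notes"))        # None
```

Note that the lookup inspects only the name. It says nothing about what the bytes in the file actually contain.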
  5. 5. For example, you might like a particular editor that you use when developing a Java program. You can register the .java file extension with the operating system and associate it with that editor. Then whenever you open a file with a .java extension, the operating system runs the appropriate editor. The details of how you associate an extension with an application program depend on the operating system you are using.

Some file extensions are associated with particular programs by default, which you may change if appropriate. In some cases, a file type could be associated with various types of applications, so you have some choice. For example, your system may currently associate the .gif extension with a particular Web browser, so that when you open a GIF image file, it is displayed in that browser window. You may choose to change the association so that when you open a GIF file it is brought into your favorite image editor instead.

Note that a file extension is merely an indication of what the file contains. You can name a file anything you want (as long as you use the characters that the operating system allows for file names). You could give any file a .gif extension, for instance, but that doesn’t make it a GIF image file. Changing the extension does not change the data in the file or its internal format. If you attempt to open a misnamed file in a program that expects a particular format, you get errors.

File Operations

There are several operations that you, with the help of the operating system, might do to and with a file:

  • Create a file.
  • Delete a file.
  • Open a file.
  • Close a file.
  • Read data from a file.
  • Write data to a file.
  • Reposition the current file pointer in a file.
  • Append data to the end of a file.
  • Truncate a file (delete its contents).
  • Rename a file.
  • Copy a file.

Let’s examine briefly how each of these operations is accomplished.

The operating system keeps track of secondary memory in two ways.
It maintains a table indicating which blocks of memory are free (that is, available for use), and for each directory, it maintains a table that records information about the files in that directory.

  6. 6. To create a file, the operating system finds free space in the file system for the file content, puts an entry for the file in the appropriate directory table, and records the name and location of the file. To delete a file, the operating system indicates that the memory space the file was using is now free, and the appropriate entry in the directory table is removed.

Most operating systems require that a file be opened before read and write operations are performed on it. The operating system maintains a small table of all currently open files to avoid having to search for the file in the large file system every time a subsequent operation is performed. To close the file when it is no longer in active use, the operating system removes the entry in the open file table.

At any point in time, an open file has a current file pointer (an address) indicating the place where the next read or write operation should occur. Some systems keep a separate read pointer and a write pointer for a file. Reading a file means that the operating system delivers a copy of the information in the file, starting at the current file pointer. After the read occurs, the file pointer is updated. Writing information to a file records the specified information to the file space at the location indicated by the current file pointer, and then the file pointer is updated. Often an operating system allows a file to be open for reading or writing, but not both at the same time.

The current file pointer for an open file might be repositioned to another location in the file to prepare for the next read or write operation. Appending information to the end of a file requires that the file pointer be positioned to the end of a file; then the appropriate data is written.

It is sometimes useful to “erase” the information in a file.
Truncating a file means deleting the contents of the file without removing the administrative entries in the file tables. This operation is provided to avoid the need to delete a file and then recreate it. Sometimes the truncating operation is sophisticated enough to erase part of a file, from the current file pointer to the end of the file.

An operating system also provides an operation to change the name of a file, which is called renaming the file. It also provides the ability to create a complete copy of the contents of a file, giving the copy a new name.

File Access

There are various ways in which the information in a file can be accessed. Some operating systems provide only one type of file access, while others provide a choice. The type of access available for a given file is established when the file is created.
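The file operations listed earlier in this section can be tried from a program. The sketch below is one possible walk-through using Python's standard library, which passes each request on to the operating system. The file names (notes.txt, copy.txt, renamed.txt) are made up for the example, and everything happens inside a temporary directory that is removed at the end.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")  # hypothetical file name

    # Create and open the file, write to it, then close it.
    with open(path, "w") as f:
        f.write("first line\n")

    # Append: the file pointer is positioned at the end before writing.
    with open(path, "a") as f:
        f.write("second line\n")

    # Read from the current file pointer, reposition it, and read again.
    with open(path, "r") as f:
        first = f.readline()   # pointer now sits after the first line
        f.seek(0)              # reposition to the beginning
        everything = f.read()

    # Copy the file, then rename the copy.
    copy_path = os.path.join(folder, "copy.txt")
    renamed_path = os.path.join(folder, "renamed.txt")
    shutil.copy(path, copy_path)
    os.rename(copy_path, renamed_path)
    with open(renamed_path, "r") as f:
        copied = f.read()

    # Truncate: the contents are erased, but the file itself remains.
    with open(path, "r+") as f:
        f.truncate(0)
    size_after_truncate = os.path.getsize(path)

    # Delete the file.
    os.remove(path)
    still_exists = os.path.exists(path)
```

The with statement closes each file when its block ends, which corresponds to removing the entry from the open file table.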
  7. 7. Let’s examine the two primary access techniques: sequential access and direct access. The differences between these two techniques are analogous to the differences between the sequential nature of magnetic tape and the direct access of a magnetic disk, as discussed in Chapter 5. However, both types of files can be stored on either type of medium. File access techniques define the ways that the current file pointer can be repositioned. They are independent of the physical restrictions of the devices on which the file is stored.

The most common access technique, and the simplest to implement, is sequential access, which views the file as a linear structure. It requires that the information in the file be processed in order. Read and write operations move the current file pointer according to the amount of data that is read or written. Some systems allow the file pointer to be reset to the beginning of the file and/or to skip forwards or backwards a certain number of records. See Figure 11.2.

[Figure 11.2 Sequential file access: the current file pointer moves from the beginning of the file toward the end with each read or write, and can be rewound]

Files with direct access are conceptually divided into numbered logical records. Direct access allows the user to set the file pointer to any particular record by specifying the record number. Therefore, the user can read and write records in any particular order desired, as shown in Figure 11.3.

[Figure 11.3 Direct file access: the current file pointer can jump to any numbered logical record (0, 1, 2, ...) before a read or write]

Sequential file access: The technique in which data in a file is accessed in a linear fashion
Direct file access: The technique in which data in a file is accessed directly, by specifying logical record numbers
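One common way to support direct access is to give every logical record the same length, so that the position of record n is simply n times the record size. The sketch below assumes this fixed-length scheme; the 16-byte record size and the helper names are illustrative choices, and an in-memory byte stream stands in for a file on disk.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes

def write_record(f, number, text):
    """Direct access write: jump to the record's position, then write it padded to full size."""
    data = text.encode("ascii")
    if len(data) > RECORD_SIZE:
        raise ValueError("record too long")
    f.seek(number * RECORD_SIZE)
    f.write(data.ljust(RECORD_SIZE, b" "))

def read_record(f, number):
    """Direct access read: jump straight to the requested record number."""
    f.seek(number * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()

f = io.BytesIO()  # stand-in for an open file
for n in range(5):
    write_record(f, n, f"record {n}")

# Direct access: records can be read in any order.
third = read_record(f, 3)
first = read_record(f, 0)

# Sequential access: rewind, then read records one after another.
f.seek(0)
in_order = []
while True:
    chunk = f.read(RECORD_SIZE)
    if not chunk:
        break
    in_order.append(chunk.decode("ascii").rstrip())
```

In both cases the only thing that changes is how the current file pointer is moved, which matches the point that access techniques are independent of the storage device.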
  8. 8. Direct access files are more complicated to implement, but are helpful in situations where specific portions of large data stores must be available quickly, such as in a database.

File Protection

In multiuser systems, file protection is of primary importance. That is, we don’t want one user to be able to access another user’s files unless the access is specifically allowed. It is the operating system’s responsibility to ensure valid file access. Different operating systems administer their file protection in different ways. In any case, a file protection mechanism determines who can use a file and for what general purpose.

For example, a file’s protection settings in the Unix operating system are divided into three categories: Owner, Group, and World. Under each category you can determine if the file can be read, written, and/or executed. Under this mechanism, if you can write to a file, you can also delete the file.

Each file is “owned” by a particular user, often the creator of the file. The Owner usually has the strongest permissions regarding the file. A file may have a group name associated with it. A group is simply a list of users. The Group permissions apply to all users in the associated group. You might set up such a group, for instance, for all users who are working on a particular project. Finally, World permissions apply to anyone who has access to the system. Because these permissions give access to the largest number of users, they are usually the most restricted.

Using this technique, the permissions on a file can be shown in a 3 × 3 grid:

         Read   Write/Delete   Execute
Owner    Yes    Yes            No
Group    Yes    No             No
World    No     No             No

Suppose that this grid represents the permissions on a data file used in project Alpha. The owner of the file (perhaps the manager of the project) may read from or write to the file.
Suppose also that the owner sets up a group (using the operating system) called TeamAlpha, which contains all members of the project team, and associates that group with this data file. The members of the group may read the data in the file, but may not change it. No one else is given any permission to access the file. Note that no user is given execution privileges to the file because it is a data file, not an executable program.
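On Unix-like systems, a grid like this one is conventionally stored as nine permission bits, three per category. The sketch below encodes the project Alpha grid using the constants in Python's stat module; the helper name may is hypothetical, and the code only builds and inspects the bit pattern without changing any real file.

```python
import stat

# Owner: read and write. Group: read only. World: nothing.
mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP

print(oct(mode))                           # 0o640
print(stat.filemode(stat.S_IFREG | mode))  # -rw-r-----

# One permission bit per cell of the 3 x 3 grid.
BITS = {
    ("owner", "read"): stat.S_IRUSR, ("owner", "write"): stat.S_IWUSR, ("owner", "execute"): stat.S_IXUSR,
    ("group", "read"): stat.S_IRGRP, ("group", "write"): stat.S_IWGRP, ("group", "execute"): stat.S_IXGRP,
    ("world", "read"): stat.S_IROTH, ("world", "write"): stat.S_IWOTH, ("world", "execute"): stat.S_IXOTH,
}

def may(mode, category, action):
    """Return True if the given category is granted the given action."""
    return bool(mode & BITS[(category, action)])

print(may(mode, "group", "read"))   # True
print(may(mode, "group", "write"))  # False
```

The string -rw-r----- is the form in which a long directory listing typically displays these settings for a regular file.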
  9. 9. Other operating systems break down their protection schemes in different ways, but the goal is the same: to control access to protect against deliberate attempts to gain inappropriate access, as well as minimize inadvertent problems caused by well-intentioned but hazardous users.

11.2 Directories

We established early in this chapter that a directory is a named collection of files. It is a way to group files so that you can organize them in a logical manner. For example, you may group all of your papers and notes for a particular class into a directory created for that class. The operating system must carefully keep track of directories and the files they contain.

A directory, in most operating systems, is represented as a file. The directory file contains data about the other files in the directory. For any given file, the directory contains the file name, the file type, the address on disk where the file is stored, and the current size of the file. The directory also contains information about the protections set up for the file. It may also contain information about when the file was created and when it was last modified.

The internal structure of a directory file could be set up in a variety of ways, and we won’t explore those details here. However, once it is set up, it must be able to support the common operations that are performed on directory files. For instance, the user must be able to list all of the files in the directory. Other common operations are creating, deleting, and renaming files within a directory. Furthermore, the directory is commonly searched to see if a particular file is in the directory.

Another key issue when it comes to directory management is the need to reflect the relationships among directories, as discussed in the next section.

Directory Trees

A directory of files can be contained within another directory.
The directory containing another is usually called the parent directory, and the one inside is called a subdirectory. You can set up such nested directories as often as needed to help organize the file system. One directory can contain many subdirectories. Furthermore, subdirectories can contain their own subdirectories, creating a hierarchical structure. Therefore, a file system is often viewed as a directory tree, showing directories and files within other directories. The directory at the highest level is called the root directory.

Directory tree: A structure showing the nested directory organization of the file system
Root directory: The topmost directory, in which all others are contained

For example, consider the directory tree shown in Figure 11.4. This tree represents a very small part of a file system that might be found on a computer using some flavor of the Microsoft Windows operating system.

  10. 10. [Figure 11.4 A Windows directory tree]

The root of the directory system is referred to using the drive letter C: followed by the backslash (\). In this directory tree, the root directory contains three subdirectories: WINDOWS, My Documents, and Program Files. Within the WINDOWS directory, there is a file called calc.exe as well as two other subdirectories (Drivers and System). Those directories contain other files and subdirectories. Keep in mind that all of these directories in a real system would typically contain many more subdirectories and files.
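The containment relationship in a directory tree maps naturally onto a nested data structure. The sketch below models only the part of Figure 11.4 described in the text, using dictionaries for directories and None for files. This representation and the function name listing are illustrative assumptions, not how an operating system actually stores directories.

```python
# Directories are dictionaries; files are marked with None.
# Only the entries named in the text are included; contents of the
# lower-level directories are left empty here.
root = {
    "WINDOWS": {
        "calc.exe": None,
        "Drivers": {},
        "System": {},
    },
    "My Documents": {},
    "Program Files": {},
}

def listing(directory, depth=0):
    """Return an indented listing, one line per entry, descending into subdirectories."""
    lines = []
    for name, contents in directory.items():
        is_directory = isinstance(contents, dict)
        lines.append("  " * depth + name + ("\\" if is_directory else ""))
        if is_directory:
            lines.extend(listing(contents, depth + 1))
    return lines

for line in listing(root):
    print(line)
```

The recursion mirrors the tree: each subdirectory is handled exactly like its parent, one level deeper.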
  11. 11. Personal computers often use an analogy of folders to represent the directory structure, which promotes the idea of containment (folders inside other folders, with some folders ultimately containing documents or other data). The icon used to show a directory in the graphical interface of an operating system is often a graphic of a manila file folder such as the kind you would use in a physical file drawer.

Note that there are two files with the name util.zip in Figure 11.4 (in the My Documents directory, and in its subdirectory called downloads). The nested directory structure allows for multiple files to have the same name. All the files in any one directory must have unique names, but files in different directories or subdirectories can have the same name. These files may or may not contain the same data; all we know is that they have the same name.

At any point in time, you can be thought of as working in a particular location (that is, a particular subdirectory) of the file system. This subdirectory is referred to as the current working directory. As you “move” around in the file system, the current working directory changes.

The directory tree shown in Figure 11.5 is representative of one from a Unix file system. Compare and contrast it to the one in Figure 11.4. They both show the same concepts of subdirectory containment. However, the naming conventions for files and directories are different. Unix was developed as a programming and system level environment, and therefore uses much more abbreviated and cryptic names for directories and files. Also, note that in a Unix environment, the root is designated using a forward slash (/).

Path Names

How do we specify one particular file or subdirectory? Well, there are several ways to do it. If you are working in a graphical interface to the operating system, you can double-click with your mouse to open a directory and see its contents.
The active directory window shows the contents of the current working directory. You can continue “moving” through the file system using mouse clicks, changing the current working directory, until you find the desired file or directory. To move up the directory structure, there is usually an icon on the window bar or a pop-up menu option that you can use to move to the parent directory.

Operating systems usually also provide a nongraphical (text-based) interface to the operating system. Therefore, we also have to be able to specify file locations using text. This is very important for system instructions stored in operating system batch command files. Commands such as cd (which stands for change directory) can be used in text mode to change the current working directory.

Working directory: The currently active subdirectory
  12. 12. [Figure 11.5 A Unix directory tree]

Path: A text designation of the location of a file or subdirectory in a file system
Absolute path: A path that begins at the root and includes all successive subdirectories
Relative path: A path that begins at the current working directory

To indicate a particular file using text, we specify that file’s path, which is the series of directories through which you must go to find the file. A path may be absolute or relative. An absolute path name begins at the root and specifies each step down the tree until it reaches the desired file or directory. A relative path name begins from the current working directory.

Let’s look at examples of each type of path. The following are absolute path names based on the directory tree shown in Figure 11.4:

C:\Program Files\MS Office\WinWord.exe
C:\My Documents\letters\applications\vaTech.doc
C:\Windows\System\QuickTime

They each begin at the root and proceed down the directory structure. Each subdirectory is separated by the backslash (\). Note that a path can specify a specific document (as it does in the first two examples) or an entire subdirectory (as it does in the third example).

  13. 13. Absolute paths in a Unix system work the same way, except that the character used to separate subdirectories is the forward slash (/). Here are some examples of absolute path names that correspond to the directory tree in Figure 11.5:

/bin/tar
/etc/sysconfig/clock
/usr/local/games/fortune
/home/smith/reports/week1.txt

Relative paths are based on the current working directory. That is, they are relative to your current position (hence the name). Suppose the current working directory is C:\My Documents\letters (from Figure 11.4). Then the following relative path names could be used:

cancelMag.doc
applications\calState.doc

The first example just specifies the name of the file, which can be found in the current working directory. The second example specifies a file in the applications subdirectory. By definition, the first part of any valid relative path is located in the working directory.

Sometimes when using a relative path we need to work our way back up the tree. Note that this was not an issue when using absolute paths. In most operating systems, two dots (..) are used to specify the parent directory (a single dot is used to specify the current working directory). Therefore, if the working directory is C:\My Documents\letters, the following are also valid relative paths:

..\landscape.jpg
..\csc101\proj2.java
..\..\WINDOWS\Drivers\E55IC.ICM
..\..\Program Files\WinZip

Unix systems work essentially the same way. Using the directory tree in Figure 11.5, and assuming that the current working directory is /home/jones, the following are valid relative paths:

utilities/combine
../smith/reports
../../dev/ttyE71
../../usr/man/man1/ls.1.gz
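A relative path can be turned into the absolute path it refers to by joining it onto the current working directory and then collapsing each ".." step. The sketch below does this with Python's posixpath and ntpath modules, which apply the Unix and Windows naming rules as pure string operations regardless of the machine running the code. The function name resolve is hypothetical, and the paths are drawn from the directory trees in Figures 11.4 and 11.5.

```python
import ntpath
import posixpath

def resolve(working_directory, relative_path, rules=posixpath):
    """Join a relative path onto the working directory and collapse '.' and '..' steps."""
    return rules.normpath(rules.join(working_directory, relative_path))

# Unix-style paths, with /home/jones as the current working directory.
print(resolve("/home/jones", "utilities/combine"))  # /home/jones/utilities/combine
print(resolve("/home/jones", "../smith/reports"))   # /home/smith/reports
print(resolve("/home/jones", "../../dev/ttyE71"))   # /dev/ttyE71

# Windows-style paths, with C:\My Documents\letters as the current working directory.
print(resolve(r"C:\My Documents\letters", r"applications\calState.doc", ntpath))
print(resolve(r"C:\My Documents\letters", r"..\landscape.jpg", ntpath))
```

These functions only manipulate text. They do not check whether the resulting file or directory actually exists.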
  14. 14. John Backus

John Backus was an aimless young man who pulled his act together and won the Turing Award. Born in 1924 into a wealthy Philadelphia family, he attended the prestigious Hill School in Pottstown, Pennsylvania, where he repeatedly flunked out and had to attend summer school in order to continue. Finally graduating in 1942, he enrolled in and flunked out of the University of Virginia. In 1943 Backus joined the Army. After his first aptitude test the Army enrolled him in a pre-engineering program at the University of Pittsburgh. Another aptitude test sent him to Haverford College to study medicine. As part of the premed program, he worked in a neurosurgery ward at an Atlantic City hospital. While there he was diagnosed with a brain tumor and a plate was installed in his head. After nine months of medical school, he decided that medicine wasn’t for him, after all.

He was at loose ends in 1946, after leaving the Army and having an additional operation to replace the plate in his head. When he couldn’t find the hi-fi set he wanted, he enrolled in a radio technicians’ school, where he said that he found his first good teacher. His work with this teacher uncovered his latent interest in mathematics. In 1949, he graduated from Columbia University with a degree in mathematics.

As the result of a casual remark to a guide while touring the IBM Computer Center on Madison Avenue, Backus got a job working with IBM’s Selective Sequence Electronic Calculator. In 1953 he wrote a memo outlining the design of a new programming language for IBM’s new 704 computer. The 704 had floating point hardware and an index register, which made it faster, but the software available didn’t make use of these new features. He wanted to design not only a better language, but one that would be easier for programmers to use. His proposal was accepted and a team was put together. The language design was the easy part.
The hard part was the compiler that translated the statements into binary. The project that was estimated to take six months took two years. They called the language FORTRAN for formula translating system. The completed compiler consisted of 25,000 lines of machine code. FORTRAN has gone through many transformations during the years but is still the most popular language for scientists and engineers today. Backus went on to develop a notation called the Backus-Naur Form, which is used to describe formally grammatical rules for high-level languages. Undoubtedly his interest in this subject was born when trying to describe the rules of FORTRAN in English. His notation was originally called Backus Normal Form and introduced during the specification of ALGOL 60. Peter Naur, a Danish scientist on the ALGOL 60 committee, made some modifications to the notation, and so it became known as Backus-Naur Form. In the 1970s Backus worked on finding better programming methods. Toward this end, he developed the functional language FP. He is unique in that he developed languages in two paradigms before the word paradigm was even used in relation to programming languages. FORTRAN is an imperative language; FP is a functional language. The citation for John Backus’ Turing Award reads: For profound, influential, and lasting contributions to the design of practical high-level programming systems, notably through his work on FORTRAN, and for seminal publication of formal procedures for the specification of programming languages.
  15. 15. Most operating systems allow the user to specify a set of paths that are searched (in a specific order) to help resolve references to executable programs. Often that set of paths is specified using an operating system variable called PATH, which holds a string that contains several absolute paths. Suppose, for instance, that user jones (from Figure 11.5) has a set of utility programs that he uses from time to time. They are stored in the directory /home/jones/utilities. When that path is added to the PATH variable, it becomes a standard location used to find programs that jones attempts to execute. Therefore, no matter what the current working directory is, when jones executes the printall program (just the name by itself), it is found in his utilities directory.

11.3 Disk Scheduling

The most important hardware device used as secondary memory is the magnetic disk drive. File systems stored on these drives must be accessed in an efficient manner. It turns out that transferring data to and from secondary memory is the worst bottleneck in a general computer system.

Recall from Chapter 10 the discussion that the speed of the CPU and the speed of main memory are much faster than the speed of data transfer to and from secondary memory such as a magnetic disk. That’s why a process that must perform I/O to disk is made to wait while that information is transferred, to give another process a chance to use the CPU.

Because secondary I/O is the slowest aspect of a general computer system, the techniques for accessing information on a disk drive are of crucial importance to our discussion of file systems. As a computer deals with multiple processes over a period of time, a list of requests to access the disk builds up. The technique that the operating system uses to determine which requests to satisfy first is called disk scheduling. We examine several specific disk-scheduling algorithms in this section.
Recall from Chapter 5 that a magnetic disk drive is organized as a stack of platters, where each platter is divided into tracks, and each track into sectors. The set of corresponding tracks on all platters is called a cylinder. Figure 11.6 reprints the figure of a disk drive used in Chapter 5 to remind you of this organization. Of primary importance to us in this discussion is the fact that the set of read/write heads hovers over a particular cylinder along all platters at any given point in time. Remember, the seek time is the time it takes for the heads to reach the appropriate cylinder. The latency is the additional time it takes the platter to rotate into the proper position so that the information can be read or written. Seek time is the more restrictive of these two, and therefore is the primary issue dealt with by the disk-scheduling algorithms.

Disk scheduling: The act of deciding which outstanding requests for disk I/O to satisfy first
  16. 16. [Figure 11.6: A magnetic disk drive. (a) A hard disk drive; (b) a single disk. Labels: read/write head, arm, spindle, block, cylinder, track, sector.]

At any point in time, a disk drive may have a set of outstanding requests that must be satisfied. For now, we consider only the cylinder (the parallel concentric circles) to which the requests refer. A disk may have thousands of cylinders. To keep things simple, let's assume a range of only 100 cylinders. Suppose at a particular time the following cylinder requests have been made, in this order:

49, 91, 22, 61, 7, 62, 33, 35

Suppose also that the read/write heads are currently at cylinder 26. The question is now: To which cylinder should the disk heads move next? Different algorithms produce different answers to that question.

First-Come, First-Served Disk Scheduling

In Chapter 10 we examined a CPU scheduling algorithm called first-come, first-served (FCFS). An analogous algorithm can be used for disk scheduling. It is one of the easiest to implement, though not usually the most efficient.
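As a rough illustration of FCFS applied to the request list above, the following Python sketch serves the requests in arrival order and adds up how many cylinders the heads cross. The names `fcfs_order` and `total_head_movement` are hypothetical helpers introduced only for this example, and counting cylinders crossed is an assumed stand-in for seek time.

```python
def fcfs_order(requests):
    """FCFS ignores the head position: requests are served in arrival order."""
    return list(requests)


def total_head_movement(start, order):
    """Add up the cylinders crossed when visiting `order` starting at `start`."""
    total = 0
    position = start
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return total


requests = [49, 91, 22, 61, 7, 62, 33, 35]  # arrival order from the text
order = fcfs_order(requests)
print(order)                           # [49, 91, 22, 61, 7, 62, 33, 35]
print(total_head_movement(26, order))  # 313
```

Under these assumptions the heads cross 313 cylinders in total, largely because of long back-and-forth moves such as the one from cylinder 91 to cylinder 22.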
  17. 17. In FCFS, we process the requests in the order they arrive, without regard to the current position of the heads. Therefore, under an FCFS algorithm, the heads move from cylinder 26 (their current position) to cylinder 49. After the request for cylinder 49 is satisfied (that is, the information is read or written), the heads move from 49 to 91. After processing the request at 91, the heads move to cylinder 22. Processing continues like this, in the order that the requests were received. Note that at one point the heads move from cylinder 91 all the way back to cylinder 22, during which they pass over several cylinders whose requests are currently pending.

Shortest-Seek-Time-First Disk Scheduling

The shortest-seek-time-first (SSTF) disk-scheduling algorithm moves the heads the minimum amount it can to satisfy any pending request. This approach could potentially result in the heads changing directions after each request is satisfied.

Let's process our hypothetical situation using this algorithm. From our starting point at cylinder 26, the closest cylinder among all pending requests is 22. So, ignoring the order in which the requests came, the heads are moved to cylinder 22 to satisfy that request. From 22, the closest request is for cylinder 33, so the heads move there. The closest unsatisfied request to 33 is at cylinder 35, so the heads move there. From 35, the distance to cylinder 49 is the smallest, so the heads move there next. Continuing that approach, the remaining cylinders are visited in the following order: 61, 62, 91, and finally 7.

This approach does not guarantee the smallest overall head movement, but it is generally an improvement over the FCFS algorithm. However, a major problem can arise with this approach. Suppose requests for cylinders continue to build up while existing ones are being satisfied. And suppose those new requests are always closer to the current position than an earlier request.
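The SSTF walk-through above can be reproduced with a short greedy sketch in Python. This is only an illustration under stated assumptions: `sstf_order` is a hypothetical name, the request list is fixed (no new requests arrive while it runs, so starvation cannot show up here), and ties between equally distant cylinders are broken in favor of the request that arrived first.

```python
def sstf_order(start, requests):
    """Repeatedly serve the pending request closest to the current head position."""
    pending = list(requests)
    position = start
    order = []
    while pending:
        # min() returns the first minimal element, so ties go to the earlier arrival.
        nearest = min(pending, key=lambda cylinder: abs(cylinder - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


requests = [49, 91, 22, 61, 7, 62, 33, 35]  # arrival order from the text
order = sstf_order(26, requests)
print(order)  # [22, 33, 35, 49, 61, 62, 91, 7]

# Cylinders crossed, used here as an assumed stand-in for seek time.
movement = sum(abs(b - a) for a, b in zip([26] + order, order))
print(movement)  # 157
```

On this particular list the heads cross 157 cylinders, compared with 313 when the same requests are served in arrival order, although most of the SSTF total comes from the final long move from cylinder 91 to cylinder 7.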
It is theoretically possible that the early request never gets processed because requests keep arriving that take priority. This phenomenon is called starvation. First-come, first-served disk scheduling cannot suffer from starvation.

SCAN Disk Scheduling

A classic example of algorithm analysis in computing comes from the way an elevator is designed to visit floors that have people waiting. In general, an elevator moves from one extreme to the other (say, the top of the building to the bottom), servicing requests as appropriate. Then it travels from the bottom to the top, servicing those requests.
  18. 18. The SCAN disk-scheduling algorithm works in a similar way, except instead of moving up and down, the read/write heads move in toward the spindle, then out toward the platter edge, then back toward the spindle, and so forth.

Let's perform this algorithm on our set of requests. Unlike the other approaches, though, we need to decide which way the heads are moving initially. Let's assume they are moving toward the lower cylinder values (and are currently at cylinder 26). As the read/write heads move from cylinder 26 toward cylinder 1, they satisfy the requests at cylinders 22 and 7 (in that order). After reaching cylinder 1, the heads reverse direction and move all the way out to the other extreme. Along the way, they satisfy the following requests, in order: 33, 35, 49, 61, 62, and 91.

Note that new requests are not given any special treatment. They may or may not be serviced before earlier requests. It depends on the current location of the heads and direction in which they are moving. If the new request arrives just before the heads reach that cylinder, it is processed right away. If it arrives just after the heads move past that cylinder, it must wait for the heads to return. There is no chance for starvation because each cylinder is processed in turn.

Some variations on this algorithm can improve performance in various ways. Note that a request at the edge of the platter may have to wait for the heads to move almost all the way to the spindle and all the way back. To improve the average wait time, the Circular SCAN algorithm treats the disk as if it were a ring and not a disk. That is, when it reaches one extreme, the heads return all the way to the other extreme without processing requests.

Another variation is to minimize the extreme movements at the spindle and at the edge of the platter. Instead of going to the edge, the heads only move as far out (or in) as the outermost (or innermost) request.
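The SCAN order worked out above, and the Circular SCAN variation, can be sketched in Python as follows. The function names and the `toward_lower` parameter are hypothetical, introduced only for this illustration. The sketch assumes a fixed list of requests and returns only the order of service; it does not model the heads travelling to the extreme cylinders. For Circular SCAN it assumes that the heads keep the same direction of travel after jumping to the other extreme.

```python
def split_requests(start, requests):
    """Group requests into those at, below (nearest first), and above the head."""
    here = [c for c in requests if c == start]
    lower = sorted((c for c in requests if c < start), reverse=True)
    upper = sorted(c for c in requests if c > start)
    return here, lower, upper


def scan_order(start, requests, toward_lower=True):
    """SCAN: sweep in one direction, then reverse and sweep back."""
    here, lower, upper = split_requests(start, requests)
    if toward_lower:
        return here + lower + upper
    return here + upper + lower


def cscan_order(start, requests, toward_lower=True):
    """Circular SCAN: sweep one way, jump to the far extreme, sweep the same way."""
    here, lower, upper = split_requests(start, requests)
    if toward_lower:
        return here + lower + sorted(upper, reverse=True)
    return here + upper + sorted(lower)


requests = [49, 91, 22, 61, 7, 62, 33, 35]  # arrival order from the text
print(scan_order(26, requests))   # [22, 7, 33, 35, 49, 61, 62, 91]
print(cscan_order(26, requests))  # [22, 7, 91, 62, 61, 49, 35, 33]
```

Sorting the pending requests on each side of the head is one simple way to mimic a sweep; a real scheduler would also have to handle requests that arrive while the sweep is in progress.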
Before moving on to the next request, the list of pending requests is examined to see whether movement in the current direction is warranted. This variation is referred to as the LOOK disk-scheduling algorithm, because it looks ahead to see whether the heads should continue in the current direction.

Summary

A file system defines the way our secondary memory is organized. A file is a named collection of data with a particular internal structure. Text files are organized as a stream of characters, and binary files have a particular format that is meaningful only to applications set up to handle that format.

File types are often indicated by the file extension of the file name. The operating system has a list of recognized file types so that it may open them in the correct kind of application and display the appropriate icons in
  19. 19. the graphical interface. The file extension can be associated with any particular kind of application that the user chooses.

The operations performed on files include creating, deleting, opening, and closing files. Files must also be able to be read from and written to. The operating system provides mechanisms to accomplish the various file operations. In a multi-user system, the operating system must also provide file protection to ensure the proper access.

Directories are used to organize files on disk. They can be nested to form hierarchical tree structures. Path names that specify the location of a particular file or directory can be absolute, originating at the root of the directory tree, or relative, originating at the current working directory.

Disk-scheduling algorithms determine the order in which pending disk requests are processed. First-come, first-served disk scheduling takes all requests in order, but is not very efficient. Shortest-seek-time-first disk scheduling is more efficient, but could suffer from starvation. SCAN disk scheduling employs the same strategy as an elevator, sweeping from one end of the disk to the other.

Ethical Issues: Computer Viruses and Denial of Service

Receiving a love letter in the spring of 2000 left most romantics with intact hearts but damaged computers. The "Love Bug" computer virus, one of the worst infections to date, caused an estimated 10 billion dollars' worth of damage as it tore through computer systems in 20 countries. The seemingly innocent e-mail entitled "ILOVEYOU" with an attachment called "LOVELETTER" landed in the mailboxes of many unsuspecting users who opened the attachment and thereby released the virus into their computer system.
Fittingly, this type of computer virus is termed a “Trojan Horse.” As in the legend of Troy, when Odysseus secretly led Greek soldiers into Troy by hiding them in a wooden horse that the Trojans believed was a gift, these computer viruses appear harmless but wreak havoc. When executed, a virus sweeps through files, modifying or erasing them; it usually also sends itself to the e-mail addresses it accesses. Disguised as desirable downloads like games or screensavers, Trojan Horse viruses can spread rapidly, replicate, and cause significant damage across the globe. Since 1981 when the first wave of computer viruses entered the public sphere and attacked Apple II operating systems, computer viruses have threatened the integrity of computer systems.
  20. 20. Denial of Service (DoS) attacks are not viruses but are a method hackers use to deprive a user or organization of services. DoS attacks usually just flood the server's resources, making the system unusable. Society views viruses and DoS attacks as serious offenses, and people who launch DoS attacks face federal criminal charges. In the 2000 attack on Yahoo, for example, the server was flooded with requests that lacked verifiable return addresses. When the server could not confirm the fake addresses it waited for a few moments; then when it finally denied the request, it was loaded with more requests that had fake return addresses, which tied up the server indefinitely. A DoS attack uses the inherent limitations of networking to its advantage, and, in this case, it successfully brought the site down.

The reality of these attacks highlights the need to reevaluate security for both personal computers and the Internet. Scanning for viruses, taking proper precautions when downloading material, and investigating attachments before opening them are useful ways to protect your computer. Internet Service Providers (ISPs) are often proactive in their attempt to prevent viruses and DoS attacks and install firewalls that foster security. Although no system is impenetrable, steps can be taken to improve the security of computer systems and networks.

Key Terms

Absolute path
Binary file
Direct file access
Directory
Directory tree
Disk scheduling
File
File extension
File system
File type
Path
Relative path
Root directory
Sequential file access
Text file
Working directory

Exercises

1. What is a file?
2. Distinguish between a file and a directory.
3. Distinguish between a file and a file system.
4. Why is a file a generic concept and not a technical one?
5. Name and describe the two basic classifications of files.
  21. 21. 6. Why is the term binary file a misnomer?
7. Distinguish between a file type and a file extension.
8. What would happen if you give the name "myFile.jpg" to a text file?
9. How can an operating system make use of the file types that it recognizes?
10. How does an operating system keep track of secondary memory?
11. What does it mean to open and close a file?
12. What does it mean to truncate a file?
13. Compare and contrast sequential and direct file access.
14. File access is independent of any physical medium.
    a. How could you implement sequential access on a disk?
    b. How could you implement direct access on a magnetic tape?
15. What is a file protection mechanism?
16. How does Unix implement file protection?
17. Given the following file permission, answer these questions.

            Read    Write/Delete    Execute
    Owner   Yes     Yes             Yes
    Group   Yes     Yes             No
    World   Yes     No              No

    a. Who can read the file?
    b. Who can write or delete the file?
    c. Who can execute the file?
    d. What do you know about the content of the file?
18. What is the minimum amount of information a directory must contain about each file?
19. How do most operating systems represent a directory?
20. Answer the following questions about directories.
    a. A directory that contains another directory is called what?
    b. A directory contained within another directory is called what?
    c. The directory that is not contained in any other directory is called what?
    d. The structure showing the nested directory organization is called what?
    e. Relate the structure in (d) to the binary tree data structure examined in Chapter 9.
  22. 22. 21. What is the directory called in which you are working at any one moment?
22. What is a path?
23. Distinguish between an absolute path and a relative path.
24. Show the absolute path to each of the following files or directories using the directory tree shown in Figure 11.4:
    a. QTEffects.qtx
    b. brooks.mp3
    c. Program Files
    d. 3dMaze.scr
    e. PowerPnt.exe
25. Show the absolute path to each of the following files or directories using the directory tree shown in Figure 11.5.
    a. tar
    b. access.old
    c. named.conf
    d. smith
    e. week3.txt
    f. printall
26. Assuming the current working directory is C:\WINDOWS\System, give the relative path name to the following files or directories using the directory tree shown in Figure 11.4.
    a. QTImage.qtx
    b. calc.exe
    c. letters
    d. proj3.java
    e. adobep4.hlp
    f. WinWord.exe
27. Show the relative path to each of the following files or directories using the directory tree shown in Figure 11.5.
    a. localtime when the working directory is the root directory
    b. localtime when the working directory is etc
    c. printall when the working directory is utilities
    d. week1.txt when the working directory is man2
28. What is the worst bottleneck in a computer system?
29. Why is disk scheduling concerned more with cylinders than with tracks and sectors?
30. Name and describe three disk-scheduling algorithms.
  23. 23. Use the following list of cylinder requests in Exercises 31 through 33. They are listed in the order in which they were received.

40, 12, 22, 66, 67, 33, 80

31. List the order in which these requests are handled if the FCFS algorithm is used. Assume that the disk is positioned at cylinder 50.
32. List the order in which these requests are handled if the SSTF algorithm is used. Assume that the disk is positioned at cylinder 50.
33. List the order in which these requests are handled if the SCAN algorithm is used. Assume that the disk is positioned at cylinder 50 and the read/write heads are moving toward the higher cylinder numbers.
34. Explain the concept of starvation.

Thought Questions

1. The concept of a file permeates computing. Would the computer be useful if there were no secondary memory on which to store files?
2. The disk-scheduling algorithms examined in this chapter sound familiar. In what other context have we discussed similar algorithms? How are these similar and how are they different?
3. Are there any analogies between files and directories and file folders and filing cabinets? Clearly the name "file" came from this concept. Where does this analogy hold true and where does it not?
4. Both viruses and denial-of-service attacks can cause great inconvenience at the least and usually serious monetary damage. How are these problems similar and how are they different? Is one more serious than the other?
5. Have you ever been affected by a virus attack? How much time and/or data did you lose? Do you have a firewall installed in your computer system?
6. Have you ever tried to reach a Web site that was under attack? How many times did you try to access the site?
7. How many times have you seen an article in the paper or on the news about either a DoS or virus in the last week? Month? Year?