Field -- group of related bytes that can be identified by user with name, type, and size.
Record -- group of related fields.
File (flat file) -- group of related records that contains info used by specific application programs to generate reports.
Database -- groups of related files that are interconnected at various levels to give flexible access to users.
Appears to File Manager to be a type of file.
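The field/record/file hierarchy above can be sketched in a few lines. This is only an illustration: the layout (a 20-byte name, a 4-byte integer, an 8-byte float) and the names RECORD_LAYOUT, pack_record, and unpack_record are assumptions, not part of the original material.

```python
import struct

# Hypothetical record layout. Each field has a name, a type, and a size:
# "<" = little-endian with no padding, 20s = 20-byte name,
# i = 4-byte integer, d = 8-byte float.
RECORD_LAYOUT = struct.Struct("<20sid")

def pack_record(name, employee_id, salary):
    """Group related fields into one record (a fixed run of bytes)."""
    return RECORD_LAYOUT.pack(name.encode("ascii"), employee_id, salary)

def unpack_record(raw):
    """Recover the named fields from a record's bytes."""
    name, employee_id, salary = RECORD_LAYOUT.unpack(raw)
    return {
        "name": name.rstrip(b"\x00").decode("ascii"),
        "employee_id": employee_id,
        "salary": salary,
    }

# A flat file is then a group of related records stored one after another.
flat_file = b"".join([
    pack_record("Ada", 1, 5200.0),
    pack_record("Grace", 2, 6100.0),
])
second = unpack_record(flat_file[RECORD_LAYOUT.size:2 * RECORD_LAYOUT.size])
```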
Definitions - 2
Program files contain instructions.
Data files contain data.
Directories -- listings of file names and their attributes.
Every program and data file accessed by computer system, and every piece of computer software, is treated as a file.
File Manager treats all files exactly same way as far as storage is concerned.
Interacting With File Manager
Users communicate with File Manager via specific commands that may be either embedded in user’s program or submitted interactively by user.
OPEN & CLOSE pertain to availability of file for program invoking it.
READ & WRITE are I/O commands.
MODIFY – specialized WRITE command for existing data files that allows for appending/rewriting records.
CREATE & DELETE -- deal with system’s knowledge of file.
SAVE -- first time used, a file is actually created.
OPEN NEW -- within a program indicates file must be created.
OPEN…FOR OUTPUT -- creates file by making entry for it in directory & finding space for it in secondary storage.
RENAME -- allows users to change name of existing file.
COPY – allows user to make duplicate copies of existing files.
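As a rough analogy, the commands above map onto calls in Python's standard library. The file names are hypothetical, and the mapping in the comments is an assumption made for illustration; a real File Manager exposes its own command set.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as volume:
    path = os.path.join(volume, "PAYROLL.DAT")

    # OPEN...FOR OUTPUT: creates the file and its directory entry.
    with open(path, "w") as f:
        f.write("record 1\n")          # WRITE
    # Leaving the "with" block is the CLOSE.

    # MODIFY: append to an existing data file.
    with open(path, "a") as f:
        f.write("record 2\n")

    # OPEN followed by READ.
    with open(path) as f:
        records = f.read().splitlines()

    # COPY, RENAME, and DELETE change what the system knows about the file.
    backup = shutil.copy(path, os.path.join(volume, "PAYROLL.BAK"))
    os.rename(path, os.path.join(volume, "PAYROLL.OLD"))
    os.remove(backup)

    remaining = sorted(os.listdir(volume))
```

None of these calls mention tracks, sectors, or device addresses; the operating system supplies those details.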
Commands Are Device-Independent
Interface commands designed to be as simple as possible to use.
Lack detailed instructions to run device where file is stored.
Device independent.
To access a file, user doesn’t need to know its exact physical location on disk pack or storage medium.
Each logical command broken down into sequence of low-level signals that:
Trigger step-by-step actions performed by device.
Supervise progress of operation by testing device’s status.
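The breakdown of one logical command into low-level steps can be sketched as follows. FakeDisk is a made-up stand-in for a device, and the three steps (seek, select sector, transfer) are assumptions chosen for illustration, not a real driver interface.

```python
class FakeDisk:
    """Hypothetical device that stays busy for two status checks per signal."""

    def __init__(self):
        self.log = []
        self._busy_polls = 0

    def signal(self, action):
        self.log.append(action)
        self._busy_polls = 2

    def is_ready(self):
        if self._busy_polls > 0:
            self._busy_polls -= 1
            return False
        return True


def logical_read(device, track, sector):
    """Expand one device-independent READ into low-level steps."""
    polls = 0
    steps = (f"seek track {track}", f"select sector {sector}", "transfer data")
    for action in steps:
        device.signal(action)            # trigger one step on the device
        while not device.is_ready():     # supervise progress via device status
            polls += 1
    return polls


disk = FakeDisk()
status_checks = logical_read(disk, track=12, sector=3)
```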
Typical Volume Configuration
Each secondary storage unit (removable or non-removable) is considered a volume.
Each volume can contain several files; such volumes are called multifile volumes.
Some files are extremely large and span several volumes; such files are called multivolume files.
Generally, each volume in system is given name.
File Manager writes name & other descriptive info on easy-to-access place on each unit.
Master File Directory (MFD)
MFD stored immediately after volume descriptor.
Lists names & characteristics of every file contained in volume.
File names refer to program files, data files, and/or system files.
Subdirectories, if supported.
Remainder of volume is used for file storage.
Early OS supported only a single directory per volume.
Created by File Manager.
Contains names of files, usually organized in alphabetical, spatial, or chronological order.
Simple to implement and maintain.
Some major disadvantages
Volume Descriptor
Creation Date -- date when volume was created.
Pointer to Directory Area -- indicates first sector where directory is stored.
Pointer to File Area -- indicates first sector where file is stored.
File System Code -- used to detect volumes with incorrect formats.
Volume Name -- user-allocated name.
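The volume descriptor fields could be modeled as a small record. The field types, the code "FS01", and the function is_mountable are hypothetical, chosen only to show how a file system code can flag a volume with the wrong format.

```python
from dataclasses import dataclass
from datetime import date

EXPECTED_FILE_SYSTEM_CODE = "FS01"  # hypothetical value

@dataclass
class VolumeDescriptor:
    creation_date: date
    directory_area_sector: int   # first sector where directory is stored
    file_area_sector: int        # first sector where files are stored
    file_system_code: str
    volume_name: str             # user-allocated name

def is_mountable(descriptor):
    """Reject volumes whose format code does not match what the system expects."""
    return descriptor.file_system_code == EXPECTED_FILE_SYSTEM_CODE

good = VolumeDescriptor(date(2001, 1, 15), 2, 10, "FS01", "PAYROLL_VOL")
bad = VolumeDescriptor(date(2001, 1, 15), 2, 10, "XX99", "UNKNOWN_VOL")
```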
Some Major Disadvantages of Single Directory Per Volume
Takes long time to search for an individual file, especially if MFD was organized in an arbitrary order.
If user has many small files stored in volume, directory space fills before disk storage space fills. User told “disk full” when only directory full.
Users can’t create subdirectories to group related files.
Multiple users can’t safeguard files from other users browsing file lists because entire directory is listed on request.
Each program in entire directory needs unique name.
E.g., Only 1 person using directory can name program PROG1.
Semi-sophisticated File Managers create MFD for each volume with entries for files & subdirectories.
Subdirectory created when user opens account to access computer.
MFD entry flagged to indicate subdirectory with unique properties.
Improvement from single directory scheme.
Still can’t group files in a logical order to improve accessibility & efficiency of system.
Subdirectories Can Be Implemented As an Upside-down Tree
Today’s File Managers allow users to create subdirectories so related files are grouped together.
Extension of previous two-level directory structure.
Tree structures allow system to efficiently search individual directories due to fewer entries in each.
Path to requested file may lead through several directories.
When user wants to access specific file, file name is sent to File Manager. File Manager searches MFD for user's directory. Then searches user's directory & any subdirectories for requested file & location.
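The search just described (MFD first, then the user's directory and its subdirectories) might look like the sketch below. The nested-dictionary representation, the sample names, and the functions find_file and locate are assumptions for illustration.

```python
# Hypothetical volume: nested dicts are directories, strings are file locations.
MFD = {
    "alice": {"PROG1": "block 17", "taxes": {"TAXES2001": "block 42"}},
    "bob": {"PROG1": "block 90"},
}

def find_file(directory, name, path=()):
    """Depth-first search of one directory and any subdirectories."""
    for entry, value in directory.items():
        if isinstance(value, dict):                 # entry is a subdirectory
            found = find_file(value, name, path + (entry,))
            if found is not None:
                return found
        elif entry == name:                         # entry is the requested file
            return "/".join(path + (entry,)), value
    return None

def locate(user, name):
    """Search the MFD for the user's directory, then search below it."""
    user_directory = MFD.get(user)
    if user_directory is None:
        return None
    return find_file(user_directory, name, (user,))
```

With separate user directories, both alice and bob can own a file called PROG1, which the single-directory scheme did not allow.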
Each file entry in every directory contains info describing file:
File name -- usually represented in ASCII code.
File type -- organization and usage that are dependent on system (e.g., files and directories).
File size -- size is kept here for convenience.
File location -- identification of first physical block (or all blocks) where file is stored.
Date and time of creation.
Protection information -- access restrictions based on who is allowed to access file and what type of access is allowed.
Record size -- its fixed size or its maximum size, depending on type of record.
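One way to picture a directory entry is as a record with one field per item in the list above. The field names, the sample values, and the access table (user mapped to a set of allowed operations) are assumptions, not a real on-disk format.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DirectoryEntry:
    name: str
    file_type: str
    size: int
    first_block: int
    created: datetime
    record_size: int
    # Protection information: who may access the file, and how.
    access: dict = field(default_factory=dict)

    def allows(self, user, operation):
        """Check both who is asking and what type of access is requested."""
        return operation in self.access.get(user, set())

entry = DirectoryEntry(
    name="TAXES2001",
    file_type="data",
    size=4096,
    first_block=42,
    created=datetime(2001, 4, 15, 9, 30),
    record_size=64,
    access={"alice": {"read", "write"}, "bob": {"read"}},
)
```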
Absolute file name (complete file name) – long name that includes all path info.
Relative file name – short name seen in directory listings.
Selected by user when file is created.
E.g., ACCOUNT ADDRESSES, TAXES 2001, or AUTOEXEC.
Extension – 2-3 character name used to identify type of file or its contents.
Separated from relative name by a period.
E.g., CPP, BAS, BAT, COB, & EXE signal to system to use specific compiler or program to run these files.
E.g., TXT, DOC, OUT, MIC, & KEY created by applications or by users for own identification.
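The parts of a file name can be pulled apart with Python's pathlib. The path below is made up, and the table mapping extensions to programs is a hypothetical stand-in for whatever rules a given system applies.

```python
from pathlib import PureWindowsPath

# Absolute file name: includes all path info.
absolute = PureWindowsPath(r"C:\USERS\ANN\TAXES2001.TXT")

relative_name = absolute.name          # short name seen in directory listings
extension = absolute.suffix[1:]        # part after the period
name_without_extension = absolute.stem

# Hypothetical table of programs the system might associate with extensions.
HANDLERS = {"CPP": "C++ compiler", "BAS": "BASIC interpreter", "BAT": "command interpreter"}

def handler_for(file_name):
    """Return the associated program, or None for user-defined extensions."""
    return HANDLERS.get(PureWindowsPath(file_name).suffix[1:].upper())
```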
File Naming Conventions
Can be 1 or more characters long.
Can include letters of alphabet & digits.
Every OS has specific rules that affect length of relative name & types of characters allowed.
E.g., MS-DOS allows 1-8 alphanumeric character names without spaces.
More modern OS allow names with dozens of characters including spaces.
Try to select descriptive relative names that readily identify file contents/purpose of file.
Base and Current Directories Used by File Manager to Locate Files
File Manager selects base directory for user when interactive session begins.
All file operations requested by that user start here.
Then, user selects subdirectory (current directory or working directory).
Thereafter, files presumed to be located in current directory.
Whenever file accessed, user types in relative name & File Manager adds proper prefix.
As long as users refer to files in working directory, can access them without entering complete name.
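A minimal sketch of how a relative name gets its prefix, assuming POSIX-style paths. The Session class and the directory names are hypothetical.

```python
from pathlib import PurePosixPath

class Session:
    def __init__(self, base_directory):
        # Base directory is selected when the interactive session begins.
        self.base = PurePosixPath(base_directory)
        self.current = self.base

    def change_directory(self, subdirectory):
        """User selects a subdirectory as the current (working) directory."""
        self.current = self.current / subdirectory

    def resolve(self, name):
        """Add the current-directory prefix unless the name is already complete."""
        path = PurePosixPath(name)
        return path if path.is_absolute() else self.current / path

session = Session("/users/ann")
session.change_directory("taxes")
full_name = session.resolve("TAXES2001.TXT")
```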
File Organization: Record Format
Fixed-length records – easiest to access directly.
Most common type & ideal for data files.
Record size critical (too small – truncation; too large – wastes space).
Variable-length records -- difficult to access directly because hard to calculate exactly where record is located.
Don’t leave empty storage space & don’t truncate any characters.
Frequently used in files accessed sequentially (e.g., text files, program files) or files using index to access records.
File descriptor stores record format, how it’s blocked, & other related info.
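The difference in direct access can be shown with in-memory files. The 16-byte record size and the 2-byte length prefix for variable-length records are assumptions; real systems use various layouts.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record size in bytes

def write_fixed(records):
    buf = io.BytesIO()
    for text in records:
        data = text.encode("ascii")[:RECORD_SIZE]   # too long: truncated
        buf.write(data.ljust(RECORD_SIZE, b" "))    # too short: padded (wasted space)
    return buf

def read_fixed(buf, n):
    """Record n starts at n * RECORD_SIZE, so one seek finds it."""
    buf.seek(n * RECORD_SIZE)
    return buf.read(RECORD_SIZE).decode("ascii").rstrip()

def write_variable(records):
    buf = io.BytesIO()
    for text in records:
        data = text.encode("ascii")
        buf.write(len(data).to_bytes(2, "big") + data)  # length prefix, then data
    return buf

def read_variable(buf, n):
    """No formula gives the position; earlier records must be skipped one by one."""
    buf.seek(0)
    for _ in range(n):
        length = int.from_bytes(buf.read(2), "big")
        buf.seek(length, io.SEEK_CUR)
    length = int.from_bytes(buf.read(2), "big")
    return buf.read(length).decode("ascii")

names = ["Ada", "Grace Hopper", "A name longer than sixteen bytes"]
fixed_file = write_fixed(names)
variable_file = write_variable(names)
```

Reading the third record back shows the trade-off: the fixed-length file returns a truncated name, while the variable-length file returns it whole but only after skipping the first two records.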
Physical File Organization
Concerned with how records are arranged & characteristics of medium used to store them.
On magnetic disks, files can be organized as sequential, direct, or indexed sequential.
Characteristics Considered When Selecting File Organization
Volatility of data—frequency with which additions & deletions made.
Activity of file—% records processed during a given run.
Size of file.
Response time—amount of time user is willing to wait before requested operation is completed.
Sequential Record Organization
Easiest to implement because records are stored & retrieved serially, one after other.
To speed process some optimization features may be built into system.
E.g., select a key field from record & then sort records by that field before storing them.
Aids search process.
Complicates maintenance algorithms because original order must be preserved every time records added or deleted.
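A small sketch of a sequential file kept in key order. The SequentialFile class is hypothetical; it uses the standard bisect module to put each new record in its sorted position, which is the maintenance cost mentioned above.

```python
import bisect

class SequentialFile:
    def __init__(self):
        self.records = []   # (key, data) pairs kept sorted by key field

    def add(self, key, data):
        # Order must be preserved, so later records shift to make room.
        bisect.insort(self.records, (key, data))

    def delete(self, key):
        self.records = [r for r in self.records if r[0] != key]

    def search(self, key):
        """Serial scan; sorted order lets the search stop early on a miss."""
        comparisons = 0
        for record_key, data in self.records:
            comparisons += 1
            if record_key == key:
                return data, comparisons
            if record_key > key:
                break
        return None, comparisons

f = SequentialFile()
f.add(30, "c")
f.add(10, "a")
f.add(20, "b")
```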
Direct Record Organization (Random Organization)
Uses direct access files which can be implemented only on direct access storage devices.
Give users flexibility of accessing any record in any order without having to begin search from beginning of file.
Records are identified by their relative addresses (their addresses relative to beginning of file).
Logical addresses computed when records are stored & again when records are retrieved.
Use hashing algorithms.
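One common hashing scheme divides the key by the number of record slots and keeps the remainder. The slot count, the record size, and the function names below are assumptions for illustration; the slides do not say which hashing algorithm is used.

```python
FILE_SLOTS = 11     # hypothetical number of record slots (a prime)
RECORD_SIZE = 32    # hypothetical record size in bytes

def relative_address(key):
    """Hash the key field to an address relative to the beginning of the file."""
    return key % FILE_SLOTS

def byte_offset(key):
    """Turn the relative address into a position within the file."""
    return relative_address(key) * RECORD_SIZE

# The same computation runs when a record is stored and when it is retrieved,
# so no search from the beginning of the file is needed.
stored_at = relative_address(152)
found_at = relative_address(152)
```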
Advantages of Direct Access Organization
Fast access to records.
Can be accessed sequentially by starting at first relative address & incrementing it by one to get to next record.
Can be updated more quickly than sequential files because records quickly rewritten to original addresses after modifications.
No need to preserve order of the records, so adding or deleting them takes very little time.
Collisions Are a Problem With Direct Access Organization
Several records with unique keys may generate same logical address (collision).
Program generates another logical address before presenting it to File Manager for storage.
Colliding records stored in overflow area via links.
File Manager handles physical allocation of space.
Maximum file size established when created & eventually file is full or too many records are stored in overflow area.
Programmer must reorganize & rewrite file.
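The overflow-area approach can be sketched as below. The DirectFile class, the modulo hash, and the tiny capacities are assumptions; the sketch covers only the linked-overflow variant, not the alternative of generating another logical address, and it does not handle duplicate keys.

```python
class DirectFile:
    def __init__(self, slots, overflow_capacity):
        self.slots = [None] * slots          # main area: [key, data, link]
        self.overflow = []                   # overflow area
        self.overflow_capacity = overflow_capacity

    def _address(self, key):
        return key % len(self.slots)

    def store(self, key, data):
        addr = self._address(key)
        if self.slots[addr] is None:
            self.slots[addr] = [key, data, None]
            return
        # Collision: place record in overflow area and link it to the chain.
        if len(self.overflow) >= self.overflow_capacity:
            raise RuntimeError("overflow area full: file must be reorganized")
        record = self.slots[addr]
        while record[2] is not None:
            record = self.overflow[record[2]]
        self.overflow.append([key, data, None])
        record[2] = len(self.overflow) - 1

    def fetch(self, key):
        record = self.slots[self._address(key)]
        while record is not None:
            if record[0] == key:
                return record[1]
            record = self.overflow[record[2]] if record[2] is not None else None
        return None

f = DirectFile(slots=7, overflow_capacity=2)
for key, data in [(10, "a"), (17, "b"), (24, "c")]:   # all hash to address 3
    f.store(key, data)
```

Each extra link followed during a fetch is another access, which is why a file with many overflow records eventually has to be reorganized.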
Indexed Sequential Record Organization
Combines best of sequential & direct access.
Created & maintained through Indexed Sequential Access Method (ISAM) software package.
Doesn’t create collisions because it doesn’t use result of hashing algorithm to generate a record’s address.
Uses info to generate index file through which records retrieved.
Divides ordered sequential file into blocks of equal size.
Size determined by File Manager to take advantage of physical storage devices & to optimize retrieval strategies.
Each entry in index file contains highest record key & physical location of data block where this record, & records with smaller keys, are stored.
Indexed Sequential - 2
To access any record in file, system begins by searching index file & then goes to physical location indicated at that entry.
Overflow areas are spread throughout file.
Existing records can expand & new records are in close physical & logical sequence.
Last-resort overflow area is located apart from main data area but is used only when the other overflow areas are completely filled.
When retrieval time becomes too slow, file has to be reorganized.
Allows both direct access to a few requested records & sequential access to many records for most dynamic files.
A variation of indexed sequential files is B-tree.
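The index lookup can be sketched as follows, assuming a block size of three records and an in-memory list standing in for the index file. It is not ISAM itself: overflow areas are left out, and the names build and lookup are hypothetical.

```python
import bisect

BLOCK_SIZE = 3  # hypothetical; a File Manager would pick this to suit the device

def build(records):
    """Divide the ordered file into equal blocks; index the highest key of each."""
    ordered = sorted(records)
    blocks = [ordered[i:i + BLOCK_SIZE] for i in range(0, len(ordered), BLOCK_SIZE)]
    index = [block[-1][0] for block in blocks]
    return index, blocks

def lookup(index, blocks, key):
    """Search the index first, then scan only the block it points to."""
    position = bisect.bisect_left(index, key)   # first block whose highest key >= key
    if position == len(index):
        return None                             # key is larger than every indexed key
    for record_key, data in blocks[position]:
        if record_key == key:
            return data
    return None

index, blocks = build([(k, f"rec{k}") for k in (41, 5, 27, 12, 48, 20, 33)])
```

No hashing is involved, so two different keys never compete for the same address; the cost is one index search plus a short scan within a single block.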
Physical Storage Allocation
File Manager must work with files not just as whole units but also as logical units or records.
Records within file must have same format but can vary in length.
Records are subdivided into fields.
Structure usually managed by application programs, not OS.
When we talk about file storage, we’re actually referring to record storage.