Department of Information Technology, Data base Technologies (ITB4201)
File Operation
Dr. C.V. Suresh Babu
Professor
Department of IT
Hindustan Institute of Science & Technology
Action Plan
• Overview
• File organisation and Access
• File Directories
• File Sharing
• Record Blocking
• Quiz
Files
• Files are the central element of most applications
– file as an input to applications
– file as an output for long-term storage and for later access
• Desirable properties of files:
– Long-term existence
– Controlled sharing between processes
– Structure that is convenient for particular applications
File Structure
Fields and Records
• Fields
– Basic element of data
• e.g., student’s last name
– Contains a single value
– Characterized by its length and data type
• Records
– Collection of related fields
• e.g., a student record
– Treated as a unit
File Structure
File and Database
• File
– Collection of similar records
– Treated as a single entity and may be referenced by name
– Access control restrictions usually apply at the file level
• Database
– Collection of related data
– Explicit relationships exist among elements
– Consists of one or more files
A Big Picture
• How to identify and locate a selected file?
• How to enforce user access control in shared systems?
• How to organize records in a file and access a particular record in a file?
• How to organize records as a sequence of blocks for I/O?
• Individual block I/O requests must be scheduled to optimize performance
File Organization
• The basic operations that a user or application may perform on
a file are performed at the record level
– The file is viewed as having some structure that organizes the records
• File organization refers to the logical structuring of records
– Determined by the way in which files are accessed (access method)
Criteria for
File Organization
• Important criteria include:
– Short access time
– Ease of update
– Economy of storage
– Simple maintenance
– Reliability
Criteria for
File Organization
• Priority will differ depending on the use
– For batch mode file processing, rapid access for retrieval of a
single record is of minimal concern
• These criteria may conflict
– Indexes are a primary means of speeding up access to data, but the
space they occupy conflicts with economy of storage
The Pile
• Data are collected in the order they arrive
– No structure
• Purpose is to accumulate a mass of data and save it
• Records may have different fields
– Each field should be self-describing (field name + value)
– The length of each field must be determinable (from delimiters, an explicit
length subfield, or a default for the field type)
The Pile
• Record access is by exhaustive search
• Used when data are collected and stored prior to
processing or data are not easy to organize
• Uses space well when data vary in size and structure
• Adequate for exhaustive searches
• Easy to update
• Unsuitable for most applications
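As a rough illustration of the pile, the sketch below stores records in arrival order, with every field carrying its own name. The delimiters (`;` and `=`), the helper names and the sample data are assumptions made for this example, not part of the slides.

```python
# Hypothetical pile: records kept in arrival order, each field self-describing.
FIELD_SEP = ";"   # assumed delimiter between fields
NAME_SEP = "="    # assumed delimiter between a field name and its value


def encode_record(fields: dict) -> str:
    """Serialize one record as name=value pairs; records may differ in their fields."""
    return FIELD_SEP.join(f"{name}{NAME_SEP}{value}" for name, value in fields.items())


def decode_record(line: str) -> dict:
    """Recover the fields of one record from its self-describing form."""
    return dict(pair.split(NAME_SEP, 1) for pair in line.split(FIELD_SEP))


def search_pile(pile: list, name: str, value: str) -> list:
    """Exhaustive search: every record in the pile has to be examined."""
    matches = []
    for line in pile:
        record = decode_record(line)
        if record.get(name) == value:
            matches.append(record)
    return matches


# Records arrive with different sets of fields and are simply appended.
pile = [
    encode_record({"name": "Rao", "dept": "IT"}),
    encode_record({"name": "Iyer", "dept": "CSE", "phone": "5550100"}),
    encode_record({"name": "Khan", "dept": "IT"}),
]
it_staff = search_pile(pile, "dept", "IT")
print([record["name"] for record in it_staff])
```

Appending is trivial, which is why the pile is easy to update, while any lookup costs a full scan.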
The Sequential File
• Fixed format used for records
• Records are of the same length
– same number of fixed-length fields in a particular order
• Only the values of fields need to be stored
• Field name and length are attributes of the file
structure
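A minimal sketch of a fixed-format record, using Python's `struct` module. The layout (a 4-byte ID, a 20-byte last name, a 2-byte year) and the function names are assumptions chosen for illustration. Only field values are stored; the names and lengths live in the format string, which plays the role of the file structure.

```python
import struct

# Assumed layout: 4-byte unsigned ID, 20-byte last name, 2-byte unsigned year.
# "<" selects little-endian with no padding, so every record has the same size.
RECORD_FORMAT = "<I20sH"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(student_id: int, last_name: str, year: int) -> bytes:
    """Store only the field values; short names are padded with NUL bytes."""
    return struct.pack(RECORD_FORMAT, student_id, last_name.encode("ascii"), year)


def unpack_record(raw: bytes) -> tuple:
    """Decode one fixed-length record back into its field values."""
    student_id, name, year = struct.unpack(RECORD_FORMAT, raw)
    return student_id, name.rstrip(b"\x00").decode("ascii"), year


def read_nth(data: bytes, n: int) -> tuple:
    """Because records have equal length, record n starts at n * RECORD_SIZE."""
    start = n * RECORD_SIZE
    return unpack_record(data[start:start + RECORD_SIZE])


# Three records stored back to back, in key (ID) sequence.
data = b"".join([
    pack_record(101, "Rao", 1),
    pack_record(102, "Iyer", 2),
    pack_record(103, "Khan", 3),
])
print(RECORD_SIZE, read_nth(data, 1))
```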
The Sequential File
• Key field
– Uniquely identifies the record
– Records are stored in key sequence
• Optimal for batch applications if they involve the processing of all
the records
• Easily stored on tape and disk
• Poor performance for interactive applications
– considerable processing and delay due to the sequential search of the file
for a key match
Indexed Sequential File
• An index is added to support random access
– An index record contains a key field and a pointer into
the main file
– The index is a sequential file
– For searching
• Search the index to find the highest key value that is equal to
or precedes the desired key value
• Search continues in the main file at the location indicated by
the pointer
Indexed Sequential File
Example
• Consider searching a particular key value in a sequential file with
1 million records
– without index
• requires on average one-half million record accesses
– with an index containing 1000 entries with the keys in the index evenly
distributed over the main file
• requires on average 500 accesses to the index file + 500 accesses to the main
file
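The figures in this example can be checked with a short calculation, and the search procedure itself can be sketched on a small file. The function names below are hypothetical. Both the index and the main file are scanned sequentially, as in the slides, and an "access" is counted as one index entry or one main-file record examined.

```python
def avg_accesses_sequential(n_records: int) -> float:
    """On average, half the file is scanned before the key is found."""
    return n_records / 2


def avg_accesses_indexed(n_records: int, n_index_entries: int) -> float:
    """Half the index on average, then half of the stretch one index entry covers."""
    records_per_entry = n_records / n_index_entries
    return n_index_entries / 2 + records_per_entry / 2


def indexed_search(main_file: list, index: list, key: int) -> tuple:
    """main_file: keys in ascending order; index: (key, position) pairs in key order.

    Returns (position or None, number of accesses made).
    """
    accesses = 0
    start = 0
    # Find the highest index key that is equal to or precedes the desired key.
    for index_key, position in index:
        accesses += 1
        if index_key > key:
            break
        start = position
    # Continue sequentially in the main file from the indicated position.
    for pos in range(start, len(main_file)):
        accesses += 1
        if main_file[pos] == key:
            return pos, accesses
        if main_file[pos] > key:
            break
    return None, accesses


print(avg_accesses_sequential(1_000_000))     # 500000.0
print(avg_accesses_indexed(1_000_000, 1000))  # 1000.0

# Small demonstration: 100 records, one index entry per 10 records.
small_file = list(range(100))
small_index = [(k, k) for k in range(0, 100, 10)]
print(indexed_search(small_file, small_index, 57))
```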
Indexed Sequential File
• An overflow file is added
• A new record is added to the overflow file and is located by
following a pointer from its predecessor record
• The indexed sequential file is occasionally merged with the
overflow file in batch mode
• Greatly reduces the time required to access a single
record, without sacrificing the sequential nature.
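The overflow mechanism can be sketched as follows. This is a simplified in-memory model under stated assumptions: the class name is hypothetical, records are reduced to their keys, and the "pointer from the predecessor record" is modelled as a dictionary keyed by the predecessor's position in the main file.

```python
import bisect


class IndexedSequentialFile:
    """Hypothetical model: a main file in key sequence plus an overflow area."""

    def __init__(self, keys):
        self.main = sorted(keys)  # main file, kept in key sequence
        # predecessor position in main file -> overflow keys chained from it
        # (position -1 stands for "before the first main-file record")
        self.overflow = {}

    def insert(self, key):
        """New records go to the overflow area, linked from their predecessor."""
        predecessor = bisect.bisect_right(self.main, key) - 1
        bisect.insort(self.overflow.setdefault(predecessor, []), key)

    def scan(self):
        """Sequential read in key order, following overflow links as they occur."""
        yield from self.overflow.get(-1, [])
        for position, key in enumerate(self.main):
            yield key
            yield from self.overflow.get(position, [])

    def merge(self):
        """Batch-mode merge of the overflow area into the main file."""
        self.main = list(self.scan())
        self.overflow = {}


isf = IndexedSequentialFile([10, 20, 30])
for new_key in (25, 5, 22):
    isf.insert(new_key)
print(list(isf.scan()))  # key order is preserved without rewriting the main file
isf.merge()
print(isf.main)
```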
Indexed File
• Records are accessed only through their indexes
– no restriction on the placement of records
– allows variable-length records
• Uses multiple indexes for different key fields
– An exhaustive index contains one entry for every
record in the main file
– A partial index contains entries to records where the
field of interest exists
Indexed File
• When a new record is added to the main file, all of the index files
must be updated.
• Used mostly in applications where
– timeliness of information is critical and
– data are rarely processed exhaustively
– examples: airline reservation systems and inventory control systems
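A minimal sketch of an indexed file with one exhaustive and one partial index. The field names (`id`, `email`), the use of dictionaries as indexes and the sample records are assumptions for illustration only.

```python
# Main file: records in no particular order, possibly with differing fields.
records = []

index_by_id = {}      # exhaustive index: one entry for every record
index_by_email = {}   # partial index: only records that have an "email" field


def add_record(record: dict) -> int:
    """Append a record and update every index that applies to it."""
    position = len(records)
    records.append(record)
    index_by_id[record["id"]] = position
    if "email" in record:
        index_by_email[record["email"]] = position
    return position


def find_by_id(record_id):
    """Records are reached only through an index, never by scanning."""
    position = index_by_id.get(record_id)
    return None if position is None else records[position]


def find_by_email(email):
    position = index_by_email.get(email)
    return None if position is None else records[position]


add_record({"id": 7, "name": "Rao", "email": "rao@example.org"})
add_record({"id": 3, "name": "Iyer"})  # no email: absent from the partial index
print(find_by_id(3), find_by_email("rao@example.org"))
print(len(index_by_id), len(index_by_email))
```

Each insertion touches every applicable index, which is the update cost noted above.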
File Directory
• Contains information about files
– Attributes
– Location
– Ownership
• Directory itself is a file owned by the operating system
Directory Elements
• Basic Information
– File name: must be unique within its directory
– File type: e.g., text, binary
– File organization
• Address Information
– Volume: device on which file is stored
– Starting address: e.g., cylinder, track on disk
– Size used: in bytes, words or blocks
– Size allocated: maximum size of the file
Directory Elements
• Access Control Information
– Owner: able to grant/deny access to other users and to change these privileges
– Access information: e.g., user’s name and password for each authorized user
– Permitted actions: controls reading, writing, executing, transmitting over a
network
• Usage Information
– Date Created, Identity of Creator, Date Last Read Access, Identity of Last Reader,
Date Last Modified
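The directory elements listed above could be grouped into a single entry along these lines. This is only a sketch: the class name, the field names, the types and the sample values are assumptions, and real file systems lay this information out differently.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional, Set


@dataclass
class DirectoryEntry:
    # Basic information
    name: str
    file_type: str            # e.g. "text" or "binary"
    organization: str         # e.g. "sequential" or "indexed"
    # Address information
    volume: str               # device on which the file is stored
    starting_address: int
    size_used: int            # in blocks (assumed unit)
    size_allocated: int       # maximum size, same unit
    # Access control information
    owner: str
    permitted_actions: Dict[str, Set[str]] = field(default_factory=dict)
    # Usage information
    created: datetime = field(default_factory=datetime.now)
    last_modified: Optional[datetime] = None

    def free_space(self) -> int:
        """Space still available before the allocation is exhausted."""
        return self.size_allocated - self.size_used


entry = DirectoryEntry(
    name="marks.dat", file_type="binary", organization="sequential",
    volume="disk0", starting_address=4096, size_used=12, size_allocated=20,
    owner="asha", permitted_actions={"ben": {"read"}},
)
print(entry.name, entry.free_space())
```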
Hierarchical, or
Tree-Structured Directory
• Master directory with user directories
underneath it
• Each user directory may have
subdirectories and files as entries
• Each directory and subdirectory can be
organized as a sequential file
Hierarchical, or
Tree-Structured Directory
• Easily enforce access restriction on directories.
• Easily organize collections of files.
• Minimize the difficulty in assigning unique names.
Naming
• The tree structure allows users to find a file by following a path
from the root or master directory down various branches until
the file is reached
• The series of directory names, culminating in the file name
itself, constitutes a pathname for the file
• Duplicate filenames are possible if they have different
pathnames
Naming
• Usually an interactive user or a
process is associated with a current
or working directory
– Files are referenced relative to the
working directory unless an explicit full
pathname is used
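Pathname resolution against a working directory can be sketched with Python's standard `posixpath` module. The `resolve` helper and the example paths are hypothetical; Unix-style `/` separators are assumed.

```python
import posixpath


def resolve(working_dir: str, name: str) -> str:
    """Return the full pathname for a file reference.

    A name starting at the root is taken as an explicit full pathname;
    anything else is interpreted relative to the working directory.
    """
    if posixpath.isabs(name):
        return posixpath.normpath(name)
    return posixpath.normpath(posixpath.join(working_dir, name))


# The same file name under two user directories gives two distinct pathnames.
jane = resolve("/users/jane", "notes/todo.txt")
bob = resolve("/users/bob", "notes/todo.txt")
print(jane)
print(bob)

# An explicit full pathname ignores the working directory.
print(resolve("/users/jane", "/users/bob/notes/todo.txt"))
```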
File Sharing
• In a multiuser system, there is almost always a requirement for
allowing files to be shared among a number of users
• Two issues
– Access rights
– Management of simultaneous access
Access Rights
• A wide variety of access rights have been used by various
systems
– often as a hierarchy, with each right implying those that precede it.
• None
– The user may not even know that the file exists; this is enforced by not
allowing the user to read the directory that includes the file
• Knowledge
– User can only determine that the file exists and who its owner is
Access Rights cont…
• Execution
– The user can load and execute a program but cannot copy it, e.g.,
proprietary programs
• Reading
– The user can read the file for any purpose, including copying and
execution
• Appending
– The user can add data to the file but cannot modify or delete any of
the file’s contents
Access Rights cont…
• Updating
– The user can modify, delete, and add to the file’s data.
• Changing protection
– User can change access rights granted to other users
• Deletion
– User can delete the file
User Classes
• Access can be provided to different classes of users
– Owner: usually the file's creator, has full rights and may grant rights
to others
– Specific users: individual users who are designated by user ID
– User groups: a set of users identified as a group
– All: all users who have access to this system
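The access rights and user classes can be combined in a small model. The sketch below encodes the rights as an ordered enumeration, so that each right implies those preceding it, and checks a request against owner, specific-user, group and all-users entries. The class names, and the choice to take the strongest right among the classes that apply, are assumptions for illustration; actual systems differ.

```python
from enum import IntEnum


class Right(IntEnum):
    """Rights in hierarchical order: a higher right implies all lower ones."""
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READING = 3
    APPENDING = 4
    UPDATING = 5
    CHANGING_PROTECTION = 6
    DELETION = 7


class FileACL:
    """Hypothetical per-file access control list covering the four user classes."""

    def __init__(self, owner: str):
        self.owner = owner
        self.user_rights = {}    # specific users, by user ID
        self.group_rights = {}   # user groups, by group name
        self.all_right = Right.NONE  # right granted to all users of the system

    def effective_right(self, user: str, groups=()) -> Right:
        if user == self.owner:
            return Right.DELETION  # the owner has full rights
        candidates = [self.all_right, self.user_rights.get(user, Right.NONE)]
        candidates += [self.group_rights.get(g, Right.NONE) for g in groups]
        return max(candidates)  # assumption: the strongest applicable right wins

    def permits(self, user: str, requested: Right, groups=()) -> bool:
        return self.effective_right(user, groups) >= requested


acl = FileACL(owner="asha")
acl.user_rights["ben"] = Right.APPENDING
acl.group_rights["staff"] = Right.READING
acl.all_right = Right.KNOWLEDGE

print(acl.permits("ben", Right.READING))                     # appending implies reading
print(acl.permits("ben", Right.UPDATING))
print(acl.permits("cara", Right.READING, groups=("staff",)))
print(acl.permits("dev", Right.EXECUTION))
```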
Simultaneous Access
• When more than one user is granted access to append to or update a
file, the OS or file management system must enforce discipline
• A user may lock the entire file or individual records during
update
• Mutual exclusion and deadlock are issues for shared access (cf. the
readers/writers problem)
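Record-level locking can be sketched with one lock per record, using Python's `threading` module. The `LockedFile` class is hypothetical, and threads stand in for concurrent users; a real file system or DBMS would use its own locking facilities.

```python
import threading


class LockedFile:
    """Hypothetical shared file whose records can be locked individually."""

    def __init__(self, n_records: int):
        self.records = [0] * n_records
        self._locks = [threading.Lock() for _ in range(n_records)]

    def update(self, index: int, delta: int) -> None:
        # Mutual exclusion per record: only one updater at a time may
        # read-modify-write a given record, while other records stay available.
        with self._locks[index]:
            current = self.records[index]
            self.records[index] = current + delta


def worker(shared_file: LockedFile, index: int, times: int) -> None:
    for _ in range(times):
        shared_file.update(index, 1)


shared = LockedFile(n_records=2)
# Four "users": two update record 0 and two update record 1.
threads = [
    threading.Thread(target=worker, args=(shared, i % 2, 1000)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared.records)  # [2000, 2000]
```

Each update here holds a single lock, so no deadlock can arise; deadlock becomes a concern once an operation needs to hold several record locks at the same time.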
Blocks and records
• Records are the logical unit of access of a structured file
• Blocks are the unit for I/O with secondary storage
• For I/O to be performed, records must be organized as blocks.
• Three methods of blocking are common
– Fixed blocking
– Variable-length spanned blocking
– Variable-length unspanned blocking
Fixed Blocking
• Fixed-length records are used, and an integral number of
records are stored in a block
• Unused space at the end of a block is internal fragmentation
• Common for sequential files with fixed-length records
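The arithmetic of fixed blocking is straightforward. The helper below is a sketch with hypothetical names and sample sizes: it computes how many records fit in a block, how many blocks a file needs, and how many bytes are lost to internal fragmentation in each full block.

```python
def fixed_blocking(block_size: int, record_size: int, n_records: int) -> tuple:
    """Return (records per block, blocks needed, unused bytes per full block)."""
    records_per_block = block_size // record_size  # integral number of records
    if records_per_block == 0:
        raise ValueError("record does not fit in a block")
    blocks_needed = -(-n_records // records_per_block)  # ceiling division
    wasted_per_block = block_size - records_per_block * record_size
    return records_per_block, blocks_needed, wasted_per_block


# Assumed sizes: 512-byte blocks, 100-byte records, 1000 records.
print(fixed_blocking(512, 100, 1000))  # (5, 200, 12)
```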
Variable Length
Spanned Blocking
• Variable-length records are used and are packed into blocks
with no unused space
• Some records may span multiple blocks
– Continuation is indicated by a pointer to the successor block
• Efficient for storage and does not limit the size of records
Variable Blocking: Spanned
• Difficult to implement
• Records that span two blocks require two I/O operations
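A sketch of spanned packing: records are laid end to end, and each record is mapped to the blocks it touches. The function name and sample sizes are assumptions, record sizes are taken to be positive, and the space needed for continuation pointers is ignored for simplicity.

```python
def spanned_layout(record_sizes: list, block_size: int) -> list:
    """For each record, list the block numbers it occupies under spanned blocking."""
    layout = []
    offset = 0  # records are packed back to back with no unused space
    for size in record_sizes:
        first_block = offset // block_size
        last_block = (offset + size - 1) // block_size
        layout.append(list(range(first_block, last_block + 1)))
        offset += size
    return layout


# Assumed sizes: 100-byte blocks and four variable-length records.
layout = spanned_layout([60, 60, 30, 150], block_size=100)
print(layout)  # [[0], [0, 1], [1], [1, 2]]

# Records touching more than one block need more than one I/O operation.
print([len(blocks) for blocks in layout])  # [1, 2, 1, 2]
```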
Variable-length
unspanned blocking
• Uses variable-length records without spanning
• Wasted space in most blocks because of the inability to use
the remainder of a block if the next record is larger than the
remaining unused space
• Limits record size to the size of a block
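The wasted space can be seen in a small packing sketch. The function names and sample sizes are hypothetical: a record that does not fit in the remainder of the current block starts a new block, and a record larger than a block is rejected.

```python
def unspanned_layout(record_sizes: list, block_size: int) -> list:
    """Pack variable-length records into blocks without spanning.

    Returns a list of blocks, each a list of the record sizes it holds.
    """
    blocks = [[]]
    free = block_size
    for size in record_sizes:
        if size > block_size:
            raise ValueError("record larger than a block cannot be stored")
        if size > free:
            # The remainder of the current block is left unused.
            blocks.append([])
            free = block_size
        blocks[-1].append(size)
        free -= size
    return blocks


def wasted_space(blocks: list, block_size: int) -> list:
    """Unused bytes in each block."""
    return [block_size - sum(block) for block in blocks]


# Assumed sizes: 100-byte blocks and four variable-length records.
blocks = unspanned_layout([60, 60, 30, 70], block_size=100)
print(blocks)                     # [[60], [60, 30], [70]]
print(wasted_space(blocks, 100))  # [40, 10, 30]
```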
[Figure: Variable Blocking: Unspanned]
Revisit the Big Picture
• Describes the location of all files plus their attributes
• Only authorized users are allowed to access particular files in particular ways
• User views the file as having some structure that organizes the records;
different access methods reflect different file structures
• Records must be organized as a sequence of blocks for output and
unblocked after input
• Individual block I/O requests must be scheduled to optimize performance
Test Yourself
1. A file is:
a) an abstract data type
b) logical storage unit
c) usually non volatile
d) volatile
2. A large collection of files is called ____________
a) Fields
b) Records
c) Database
d) Sectors
3. A unit of storage that can store one or more records in a hash file organization is denoted as
a) Buckets
b) Disk pages
c) Blocks
d) Nodes
4. How can variable-length records arise in a file?
a) Storage of multiple record types in a file
b) Record types that allow variable lengths for one or more fields
c) Record types that allow repeating fields, such as arrays or multisets
d) All of the mentioned
5. The slotted page structure is used for _________
a) Organizing records in a block
b) Organizing blocks in a database
c) Deleting records from a block
d) None of the mentioned
Answers
1. A file is:
a) an abstract data type
b) logical storage unit
c) usually non volatile
d) volatile
2. A large collection of files is called ____________
a) Fields
b) Records
c) Database
d) Sectors
3. A unit of storage that can store one or more records in a hash file organization is denoted as
a) Buckets
b) Disk pages
c) Blocks
d) Nodes
4. How can variable-length records arise in a file?
a) Storage of multiple record types in a file
b) Record types that allow variable lengths for one or more fields
c) Record types that allow repeating fields, such as arrays or multisets
d) All of the mentioned
5. The slotted page structure is used for _________
a) Organizing records in a block
b) Organizing blocks in a database
c) Deleting records from a block
d) None of the mentioned