The document discusses file design and organization in information systems. It describes the key components of files, including data items, records, record keys, and entities. It explains different file organizations like sequential, direct access, indexed, and inverted files. It also discusses designing printed outputs, including determining output objectives, contents, layout, and appropriate output media.
2. FILE DESIGN
Information systems in business are file and
database oriented.
Data are accumulated into files that are processed or
maintained by the system.
The systems analyst is responsible for designing
files, determining their contents and selecting a
method for organising the data.
3. File Components
• Data Item
Individual elements of data are called data items, also known as
fields or simply items. For example, a bank cheque consists of the
following data items: cheque number, date, payee, numeric
amount, script amount, note, bank identification, account
number, and signature.
• Record
The complete set of related data pertaining to an entry, such
as a bank cheque, is a record, treated as a single unit. The bank
cheque is therefore a record consisting of seven separate fields
related to the payment transaction. Each field has a defined
length and type (alphabetic, alphanumeric, or numeric).
4. File Components (example)
RECORD NAME   DATA ITEM NAME      TYPE   LENGTH
Bank cheque   Cheque Originator   C      90
              Cheque Number       N      6
              Date                       8
              Payee               C      24
              Amount              N      8,2
              Bank Number         N      9
              Account Number      N
5. Fixed and Variable Length Records
Fixed-length records
When the number and size of data items in a record are
constant for every record, the record is called a fixed-length
record. The advantage of fixed-length records is that they are
always of the same size. Thus, the system does not have to
determine how long the record is or where it stops and the next
one begins, which saves processing time.
Variable-length records
Variable-length records are less common in most business
applications than fixed-length designs because the latter are
easier to manage and meet most application needs. Record size
may vary because the individual data items vary in length
(each record can have a different number of bytes) or because
the number of data items in a record changes from one
occurrence to another.
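The processing-time advantage of fixed-length records can be sketched in Python: because every record has the same size, the position of record n is simply n times the record length. The layout below is an assumption for illustration (four of the cheque fields, with the amount held in the smallest currency unit), and the helper names and sample values are hypothetical.

```python
import struct

# Assumed fixed-length layout: cheque number (4 bytes), date (8 bytes),
# payee (24 bytes, padded), amount in the smallest currency unit (8 bytes).
CHEQUE = struct.Struct("<I8s24sq")

def pack_cheque(number, date, payee, amount):
    """Encode one cheque as a record of exactly CHEQUE.size bytes."""
    return CHEQUE.pack(number, date.encode(), payee.encode(), amount)

def read_record(data, n):
    """Return record n (0-based) by computing its offset directly."""
    start = n * CHEQUE.size  # no need to scan for record boundaries
    number, date, payee, amount = CHEQUE.unpack_from(data, start)
    return number, date.decode(), payee.rstrip(b"\x00").decode(), amount

# Hypothetical sample file held in memory.
data = b"".join([
    pack_cheque(1240, "20240105", "A. Kumar", 125000),
    pack_cheque(1244, "20240107", "City Water Board", 48050),
    pack_cheque(1248, "20240109", "R. Singh", 990000),
])

print(CHEQUE.size)           # every record has this same length
print(read_record(data, 2))  # third record, located by arithmetic alone
```

With variable-length records this offset calculation is not available, so each record would need a length prefix or a delimiter that the system must read first.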
6. Record Key
• To distinguish one specific record from another,
systems analysts select one data item in the record
that is likely to be unique in all records of a file and
use it for identification purposes.
• This item, called the record key, key attribute, or
simply key, is already part of the record, not
additional data added to it just for the purpose of
identification.
• Common examples of record keys are the part
number in an inventory record, the chart number in
a patient medical record, the student number in a
university record, or the serial number of a
manufactured product. Each of these record keys
has various other uses in the organisation or
7. Entity
• An entity is any person, place, thing, or event
of interest to the organisation and about
which data are captured, stored, or
processed. Patients and tests are entities of
interest in hospitals, while banking entities
include customers and cheques.
8. File and Database
File
A file is a collection of related records. Each record in a
file is included because it pertains to the same entity.
A file of cheques, for example, consists only of
cheques. Inventory records and invoices do not belong
in a cheque file, since they pertain to different
entities.
Databases
A database is an integrated collection of data. Records
for different entities are typically stored in a
database (whereas files store records for a single
entity). In a university database, for example,
records for students, courses, and faculty are
interrelated in the same database.
9. File Organization
Records are stored in files using a file
organisation that determines how the
records will be
• Stored
• Located
• Retrieved
10. Sequential Organization
• Sequential organisation is the simplest way
to store and retrieve records in a file.
• In a sequential file, records are stored one
after the other without concern for the
actual value of the data in the records.
• The first record stored is placed at the
beginning of the file. The second is stored
right after the first (there are no unused
positions), the third after the second, and so
on. This order never changes in sequential
file organisation, unlike the other
organisations to be discussed.
11. Sequential Organization
(Reading)
• To read a sequential file, the system always
starts at the beginning of the file and reads its
way up to the record, one record at a time.
• For example, if a particular record happens to be
the tenth one in a file, the system starts at the first
record and reads ahead one record at a time
until the tenth is reached. It cannot go directly
to the tenth record in a sequential file
without starting from the beginning.
• In fact, the system does not know it is the tenth
record. Depending on the nature of the system
being designed, this feature can be an
12. Sequential Organization
(Searching Record)
• Records are accessed in order of their appearance in the file.
• For example, to find the location of cheque 1258 in a sequential
file, we call the cheque number 1258 the search key.
• The program controls all the processing steps that follow.
• The first record is read and its cheque number compared with
the search key: 1240 (say this is the first record) versus 1258.
Since the cheque number and search key do not match, the process
is repeated. The cheque number for the next record is 1244, and
it also does not match the search key.
• The process of reading and comparing records continues until
the cheque number and the search key match. If the file does
not contain a cheque numbered 1258, the reading and
comparing process continues until the end of the file is reached.
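The search steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the list of dictionaries stands in for the file, and the payee names and the function name are hypothetical.

```python
def sequential_search(records, search_key):
    """Read records in file order until the cheque number matches.

    Returns (record, reads). The record is None if the end of the file
    is reached without a match.
    """
    reads = 0
    for record in records:
        reads += 1
        if record["cheque_number"] == search_key:
            return record, reads
    return None, reads

# Hypothetical file contents, in the order the records were stored.
cheques = [
    {"cheque_number": 1240, "payee": "A. Kumar"},
    {"cheque_number": 1244, "payee": "City Water Board"},
    {"cheque_number": 1258, "payee": "R. Singh"},
    {"cheque_number": 1260, "payee": "Office Supplies Ltd"},
]

found, reads = sequential_search(cheques, 1258)
missing, reads_all = sequential_search(cheques, 1250)
print(found, reads)        # match after reading three records
print(missing, reads_all)  # no match: every record was read
```

The second call shows the cost of an unsuccessful search: the whole file is read before the program can report that the cheque is absent.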
13. Direct-Access Organisation
• In contrast to sequential organisation,
processing a direct-access file does not require
the system to start at the first record in the
file.
• Direct-access files are keyed files. They
associate a record with a specific key value
and a particular storage location.
• All records are stored by key at addresses
rather than by position.
• If the program knows the record key, it can
determine the location address of a record and
retrieve it independently of every other record
in the file.
14. Direct-Access Organisation
(Direct Addressing)
• In the cheque example, the direct access of records
is demonstrated by using a storage area that has a
space reserved for every cheque number from 1240
to 1300.
• The system uses the cheque number as a physical
record key.
• Cheque number 1248 is stored at address 1248,
the location reserved for the cheque with that
number.
• To retrieve that cheque from storage in a computer
system, the program is instructed to use the number
1248 as the search key.
15. Direct-Access Organisation
(Direct Addressing)
• The program knows that the key serves as the address
and thus goes directly to the assigned
location for the record with the key of 1248
and retrieves the record.
• The attractive feature of direct organisation
is that records are retrieved much more
quickly than when the file must be searched
from the beginning.
• When storage is assigned for the file, it starts
at the lowest key value and extends to the
highest key value.
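A minimal Python sketch of direct addressing for the cheque example. It assumes a list stands in for the storage area, so the slot number is the cheque number minus the lowest key; on a real device the key itself could serve as the address, as described above. The function names and sample records are hypothetical.

```python
LOWEST_KEY, HIGHEST_KEY = 1240, 1300

# One slot is reserved for every possible cheque number, used or not.
storage = [None] * (HIGHEST_KEY - LOWEST_KEY + 1)

def slot(cheque_number):
    """Turn a key into its storage location without any searching."""
    if not LOWEST_KEY <= cheque_number <= HIGHEST_KEY:
        raise KeyError(cheque_number)
    return cheque_number - LOWEST_KEY

def store(cheque_number, record):
    storage[slot(cheque_number)] = record

def retrieve(cheque_number):
    return storage[slot(cheque_number)]

store(1240, {"payee": "A. Kumar"})
store(1248, {"payee": "R. Singh"})

print(retrieve(1248))        # fetched directly, no other record is read
unused = storage.count(None)
print(unused)                # slots allocated but still empty
```

The count of empty slots makes the storage cost visible: space is held for all 61 possible cheque numbers even when only a few cheques exist.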
16. Direct Access Organization
(Drawbacks of Direct Addressing)
• Storage must be allocated for every possible key
value, even though much of it will go unused.
• Another problem prohibiting use of direct
addressing arises when the keys for the
records do not match storage addresses.
Even if the analyst wants to use direct
addressing, it is impossible to do so if key
values and addresses do not correspond. For
example, if keys contain characters (e.g., a key
of AB1CD), direct addressing is not possible,
since there is no address for AB1CD.
17. Direct Access Organization
(Hash Addressing)
• When direct addressing is not possible but direct access is
necessary, the analyst specifies the alternative access
method of hashing.
• Hashing (also called key transformation or randomising) refers
to the process of deriving a storage address from a record key.
• An algorithm (an arithmetic procedure) is devised to change a
key value into another value that serves as a storage address.
(The data value in the record itself does not change.)
• There is no perfect hashing algorithm, although some are
much better than others when it comes to minimising
synonyms.
• Synonyms occur when the hashing procedure is applied to
different keys and produces the same address in storage.
18. Direct Access Organization
(Hash Addressing-contd..)
• A separate overflow area is set aside to provide for
record storage when synonyms occur. When a record
is stored, the hashing algorithm is performed and
the address derived.
• The program accesses that storage area, and, if it is
unused, the record is stored there. If there is already
a record stored there, the new record is written in
the overflow area. When the system must retrieve a
record, the hashing algorithm is performed and
the storage address determined. Then the record
in the storage area is checked. If it is not the correct
one (meaning that a synonym occurred earlier), the
system automatically goes to the overflow area and
retrieves the record for processing.
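The store and retrieve steps above can be sketched in Python. The hashing algorithm here (sum of character codes, then the remainder after division by the size of the main storage area) is only one simple choice, assumed for illustration; the area size, function names and records are also hypothetical.

```python
MAIN_AREA_SIZE = 7  # assumed; kept small so that synonyms appear quickly

main_area = [None] * MAIN_AREA_SIZE
overflow_area = []

def hash_address(key):
    """Derive a storage address from a key (the key itself is unchanged)."""
    return sum(ord(ch) for ch in key) % MAIN_AREA_SIZE

def store(key, record):
    address = hash_address(key)
    if main_area[address] is None:
        main_area[address] = (key, record)
    else:
        # The address is already taken, so this key is a synonym.
        overflow_area.append((key, record))

def retrieve(key):
    entry = main_area[hash_address(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    # Not the correct record: fall back to the overflow area.
    for stored_key, record in overflow_area:
        if stored_key == key:
            return record
    return None

store("AB1CD", "record A")
store("AB1DC", "record B")  # same characters, so the same address
store("XY9ZT", "record C")

print(hash_address("AB1CD"), hash_address("AB1DC"))
print(retrieve("AB1DC"), len(overflow_area))
```

Because this hash ignores character order, any two keys made of the same characters are synonyms, which illustrates why some hashing algorithms are much better than others.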
19. Indexed Organisation
• A third way of accessing records is through an
index.
• The basic form of index includes a record key
and the storage address for a record.
• To find a record when the storage address is
unknown (unlike direct-address and hashing
structures, where it is derived from the key), it
is necessary to scan the records. However, the
search will be faster if an index is used, since it
takes less time to search an index than an entire
file of data.
20. Indexed Organisation
(Characteristics)
• An index is a separate file from the master file to which it
pertains. Each record in the index contains only two items of
data: a record key and a storage address.
• To find a specific record when the file is stored under an indexed
organisation, the index is first searched to find the key of the
record wanted. When it is found, the corresponding storage
address is noted and then the program accesses the record
directly.
• This method uses a sequential scan of the index, followed by
direct access to the appropriate record. The index helps speed
the search compared with a sequential file, but it is slower than
direct addressing. When the master file is not in any specific
order, this method of file organisation is called indexed
non-sequential organisation. There is one entry in the index for
every record in the master file.
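A minimal Python sketch of indexed non-sequential access, assuming a list position stands in for the storage address. The record contents and the function name are hypothetical.

```python
# Master file in no particular order.
master_file = [
    {"cheque_number": 1258, "payee": "R. Singh"},
    {"cheque_number": 1240, "payee": "A. Kumar"},
    {"cheque_number": 1260, "payee": "Office Supplies Ltd"},
    {"cheque_number": 1244, "payee": "City Water Board"},
]

# The index is kept separately: one (record key, storage address) entry
# for every record in the master file.
index = [(rec["cheque_number"], addr) for addr, rec in enumerate(master_file)]

def indexed_lookup(search_key):
    """Scan the index sequentially, then access the record directly."""
    for key, address in index:
        if key == search_key:
            return master_file[address]
    return None

print(indexed_lookup(1244))
```

The scan touches only the small index entries rather than full records, which is where the speed-up over a plain sequential file comes from.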
21. Indexed Sequential Organisation
• Indexed sequential organisation, the one most widely used in
information systems, creates a pseudo-sequential file. Groups of
records are stored in blocks
with a capacity for a specified amount of data.
• For example, the blocks can store up to 3150 pieces of data.
The first block, starting at address 1345, is in sequential order.
• The master file stores individual blocks of records in sequential
order. This is not a sequential file, however, since not all the
records are stored in physically adjacent positions; think of it as a
file of separate, full or partially full blocks, each in sequential
order.
• The adjacent blocks are not in ascending order. For example,
to pursue a logical ascending sequence, the record following
1115 at the end of the first block is in the block at address 1349.
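One way to picture this arrangement is the Python sketch below. Only the block addresses 1345 and 1349 and the key 1115 come from the example above; every other key and address is hypothetical, and the block index of highest keys is an assumed mechanism, since the text does not say how the blocks are linked.

```python
import bisect

# Each block is in key order internally, but block addresses do not
# follow key order.
blocks = {
    1345: [1101, 1108, 1115],
    1347: [1201, 1210],
    1349: [1120, 1131, 1150],
}

# Assumed block index: (highest key in block, block address), in key order.
block_index = sorted((keys[-1], address) for address, keys in blocks.items())

def find(search_key):
    """Return the address of the block holding search_key, or None."""
    highs = [high for high, _ in block_index]
    i = bisect.bisect_left(highs, search_key)
    if i == len(block_index):
        return None
    address = block_index[i][1]
    return address if search_key in blocks[address] else None

def logical_sequence():
    """Visit the blocks in key order to rebuild the ascending sequence."""
    return [key for _, address in block_index for key in blocks[address]]

print(find(1131))
print(logical_sequence())
```

Following the index, the key after 1115 is found in the block at address 1349 even though that block is not physically next to the first one.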
24. Inverted File
• The other type of data structure commonly used in
database management systems is an inverted file.
• This approach uses an index to store information
about the location of records having particular
attributes.
• In a fully inverted file, there is one index for each type
of data item in the data set. Each record in the index
contains the storage address of each record in the file
that has the attribute value in question.
• Some data items in a database will probably never be
used to retrieve data. Therefore, no index will be
built for those data items. If not all attributes are
indexed, the database is only partially inverted,
which is the more common data structure.
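A minimal Python sketch of a partially inverted file, assuming a list position stands in for the storage address. The student records, attribute names and function names are hypothetical.

```python
from collections import defaultdict

records = [
    {"student": "S001", "course": "Maths", "year": 1},
    {"student": "S002", "course": "Physics", "year": 2},
    {"student": "S003", "course": "Maths", "year": 2},
]

# Only these attributes get an index, so the file is partially inverted.
INDEXED_ATTRIBUTES = ("course", "year")

def build_inverted_indexes(records, attributes):
    """For each indexed attribute, map every value to record addresses."""
    indexes = {attr: defaultdict(list) for attr in attributes}
    for address, record in enumerate(records):
        for attr in attributes:
            indexes[attr][record[attr]].append(address)
    return indexes

indexes = build_inverted_indexes(records, INDEXED_ATTRIBUTES)

def find(attr, value):
    """Use the index for attr to fetch the matching records directly."""
    return [records[a] for a in indexes[attr].get(value, [])]

print(dict(indexes["course"]))
print(find("year", 2))
```

Adding "student" to INDEXED_ATTRIBUTES would make the file fully inverted, at the cost of one more index to store and maintain.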
25. OUTPUT DESIGN
• One of the most important features of an
information system for users is the output it
produces.
• Outputs from computer systems are required
primarily to communicate the results of
processing to users.
• Without quality output, the entire system may
appear to be so unnecessary that users will
avoid using it, possibly causing it to fail.
• The term output applies to any information
produced by an information system.
26. Output Objectives
• Convey information about past activities,
current status or projections of the future, e.g.
a report on stock in hand shows current status,
and an exception report for electricity billing
shows the number of houses found locked in an area.
• Signal important events, opportunities,
problems or warnings.
• Trigger an action, e.g. a reorder level report,
whether printed or displayed.
• Confirm an action, e.g. a report of goods received.
27. Key Output Questions
• Who will receive the output ?
• What is its planned use ?
• How much detail is needed ?
• When and how often is the output needed ?
• By what method ?
28. Contents of the Outputs
Data Items
The name of each data item along with its
characteristics should be recorded in a standard form: -
• Whether it is alphabetic or numeric
• Valid and specific range of values e.g. minimum,
maximum, fixed values or ranges.
• Size of data item
• Position of decimal point, arithmetic sign or any other
indicator
The objective is to prevent the same data item being
referred to by various names, or the same name
being used to describe different items.
29. Contents of the Outputs
(Contd..)
Data Totals
There is often a need to provide totals at
various levels. Their source must be identified
and they must be defined and registered as
data items. The systems analyst must specify :-
• At what level(s) they are required e.g.
subtotal, grand total.
• The position e.g. at the end of line.
• What will cause them to occur e.g. change of
key or any other condition
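The three points above (level, position, and triggering condition) can be sketched in Python as a simple control-break report. This is an illustration under assumptions: the rows are already sorted by key, a subtotal is printed on each change of key, and the region names, amounts and column widths are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical detail rows (key, amount), already sorted by key.
sales = [("North", 120.00), ("North", 80.50), ("South", 45.25), ("South", 10.00)]

def report_lines(rows):
    """Build detail lines, a subtotal per key, and a grand total."""
    lines = []
    grand_total = 0.0
    # groupby starts a new group whenever the key changes (control break).
    for region, group in groupby(rows, key=itemgetter(0)):
        subtotal = 0.0
        for _, amount in group:
            lines.append(f"{region:<10}{amount:>10.2f}")
            subtotal += amount
        lines.append(f"{'Subtotal':<10}{subtotal:>10.2f}")
        grand_total += subtotal
    lines.append(f"{'Total':<10}{grand_total:>10.2f}")
    return lines

lines = report_lines(sales)
print("\n".join(lines))
```

Each total appears in the same column as the amounts it sums, which is one common answer to the question of position.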
30. Contents of the Outputs
(Contd..)
Data Editing
It is not always desirable to print or display
data as it is held on a computer. The systems
analyst must know whether the form in which
it is stored is suitable for the output. If any
editing is required, the analyst must specify it e.g.
• Whether decimal points are to be inserted.
• Where the currency symbol should appear, as
prefix or suffix.
• Alignment of items e.g., right, left.
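A minimal Python sketch of these editing rules, assuming the amount is stored as a non-negative integer without a decimal point. The function name, default symbol, field width and thousands separator are illustrative assumptions.

```python
def edit_amount(stored, decimals=2, symbol="$", position="prefix",
                width=12, align="right"):
    """Turn a stored integer amount into its printed form."""
    if decimals > 0:
        # Insert the decimal point at the specified position.
        whole, fraction = divmod(stored, 10 ** decimals)
        text = f"{whole:,}.{fraction:0{decimals}d}"
    else:
        text = f"{stored:,}"
    # Place the currency symbol as a prefix or a suffix.
    text = symbol + text if position == "prefix" else text + symbol
    # Align the item within its output field.
    return text.rjust(width) if align == "right" else text.ljust(width)

print(repr(edit_amount(125000)))
print(repr(edit_amount(4805, symbol=" Rs", position="suffix", align="left")))
```

The stored value never changes; only its printed form does, so the same data item can be edited differently for different outputs.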
31. Contents of the Outputs
(Contd..)
Output Media
The systems analyst also has to determine the most
appropriate medium for the outputs. This will
involve consideration of a wide range of devices,
including
• Line Printer
• Graph plotter
• VDU
• Magnetic Media
• Microfilm
32. Contents of the Outputs (Contd..)
Considerations while selecting Media
• Suitability of the device to the particular
application.
• The need for hard copy and number of copies
required.
• The response time required.
• The location of users
• The software and hardware available.
• The cost.
33. Developing A Printed Output Layout
• The design of printed output will determine
its usefulness to the recipient.
• An output layout is the arrangement of
items on the output medium. When
analysts design an output layout, they are
building a mock up of the actual report or
document as it will appear after the system
is in operation.
34. Developing A Printed Output Layout
(Contd..)
The layout should show the location and
position of the following.
• All variable information
• Item details
• Summaries and totals
• Separators e.g. dash & underline, control breaks
• All pre-printed details
• Headings
• Document name
• Organisation name and address
• Instructions
• Notes & comments
35. Developing A Printed Output Layout
(Contd..)
Common notations used in designing an
output layout :-
• Variable information
• X to denote that an alphabetic or special
character (e.g. * or /) will be printed or displayed.
• 9 to denote that a number will be printed.
• Constant information
The information written on the form as it
should appear when printed.
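The X and 9 notation can be illustrated with a small Python check that compares a printed line against its layout picture. This is a sketch under assumptions: X is taken to accept any printable character, every other character in the picture is treated as constant information, and the function name and sample line are hypothetical.

```python
def matches_picture(text, picture):
    """Check a printed line against a layout picture, one position at a time."""
    if len(text) != len(picture):
        return False
    for ch, mark in zip(text, picture):
        if mark == "9":
            if ch not in "0123456789":      # variable numeric position
                return False
        elif mark == "X":
            if not ch.isprintable():        # variable character position
                return False
        elif ch != mark:                    # constant information
            return False
    return True

print(matches_picture("1258", "9999"))
print(matches_picture("12A8", "9999"))
print(matches_picture("No. 1258 R. Singh", "No. 9999 XXXXXXXX"))
```

A limitation of this simple scheme is that constant text containing the letter X or the digit 9 is indistinguishable from variable positions, so a real layout form marks the two kinds of information separately.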
36. Designing Printed Output
• Headings
Every report should include the title of the
report and the date and time, to tell users what
they are working with and when it was
prepared. The page number provides quick
reference for users who work with data
found at various locations throughout the
report.
37. Designing Printed Output
(Contd..)
• Column Headings
Before actually marking in the data fields,
enter the column headings. It is a good
practice to use an underline, dash or some
other symbol to separate the column
headings from the start of data. Every
column should have a heading that describes
its contents.
38. Designing Printed Output
(Contd..)
• Data & Details
Enter the description of the data below the
column headings, using the X and 9
conventions explained earlier, and indicate
the size of each data item.
• Summaries
Some report designs specify summary
information, column totals or subtotals. Label
all titles and headings as you wish them to
appear, denote variable data by X or 9 and
indicate the maximum length of the field.
39. Guidelines for Report Design
(Summary)
• Reports and documents should be designed to
read from left to right and top to bottom.
• The most important items should be easiest to
find e.g. in an inventory report Item Number
is the most important item. It is placed in the
first column.
• All pages should have a title and page number
and show the date the output was prepared.
• All columns should be labelled.
• Abbreviations should be avoided.
40. INPUT DESIGN
Introduction
Input specification describes the manner in
which data enter the system for processing.
Input design features can ensure the reliability
of the system and produce results from accurate
data. The input design also determines
whether the user can interact efficiently with
the system.
41. Objectives of Input Design
• Controlling Amount Of Input
Data preparation and data entry operations
depend on people. Because labour costs are
high, the cost of preparing and entering data
is high, so reducing data requirements can
lower costs.
The computer may sit idle while data are
being prepared & input for processing. By
reducing input requirements, the analyst can
speed the entire process from data capture
to processing.
42. Objectives of Input Design
(Contd..)
• Avoiding Delay
Avoiding processing delays resulting from
data preparation or data entry operations
should be one of the objectives of the analyst
in designing input.
• Avoiding Errors In Data
The rate at which errors occur depends on the
quantity of data: the smaller the amount
of data, the fewer the opportunities for errors.
The analyst can reduce the number of errors
by reducing the volume of data that must be
entered for each transaction.
43. Objectives of Input Design
(Contd..)
• Avoiding Extra Steps
When the volume of transactions can't be
reduced, the analyst must be sure the process
is as efficient as possible. Input designs
that cause extra steps should be avoided.
• Keeping The Process Simple
There should not be so many controls on
errors that people will have difficulty using
the system. The system should be such that it
is comfortable to use while providing the error
control methods.