Database & dbms
DATABASE AND DATABASE
MANAGEMENT SYSTEMS
The database is now the underlying framework of the information
system and has fundamentally changed the way many companies and
individuals work. In today's technology-intensive economy, most
organizations around the world, whether they are for-profit, not-for-profit,
educational or governmental, could not stay competitive or achieve
their goals without database management systems. Databases touch all
aspects of our lives (in fact, the database is now such an integral part of
our day-to-day life that often we're not aware we are using one). Users of
a database store crucial information, from customer names and supplier
prices to sales history and procurement records, update that information
and make it readily available to whoever needs it. People who work with
databases are responsible for many of the benefits that computers have
offered all kinds of organizations.
These are some examples of Database Applications:
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
FILE-BASED APPROACH
In the early days, database applications were built on top of file
systems. That's the traditional way to use computers to back information
systems: to store data in files and process them by means of dedicated
programs. The file is a named collection of data regarding similar entities
(objects, facts, events).
Drawbacks of using file systems to store data:
• Data redundancy and inconsistency: multiple file formats, duplication
of information in different files
• Difficulty in accessing data: need to write a new program to carry out
each new task
• Data isolation: multiple files and formats
• Integrity problems: integrity constraints become part of program
code, hard to add new constraints or change existing ones
• Atomicity of updates: failures may leave database in an inconsistent
state with partial updates carried out (E.g. transfer of funds from one
account to another should either complete or not happen at all)
• Concurrent access by multiple users: concurrent access is needed for
performance, but uncontrolled concurrent accesses can lead to
inconsistencies (e.g. two people reading a balance and updating it at
the same time)
• Security problems
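The concurrent-access drawback listed above can be made concrete with a small sketch. It is illustrative only: a plain dictionary named `account_file` stands in for a record kept in a data file, and the two "users" are simulated one after the other with a hand-fixed interleaving instead of real threads.

```python
# Hypothetical stand-in for a record kept in a plain data file.
account_file = {"balance": 100}


def read_balance(store):
    """Each program reads the record into its own private copy."""
    return store["balance"]


def write_balance(store, value):
    """Each program writes its result back, unaware of other writers."""
    store["balance"] = value


# Two users read the same balance at (conceptually) the same time.
seen_by_user_a = read_balance(account_file)
seen_by_user_b = read_balance(account_file)

# User A deposits 50 and user B withdraws 30; each works from a stale copy.
write_balance(account_file, seen_by_user_a + 50)
write_balance(account_file, seen_by_user_b - 30)

# The correct result would be 100 + 50 - 30 = 120, but A's deposit is lost.
final_balance = account_file["balance"]
print(final_balance)  # 70
```

Nothing in the file layer notices the conflict; the last writer simply overwrites the earlier update.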
These disadvantages were especially pronounced in first and
second-generation application systems. In third-generation systems, a
number of powerful support packages and tools have been introduced to
minimize some of the disadvantages. This software support includes data
dictionaries and high-level programming languages. However, even with
these facilities, there remain the fundamental deficiencies of file
processing systems: redundant data, low sharing of data, lack of standards
and control, and low productivity. To overcome these disadvantages, a
new approach emerged in the 1970s: the database approach, discussed in
the following section.
THE DATABASE APPROACH
The database approach represents a different concept in
information resource management. Data are viewed as an important,
shared resource that must be managed like any other asset, such as people,
materials, equipment and money. The database concept is rooted in an
attitude of sharing common data resources, releasing control of those data
resources to a common responsible authority, and cooperating in the
maintenance of those shared data resources.
The database
A database consists of a shared collection of logically related data
(and a description of this data), designed to meet the information needs
of an organization. The database system provides the organization with
centralized control of its data. Such a situation contrasts sharply with that
found in an enterprise without a database system, where typically each
application has its own private files, so that the data is widely dispersed
and might thus be difficult to control in any systematic way.
The database is a single, possibly large repository of data, which
can be used simultaneously by many departments and users. All data that
is required by these users is integrated with the minimum amount of
duplication. And, importantly, the database is normally not owned by any
one department or user but is a shared corporate resource.
Technically, the database approach differs from the file approach
by the following features:
• Permanent links are established between files to represent real-life
interactions. From here, the technical definition is derived: a database
is a named collection of interrelated files, shared by multiple users.
• Data descriptions (the logical records of each file) are stored on the
storage medium, in or alongside the data files, rather than in programs.
As well as holding the organization's operational data, the database also
holds a description of this data. For this reason, a database is also
defined as a self-describing collection of integrated records. The
description of the data (the meta-data) is known as the system catalog or
data dictionary. It is the self-describing nature of a database that
provides what's known as data independence. This means that programs
consist only of algorithms that no longer depend on the data (the same
program can be used with several data descriptions), and all programs
use the data description in common.
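[Figure: in the file approach, each program contains both its data
description and its algorithm and works on its own data; in the database
approach, the program contains only the algorithm, while the data
description is held with the data.]
Fig. 2.1 File and database approaches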
The concept of centralized control implies there will be some
identifiable person who has this central responsibility for the data. That
person is the data administrator (DA). It is the data administrator’s job to
decide what data should be stored in the database in the first place, and to
establish policies for maintaining and dealing with that data once it has
been stored. The technical person responsible for implementing the data
administrator’s decisions is the database administrator (DBA).
Data base management system (DBMS)
The DBMS is a software system that enables users to define, create
and maintain the database and also provides controlled access to the
database. So, the DBMS is the software that interacts with the users,
application programs and the database.
A database management system (DBMS) is the software used to
specify the logical organization for a database and access it.
DBMS provides an environment that is both convenient and
efficient to use.
Application programs
Users interact with the database through a number of application
programs that are used to create and maintain the database and to generate
information. These programs can be conventional batch applications or,
more typically nowadays, they will be online applications. The application
programs may be written in some programming language or in some
higher-level fourth-generation language.
An application program is a computer program that interacts
with the database by issuing an appropriate request to the DBMS.
Views
A DBMS provides a facility known as a view mechanism, which
allows each user to have his or her own customized view of the database,
where a view is some subset of the database.
A view is a virtual table that does not necessarily exist in the
database but is generated by the DBMS from the underlying base
tables whenever it's accessed.
A view is usually defined as a query that operates on the base
tables to produce another virtual table. As well as reducing complexity by
letting users see the data in the way they want to see it, views have several
other benefits:
- Views provide a level of security. Views can be set up to exclude
data that some users should not see.
- Views provide a mechanism to customize the appearance of the
database.
- A view can present a consistent, unchanging picture of the
structure of the database, even if the underlying database is
changed.
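A minimal sketch of the view mechanism follows. It assumes SQLite, reached through Python's standard `sqlite3` module, purely as a convenient stand-in for a DBMS; the `employee` table, its columns and the `employee_directory` view are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Ana', 'Sales', 4200);
    INSERT INTO employee VALUES (2, 'Dan', 'IT', 5100);

    -- The view is stored as a query, not as a second copy of the data.
    -- It leaves out the salary column that some users should not see.
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
directory_columns = [description[0] for description in cursor.description]
directory_rows = cursor.fetchall()

# The view is regenerated from the base table each time it is accessed,
# so a change to the base table shows up without touching the view.
with conn:
    conn.execute("UPDATE employee SET dept = 'Support' WHERE emp_id = 2")
rows_after_update = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_id"
).fetchall()
```

Hiding a column in a view only becomes a security measure when users are allowed to query the view but not the base table. That is a matter of authorization, which this sketch does not cover.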
Components of the DBMS environment
We can identify five major components in the DBMS environment:
1. Hardware - the computer systems that the DBMS and the application
programs run on. This can range from a single PC, to a single
mainframe, to a network of computers.
2. Software - the DBMS software and the application programs, together
with the operating system, including network software if the DBMS is
being used over a network.
3. Data and data descriptions (meta-data) - the data acts like a bridge
between the hardware and software components and the human
components.
4. Procedures - the instructions and rules that govern the design and use
of the database.
5. People - in the database environment, human jobs are more specific:
some users deal only with retrieving data (end users), some with
developing new information systems (application programmers) and
some must manage the complex database environment (data
administrator and database administrator). There are also other
specialists, such as database designers (software professionals who
specify information content and create database systems), Web-application
developers (who create Web pages and devise means for
processing information content through the Web) and Web-site
designers.
[Figure: diagram with the labels End Users, Programmers, Database
Administrator, DBMS group, FMS, Operating System, BIOS, Database, and
Data dictionary and directory.]
Fig 2.2 Software environment in database approach
All data handling (storing, updating or retrieving) is done only by
the DBMS. Some DBMSs use the File Management System (FMS) to store,
update and retrieve data in files visible from the operating system (dBase,
Fox, Paradox). Modern DBMSs like Microsoft Access and Oracle don't use
the File Management System in this way; they have their own routines for
storing data in tables enclosed in a container that the operating system
sees as a single file. Sometimes the DBMS replaces the FMS entirely, and
sometimes the DBMS is embedded in the operating system.
DBMS architectures
Before the advent of the Web, a DBMS would generally be divided into two
parts:
• a client program that handles the main business and data processing
logic and interfaces with the user;
• a server program (sometimes called the DBMS engine) that manages
and controls access to the database.
This is known as a two-tier client-server architecture.
In the mid-1990s, as applications became more complex and could
potentially be deployed to hundreds or thousands of end users, the client side
of this architecture gave rise to two problems:
✓ A "fat" client, requiring considerable resources on the client's
computer to run effectively (disk space, RAM and CPU power).
✓ A significant client-side administration overhead.
By 1995, a new variation of the traditional two-tier client-server model
appeared to solve these problems, called three-tier client-server
architecture. This new architecture proposed three layers, each
potentially running on a different platform:
• The user interface layer, which runs on the end-user's computer (the
client).
• The business logic and data processing layer - a middle layer which
runs on a server and is often called the application server. One
application server is designed to serve multiple clients.
• A DBMS, which stores the data required by the middle layer. This tier
may run on a separate server called the database server.
The three-tier design has many advantages over the traditional two-tier
design, such as:
✓ A "thin" client, which requires less expensive hardware.
✓ Simplified application maintenance, as a result of centralizing the
business logic for many end-users into a single application server.
✓ Added modularity, which makes it easier to modify or replace one tier
without affecting the other tiers.
✓ Easier load balancing, as a result of separating the core business logic
from the database functions. A Transaction Processing Monitor (TPM),
a program that controls data transfer between clients and servers in
order to provide a consistent environment for Online Transaction
Processing, can be used to reduce the number of connections to the
database server.
✓ It maps quite naturally to the Web environment, with a Web browser
acting as the "thin" client, and a Web server acting as the application
server.
Functions of a DBMS
A good DBMS should furnish a number of capabilities. The list of
features that a DBMS should furnish includes the following:
1. Data storage, retrieval and update: the ability to store, retrieve, and
update the data that is in the database - the fundamental function of a
DBMS. Unless a DBMS provides this facility, further discussion of
what a DBMS can do is irrelevant. In storing, updating, and retrieving
data, it should not be incumbent upon the user to be aware of the
system's internal structures or the procedures used to manipulate these
structures. This manipulation is strictly the responsibility of the DBMS.
2. Meta-data storage, retrieval and update: a user-accessible catalog
in which descriptions of data items are stored.
A key feature of a DBMS is the provision of an integrated
system catalog to hold data about the structure of the database, users,
applications, and so on. The catalog is expected to be accessible to
users as well as to the DBMS. Typically, the system catalog stores:
- names, types and sizes of data items,
- integrity constraints on the data,
- names of authorized users who have access to data.
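As a rough illustration of a user-accessible catalog, the sketch below queries the catalog that SQLite keeps about itself, using Python's standard `sqlite3` module. SQLite is only an assumed stand-in here, and the `customer` table is a hypothetical example; note that SQLite has no user accounts, so the "authorized users" part of the list above has no counterpart in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        custnum INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        credlim INTEGER CHECK (credlim IN (400, 500, 700, 800, 1000))
    )
""")

# sqlite_master is SQLite's catalog table: one row per table, index, view
# or trigger, readable with an ordinary query.
catalog_entries = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info describes each column as
# (cid, name, type, notnull, dflt_value, pk).
column_info = conn.execute("PRAGMA table_info(customer)").fetchall()
column_types = {row[1]: row[2] for row in column_info}

# The stored table definition also carries the integrity constraints.
table_definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'customer'"
).fetchone()[0]

print(catalog_entries)
print(column_types)
```

The same SQL interface serves both the data and the description of the data, which is what makes the catalog accessible to users as well as to the DBMS.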
3. Transaction support. A transaction can be defined as an action,
or series of actions, carried out by a single user or application
program, which accesses or changes the contents of the database. For
example, a simple transaction would be to add a new customer to the
database or to update the price of a product; a more complex one would be
to delete a sales agent and reassign his or her customers to other sales
agents. If a transaction fails during execution, the database could be left
in an inconsistent state: some changes will have been made and others not.
To overcome this, a DBMS should provide a mechanism that will
ensure either that all the updates corresponding to a given transaction
are made or that none of them are made.
We can use the famous "ACID test" when deciding whether or not a
database management system is adequate for handling transactions. An
adequate system has the following properties:
✓ Atomicity: the results of a transaction's execution are either all committed
or all rolled back. All changes take effect, or none do.
✓ Consistency: the database is transformed from one valid state to
another valid state. This defines a transaction as legal only if it obeys
user-defined integrity constraints. Illegal transactions aren't allowed
and, if an integrity constraint can't be satisfied, then the transaction is
rolled back.
✓ Isolation: the results of a transaction are invisible to other transactions
until the transaction is complete.
✓ Durability: once committed (completed), the results of a transaction
are permanent and survive future system and media failures.
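Atomicity and consistency can be sketched with the funds-transfer case. The example below assumes SQLite through Python's standard `sqlite3` module as a stand-in DBMS; the `account` table, the `transfer` and `balances` helpers and the non-negative balance rule are hypothetical choices made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "acc_id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 20)")
conn.commit()


def transfer(db, source, target, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        # `with db:` commits if the block succeeds and rolls back on an exception.
        with db:
            db.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_id = ?",
                (amount, target),
            )
            # This update violates the CHECK constraint if funds are insufficient.
            db.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(db):
    """Return the current balances as a dictionary keyed by account id."""
    return dict(db.execute("SELECT acc_id, balance FROM account"))


# The credit to B succeeds but the debit from A fails, so both are undone.
failed_transfer = transfer(conn, "A", "B", 500)
after_failure = balances(conn)

# A transfer that satisfies the constraint is committed as a whole.
good_transfer = transfer(conn, "A", "B", 30)
after_success = balances(conn)
print(after_failure, after_success)
```

The credit is deliberately issued before the debit so that the failure happens halfway through, which is the situation atomicity is meant to handle.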
4. Concurrency control services (support for shared update): a
mechanism to ensure accuracy when several users are updating the
database at the same time. Concurrent access is relatively easy if all
users are only reading data, as there is no way they can interfere with
one another. When two or more users are accessing the database
simultaneously and at least one of them is updating data, there may be
interference that can result in inconsistencies. One approach that
ensures correct results is locking; as long as a portion of the database is
locked by one user, other users cannot gain access to it.
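Locking can be observed directly in a small experiment. The sketch below assumes SQLite through Python's standard `sqlite3` module, with two connections standing in for two users; SQLite locks the whole database file for writing rather than a portion of it, so this shows only the coarsest form of the idea. The `product` table and the variable names are hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "shop.db")

# isolation_level=None lets the code issue BEGIN and COMMIT itself;
# timeout=0 makes a blocked connection fail at once instead of waiting.
user_a = sqlite3.connect(db_path, timeout=0, isolation_level=None)
user_b = sqlite3.connect(db_path, timeout=0, isolation_level=None)

user_a.execute(
    "CREATE TABLE product (prod_id INTEGER PRIMARY KEY, price_cents INTEGER)"
)
user_a.execute("INSERT INTO product VALUES (1, 999)")

# User A takes the write lock and starts an update.
user_a.execute("BEGIN IMMEDIATE")
user_a.execute("UPDATE product SET price_cents = 1099 WHERE prod_id = 1")

# User B tries to start its own update while A still holds the lock.
try:
    user_b.execute("BEGIN IMMEDIATE")
    b_was_blocked = False
except sqlite3.OperationalError:
    b_was_blocked = True  # SQLite reports that the database is locked

# Once A commits, the lock is released and B can proceed.
user_a.execute("COMMIT")
user_b.execute("BEGIN IMMEDIATE")
user_b.execute("UPDATE product SET price_cents = price_cents + 100 WHERE prod_id = 1")
user_b.execute("COMMIT")

final_price = user_a.execute(
    "SELECT price_cents FROM product WHERE prod_id = 1"
).fetchone()[0]

user_a.close()
user_b.close()
shutil.rmtree(workdir, ignore_errors=True)
```

Because B's update runs only after A's has been committed, B works from the new price and neither update is lost.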
5. Recovery services: a mechanism for recovering the database in the
event that the database is damaged in any way. This may be the
result of a system crash, media failure, a hardware or software error
causing the DBMS to stop, or it may be the result of the user detecting
an error during the transaction and aborting the transaction before it
completes. In all these cases, the DBMS must provide a mechanism to
recover the database to a consistent state. The simplest approach to
recovery involves periodically making a copy of the database (called a
backup or a save). If a problem occurs, the database is recovered by
copying this backup copy over it. In effect, the damage is undone by
returning the database to the state it was in when the last backup was
made.
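The backup-and-restore approach can be sketched as follows, assuming SQLite and the `Connection.backup` method of Python's standard `sqlite3` module (available from Python 3.7). In-memory databases are used so the example leaves no files behind; the `sale` table is hypothetical.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, amount INTEGER)")
live.executemany("INSERT INTO sale VALUES (?, ?)", [(1, 250), (2, 400)])
live.commit()

# Take the periodic backup: copy the whole database into a second one.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Later work: one legitimate new sale, followed by accidental damage.
live.execute("INSERT INTO sale VALUES (3, 125)")
live.execute("DELETE FROM sale WHERE sale_id < 3")
live.commit()
damaged_rows = live.execute("SELECT sale_id FROM sale ORDER BY sale_id").fetchall()

# Recover by copying the backup over the live database.
backup.backup(live)
restored_rows = live.execute("SELECT sale_id FROM sale ORDER BY sale_id").fetchall()

print(damaged_rows)   # [(3,)]
print(restored_rows)  # [(1,), (2,)]
```

The restore undoes the damage, but sale 3, which was recorded after the backup was taken, is gone as well. Everything done since the last backup is lost with this simple approach.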
6. Security services: a mechanism to ensure that only authorized users
can access the database. A DBMS must furnish a mechanism that
restricts access to the database to authorized users. The term security
refers to the protection of the database against unauthorized (or even
illegal) access, either intentional or accidental.
7. Integrity services: mechanisms to ensure that certain rules are
followed with regard to data in the database and any changes that are
made in the data. Data integrity refers to the correctness and
consistency of stored data. It can be considered as another type of
database protection. While it's related to security, it has wider
implications; integrity is concerned with the quality of data itself.
Integrity is usually expressed in terms of constraints, which are
consistency rules that the database is not permitted to violate. The
types of constraints that may be present fall into the following four
categories:
▪ Data type. The data entered for any column should be consistent with
the data type for that column. For a numeric column, only numbers
should be allowed to be entered. If the column is a date, only a
legitimate date (in the form MMDDYY or MM/DD/YY) should be
permitted.
▪ Legal values. It may be that for certain columns, not every possible
value that is of the right type is legitimate. For example, even though
CREDLIM is a numeric column, only the values 400, 500, 700, 800,
and 1,000 may be valid.
▪ Format. It may be that certain columns have a very special format that
must be followed.
▪ Key constraints. There are two types of key constraints: primary key
constraints and foreign key constraints. Primary key constraints
enforce the uniqueness of the primary key. For example, forbidding
the addition of a customer whose number matched the number of a
customer already in the database would be a primary key constraint.
Foreign key constraints enforce the fact that a value for a foreign key
must match the value of the primary key for some row in another table.
Forbidding the addition of a customer whose sales agent was not
already in the database would be an example of a foreign key
constraint.
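The four categories of constraint can each be declared to the DBMS itself. The sketch below assumes SQLite through Python's standard `sqlite3` module; the `customer` and `agent` tables, the `phone` column with its NNN-NNNN format, and the `rejected` helper are hypothetical, while the CREDLIM values are those given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE agent (
        agentnum INTEGER PRIMARY KEY
    );
    CREATE TABLE customer (
        custnum  INTEGER PRIMARY KEY,
        credlim  INTEGER NOT NULL
                 CHECK (typeof(credlim) = 'integer')
                 CHECK (credlim IN (400, 500, 700, 800, 1000)),
        phone    TEXT
                 CHECK (phone GLOB '[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'),
        agentnum INTEGER NOT NULL REFERENCES agent (agentnum)
    );
    INSERT INTO agent VALUES (3);
    INSERT INTO customer VALUES (124, 500, '555-0101', 3);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # 'lots' is not a number, so the data type check fails.
    "data type": rejected("INSERT INTO customer VALUES (200, 'lots', '555-0102', 3)"),
    # 450 is numeric but not one of the legal credit limits.
    "legal value": rejected("INSERT INTO customer VALUES (201, 450, '555-0103', 3)"),
    # The phone number lacks the required NNN-NNNN layout.
    "format": rejected("INSERT INTO customer VALUES (202, 500, '5550104', 3)"),
    # Customer 124 already exists.
    "primary key": rejected("INSERT INTO customer VALUES (124, 700, '555-0105', 3)"),
    # Sales agent 9 is not in the database.
    "foreign key": rejected("INSERT INTO customer VALUES (203, 700, '555-0106', 9)"),
    # A row that satisfies every constraint is accepted.
    "valid row": rejected("INSERT INTO customer VALUES (204, 800, '555-0107', 3)"),
}
print(results)
```

Once the rules are stated in the table definition, every program and every interactive user is subject to them without any extra application code.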
An integrity constraint can be treated in one of four ways:
a) The constraint can be ignored, in which case no attempt is made to
enforce the constraint.
b) The burden of enforcing the constraint can be placed on the users of
the system. This means that users must be careful that any changes
they make in the database do not violate the constraint.
c) The burden can be placed on programmers. Logic to enforce the
constraint is then built into programs. Users must update the database
only by means of these programs and not through any of the built-in
entry facilities provided by the DBMS, since these would allow
violation of the constraint. The programs are designed to reject any
attempt on the part of the user to update the database in such a way
that the constraint is violated.
d) The burden can be placed on the DBMS. The constraint is specified to
the DBMS, which then rejects any attempt to update the database in
such a way that the constraint is violated.
The best approach is the last one. Unfortunately, most DBMSs don't
have all the capabilities needed to enforce the various types of integrity.
Usually, the approach taken is a combination of (c) and (d) in
the foregoing list: we let the DBMS enforce any of the constraints that it
is capable of enforcing, and application programs enforce the others.
We might also create a special program whose sole purpose would be to
examine the data in the database to determine whether any constraints had
been violated; this program would be run periodically. Corrective action
could be taken to remedy any violations that were discovered by means of
this program.
8. Support for data communication. Most users access the database
from terminals. Sometimes, these terminals are connected directly to
the computer hosting the DBMS. In other cases, the terminals are at
remote locations and communicate with the computer hosting the
DBMS over a network. In either case, the DBMS must be capable of
integrating with networking/communication software.
9. Services to promote data independence: facilities to support the
independence of programs from the structure of the database. One of
the advantages of working with a DBMS is data independence; that is,
the property that changes can be made in the layout of a database
without application programs necessarily being affected. Data
independence is normally achieved through a view mechanism; there
are usually several types of changes that can be made to the physical
characteristics of the database without affecting the views, such as
using different file organizations or modifying indexes - this is called
physical data independence. However, complete logical data
independence is more difficult to achieve; the addition of a new file or
field can usually be accommodated, but not their removal (in some
systems, any type of change to a file structure is prohibited).
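The two kinds of data independence can be sketched with a view that shields an application from changes to the base table. The example assumes SQLite through Python's standard `sqlite3` module; the `product` table, the `price_list` view and the `application_report` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    INSERT INTO product VALUES (1, 'Lamp', 30), (2, 'Desk', 120);
    CREATE VIEW price_list AS SELECT name, price FROM product;
""")


def application_report(db):
    """The 'application program': it knows only the view, not the base table."""
    return db.execute(
        "SELECT name, price FROM price_list ORDER BY name"
    ).fetchall()


report_before = application_report(conn)

# Physical change: a new index alters how data is reached, not what it means.
conn.execute("CREATE INDEX idx_product_price ON product (price)")
# Logical change: a new field is added to the base table.
conn.execute("ALTER TABLE product ADD COLUMN supplier TEXT")

report_after = application_report(conn)
print(report_before == report_after)  # True
```

The program is unaffected by both changes. Removing the `price` column would be a different matter, since the view itself depends on it, which matches the limit on logical data independence described above.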
10. Utility services: DBMS-provided services that assist in the general
maintenance of the database. Utility programs help the Database
Administrator to manage the database effectively. Following is a list of
such services that may be provided by a DBMS.
• Services that permit changes to be made in the database structure
(adding new tables or columns, deleting existing tables or columns,
changing the name or characteristics of a column, and so on).
• Services that permit the addition of new indexes and the deletion of
indexes that are no longer wanted.
• Import and export facilities from other software products. For
example, these services allow data to be transferred in a relatively easy
fashion between the DBMS and a spreadsheet, word processing, or
graphics program, or to load and unload data from or to flat files.
• Monitoring facilities, to monitor database usage and operation.
• Several of the services that form a part of the fourth-generation
environment are also furnished by some of the better DBMS. These
include such things as easy-to-use edit and query capabilities, screen
generators, report generators, and so on.
• Access to both procedural and nonprocedural languages.
• An easy-to-use graphical user interface that allows users to tap the
power of the DBMS without having to resort to a complicated set of
commands.
The actual level of functionality offered by a DBMS differs from
product to product. For example, a DBMS for a PC may not support
concurrent shared access, and it may only provide limited security,
integrity and recovery control. Modern, large multi-user DBMS products
offer all the above functions and much more.
DATABASE ADMINISTRATION AND SECURITY
Data administration and database administration
The Data Administrator (DA) and Database Administrator (DBA)
are responsible for managing and controlling the activities associated with
the corporate data and the corporate database, respectively. Depending on
the size and complexity of the organization and database system, the DA
and DBA roles can be the responsibility of one or more people.
Data administration - the management and control of the
corporate data, including database planning, development and
maintenance of standards, policies and procedures, and logical database
design. The DA is responsible for the corporate data, which includes non-computerized
data, and in practice is often concerned with managing the
shared data of users or business application areas of an organization. The
DA must ensure that the application of database technologies supports the
corporate objectives.
Database administration - the management and control of the
corporate database system, including physical database design and
implementation, setting security and integrity controls, monitoring system
performance, and reorganizing the database as necessary. The DBA is
more technically oriented than the DA, requiring knowledge of specific
DBMSs and the operating system environment. The primary
responsibilities of the DBA are centered on developing and maintaining
systems using the DBMS software to its full extent.
In some organizations, data administration is a distinct business
area; in others it may be combined with database administration.
Data administration                          | Database administration
---------------------------------------------+-------------------------------------
Involved in strategic IS planning            | Evaluates new DBMSs
Determines long-term goals                   | Executes plans to achieve goals
Determines standards, policies and           | Enforces standards, policies and
procedures                                   | procedures
Determines data requirements                 | Implements data requirements
Develops logical database design             | Develops physical database design
Develops and maintains corporate data model  | Implements physical database design
Coordinates database development             | Monitors and controls database use
Managerial orientation                       | Technical orientation
DBMS independent                             | DBMS dependent
Database security
Database security comprises the mechanisms that protect the database against
intentional or accidental threats. Database security encompasses hardware,
software, people and data. This need for security is due to the increasing
amounts of crucial corporate data being stored on computer and the
acceptance that any loss or unavailability of this data could be potentially
disastrous.
A database is nowadays an essential corporate resource that
should be properly secured using appropriate controls. Database security is
considered in relation to the following outcomes:
- theft and fraud,
- loss of confidentiality (secrecy),
- loss of privacy,
- loss of integrity,
- loss of availability.
An organization needs to identify the types of threats it may be subjected
to (we understand by threats any situations or events, whether intentional
or unintentional, that may adversely affect a system and consequently the
organization) and initiate appropriate plans and countermeasures,
considering also the costs of implementing them.
The types of countermeasures to threats on database systems range from
physical controls to administrative procedures. Despite the range of
computer-based controls that are available, generally, the security of a
DBMS is only as good as that of the operating system, owing to their close
association. The most widely used computer-based security controls for a
multi-user environment are:
1) Authorization (access control) - the granting of a right or privilege
that enables a subject to have legitimate access to a database
system or a database system's object. The process of authorization
involves authentication (a mechanism that determines whether a user
is who he or she claims to be) of a subject (a user) requesting access to
an object (a database table, view, procedure or any other object that
can be created within the database system). A system administrator is
usually responsible for permitting user access by creating individual
user accounts and passwords. Once a user is given permission to use a
DBMS, various other privileges may also be automatically associated
with that user. Privileges are granted to users to accomplish the tasks
required for their jobs.
2) Views - virtual tables that do not necessarily exist in the database
but can be produced upon request by a particular user, at the time
of request. The view mechanism provides a powerful and flexible
security mechanism by hiding parts of the database from certain users.
3) Backup and recovery - the process of periodically taking a copy of
the database and log file (and possibly programs) onto offline
storage media in order to assist the recovery of the database
following failure.
4) Journaling - to keep track of database transactions, the DBMS maintains
a special file called a log file (or journal) that contains information
about all updates to the database. A DBMS should provide logging
facilities, sometimes referred to as journaling, which keep track of the
current state of transactions and database changes, to provide support for
recovery procedures.
5) Integrity - integrity constraints contribute to maintaining a secure
database system by preventing data from becoming invalid, and hence
giving misleading or incorrect results.
6) Encryption - the encoding of the data by a special algorithm that
renders the data unreadable by any program without the
decryption key.
7) Redundant Array of Independent Disks (RAID) - the hardware that
the DBMS is running on must be fault-tolerant, meaning that the DBMS
should continue to operate even if one of the hardware components
fails. RAID technology works by having a large disk array comprising
an arrangement of several independent disks that are organized to
improve reliability and at the same time to increase performance.
DATABASE APPROACH - ADVANTAGES AND
DISADVANTAGES
The main benefits of the database approach are:
1. Control of data redundancy
The database approach eliminates redundancy where possible;
previously separate (and redundant) data files are integrated into a single,
logical structure. In addition, each data item occurrence is ideally recorded
in only one place in the database. That doesn’t mean that all redundancy
can or should be eliminated. Sometimes there are valid reasons for storing
multiple copies of the same data. However, the amount of redundancy
inherent in the database is controlled.
2. Data consistency
By controlling (or eliminating) data redundancy, we greatly reduce the
risk of inconsistencies occurring. If data is stored only once in the
database, any update to its value has to be performed only once and the
new value is immediately available to all users. When controlled
redundancy is permitted in the database, the database system itself should
enforce consistency by updating each occurrence of a data item when a
change occurs. In that case the DBMS can guarantee that the
database is never inconsistent as seen by the user, by ensuring that any
change made to one copy of the data is automatically applied to the
other copies as well (a process known as “propagating updates”). However, few
commercially available systems today are capable of automatically
propagating updates in this manner; most current products do not support
controlled redundancy at all, except in certain special situations.
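One mechanism by which a DBMS can propagate updates is a trigger. The sketch below assumes SQLite through Python's standard `sqlite3` module and is not a statement about how any particular commercial product handles controlled redundancy; the `customer` and `invoice` tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (custnum INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Controlled redundancy: each invoice keeps its own copy of the name.
    CREATE TABLE invoice (
        invnum    INTEGER PRIMARY KEY,
        custnum   INTEGER NOT NULL REFERENCES customer (custnum),
        cust_name TEXT NOT NULL
    );

    -- The trigger propagates a change of name to every redundant copy.
    CREATE TRIGGER propagate_customer_name
    AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE invoice SET cust_name = NEW.name WHERE custnum = NEW.custnum;
    END;

    INSERT INTO customer VALUES (124, 'Adams Ltd');
    INSERT INTO invoice VALUES (1, 124, 'Adams Ltd'), (2, 124, 'Adams Ltd');
""")

# The application changes the name in one place only.
with conn:
    conn.execute("UPDATE customer SET name = 'Adams Group' WHERE custnum = 124")

invoice_names = [
    row[0]
    for row in conn.execute("SELECT cust_name FROM invoice ORDER BY invnum")
]
print(invoice_names)  # ['Adams Group', 'Adams Group']
```

The trigger runs inside the same transaction as the original update, so users never see the customer record and its copies disagree.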
3. Sharing of data
In a file-based approach, typically files are owned by the people or
departments that use them. On the other hand, the database belongs to the
entire organization and can be shared by all authorized users. Sharing
means not only that existing applications can share the data in the
database, but also that new applications can be developed to operate
against that same stored data. In other words, it might be possible to
satisfy the data requirements of new applications without having to create
any additional stored data. The new applications can also rely on the
functions provided by the DBMS, such as data definition and
manipulation, concurrency and recovery control, rather than having to
provide these functions themselves.
4. Improved data integrity
The problem of integrity is the problem of ensuring that the data in the
database is accurate. Database integrity is usually expressed in terms of
constraints, which are consistency rules that the database is not permitted
to violate. Inconsistency between two entries that purport to represent the
same “fact” is an example of lack of integrity; that particular problem can
arise only if redundancy exists in the stored data. Even if there is no
redundancy, however, the database might still contain incorrect
information.
Centralized control of the database can help in avoiding such problems
– insofar as they can be avoided – by permitting the data administrator to
define (and the DBA to implement) integrity rules to be checked whenever
any data update operation is attempted.
It is worth pointing out that data integrity is even more important in a
multi-user database than it is in a “private files” environment, precisely
because the database is shared. For without appropriate controls it would
be possible for one user to update the database incorrectly, thereby
generating bad data and so “infecting” other innocent users of that data.
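
An integrity rule that is checked on every update can be sketched as follows, again using Python's built-in sqlite3 module. The employee table and the rule that weekly hours must lie between 0 and 168 are assumptions made up for this illustration, not rules taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        -- Integrity rule (hypothetical): a week has at most 168 hours.
        hours_worked INTEGER NOT NULL CHECK (hours_worked BETWEEN 0 AND 168)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 40)")

try:
    # Incorrect data: the database itself refuses the update.
    conn.execute("INSERT INTO employee VALUES (2, 'Bob', 400)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, row_count)
```

Because the rule is defined once in the database and not in each program, every application that shares the data is protected by the same check.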
5. Standards can be enforced
Establishing the data administration function is an important part of
the database approach. This organizational function has authority for
defining and enforcing data standards. With central control of the
database, the database administrator can ensure that all applicable standards
are observed in the representation of the data. Applicable standards might
include any or all of the following: corporate, installation, departmental,
industry, national and international standards. Standardizing data
representation is particularly desirable as an aid to data interchange, or
migration of data between systems. Likewise, data naming and
documentation standards are also very desirable as an aid to data sharing
and understandability.
6. Improved security
The data administration function has complete jurisdiction over the
database and is responsible for establishing controls for accessing,
updating and protecting data. The DBA can ensure that the only means of
access to the database is through the proper channels, and hence can
define security rules to be checked whenever access is attempted to
sensitive data. Different rules can be established for each type of access to
each piece of information in the database.
Without such rules the security of data might actually be more at risk than
in a traditional (dispersed) filing system; the centralized nature of a database
system in a sense requires that a good security system be in place also.
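
The idea of a different rule for each type of access to each piece of information can be sketched in plain Python as below. The rule table RULES, the user names and the helper functions is_allowed and guarded_access are all hypothetical; a real DBMS provides its own mechanism for defining such rules, and this sketch only shows the shape of the check.

```python
# Hypothetical rule table: (user, data item, access type) triples that are allowed.
RULES = {
    ("clerk", "employee.name", "read"),
    ("payroll", "employee.salary", "read"),
    ("payroll", "employee.salary", "update"),
}


def is_allowed(user: str, item: str, access: str) -> bool:
    """Return True only if an explicit rule grants this access."""
    return (user, item, access) in RULES


def guarded_access(user: str, item: str, access: str, action):
    """Run `action` only after the security rule check has passed."""
    if not is_allowed(user, item, access):
        raise PermissionError(f"{user} may not {access} {item}")
    return action()


# The payroll user may update salaries; the clerk may not even read them.
result = guarded_access("payroll", "employee.salary", "update", lambda: "updated")
print(result)
print(is_allowed("clerk", "employee.salary", "read"))
```

The check only protects the data if every access goes through guarded_access, which mirrors the point above that the only means of access to the database should be through the proper channels.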
7. Conflicting requirements can be balanced
Knowing the overall requirements of the organization – as opposed to
the requirements of individual users – the DBA can so structure the system
as to provide an overall service that is “best for the organization”. For
example, a representation can be chosen for the data in storage that gives
fast access for the most important applications (possibly at the cost of
poorer performance for certain other applications).
8. Increased productivity
A major advantage of the database approach is that the cost and time
for developing new business applications are greatly reduced. Once the
database has been designed and implemented, a programmer can code and
debug a new application at least two to four times faster than with
conventional data files; the reason for this improvement is that the
programmer is no longer saddled with the burden of designing, building
and maintaining master files.
9. The provision of data independence
Applications implemented on older systems tend to be data-dependent.
What this means is that the way in which the data is organized in
secondary storage, and the technique for accessing it, are both dictated by
the requirements of the application under consideration, and moreover that
knowledge of that data organization and that access technique is built into
the application logic and code. It is impossible to change the storage
structure (how the data is physically stored) or access technique (how it is
accessed) without affecting the application, probably drastically.
In a database system, however, it would be extremely undesirable to
allow applications to be data-dependent, for at least the following two
reasons:
• Different applications will need different views of the same data
• The DBA must have the freedom to change the storage structure or
access technique in response to changing requirements, without having
to modify existing applications. For example, new kinds of data might
be added to the database; new standards might be adopted; application
priorities might change; new types of storage device might become
available; and so on. If applications are data-dependent, such changes
will typically require corresponding changes to be made to programs,
thus tying up programmer effort that would otherwise be available
for the creation of new applications.
It follows that the provision of data independence is a major
objective of database systems. Data independence can be defined as the
immunity of applications to change in storage structure and access
technique. The database should be able to grow without affecting existing
applications; that is probably the single most important reason for
requiring data independence in the first place.
However, data independence is not an absolute – different systems
provide it in different degrees; in fact, few systems, if any, provide no data
independence at all – it is just that some systems are less data-dependent
than others.
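
A rough sketch of data independence, using Python's built-in sqlite3 module: the application function total_sales only knows the name sales, while the stored tables behind that name are reorganized. All table and view names here are hypothetical, and a view is only one assumed way of shielding an application from storage changes; it approximates the idea and is not a full account of how a DBMS achieves it.

```python
import sqlite3


def total_sales(conn):
    # Application code: it only knows the hypothetical view name "sales".
    return conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_v1 (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO sales_v1 VALUES (1, 100), (2, 250);
    CREATE VIEW sales AS SELECT id, amount FROM sales_v1;
""")
before = total_sales(conn)

# The DBA changes the storage structure: the data is split into two tables
# and the view is redefined. The application function is not modified.
conn.executescript("""
    CREATE TABLE sales_old (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE sales_new (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO sales_old SELECT * FROM sales_v1 WHERE id <= 1;
    INSERT INTO sales_new SELECT * FROM sales_v1 WHERE id > 1;
    DROP VIEW sales;
    DROP TABLE sales_v1;
    CREATE VIEW sales AS
        SELECT id, amount FROM sales_old
        UNION ALL
        SELECT id, amount FROM sales_new;
""")
after = total_sales(conn)
print(before, after)
```

The application returns the same answer before and after the reorganization, which is the immunity to change in storage structure that the definition above describes.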
There are, however, some disadvantages of the database approach,
such as:
1. Complexity. A DBMS is an extremely complex piece of software, and
all users (database designers and developers, database administrators
and end-users) must understand the DBMS functionality to take full
advantage of it.
2. Cost of DBMS. The cost of DBMS varies significantly, depending on
the environment and functionality provided. There is also the recurrent
annual maintenance cost, which is a percentage of the list price.
3. Cost of conversion. In some situations, the cost of the DBMS and
extra hardware may be insignificant compared with the cost of
converting existing applications to run on the new DBMS and
hardware. This cost is one of the main reasons why some companies
feel tied to their current systems and cannot switch to more modern
database technology.
4. Performance. Typically, a file-based system is written for a specific
application, such as invoicing. As a result, performance is generally
very good. A DBMS is written to be more general, to cater for many
applications rather than just one. The effect is that some applications
may not run as fast using a DBMS as they did before.
5. Higher impact of a failure. The centralization of resources increases
the vulnerability of the system. Since all users and applications rely on
the availability of the DBMS, the failure of any component can bring
operations to a complete halt until the failure is repaired.