Unit3rd
UNIT III
Basic concepts of databases
Data
Data can be defined as a representation of facts, concepts or instructions in a formalized
manner suitable for communication, interpretation or processing by humans or electronic
machines. Data is represented with the help of characters such as alphabets (a-z, A-Z), digits
(0-9) or special characters (+, -, *, /, <, >, = etc.).
Database
A database is a set of structured, non-redundant information whose organization is
based on a data model. It consists of files, records (electronic cards) and fields.
Redundancy means repetition. Different data models are discussed later. Synonyms for
databases: registers, search tools, electronic card files, electronic collections.
File
A file (e.g. a table) consists of records. A file also contains the layout designed for these
records.
Record (Row)
A record is a collection of information kept about one person, product or transaction.
Data Item or Field (Column)
Within each record information is entered into a field which represents a category of
information.
OR
A set of characters which are used together to represent a specific data element, e.g. the
name of a student in a class is represented by the data item, say, NAME.
Table
A table is a database object for storing data. Each table contains information about a
particular subject, such as customers. A table consists of records and fields. So, it's a file.
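The relationship between a table, its records and its fields can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite is one possible DBMS, and the table name customer, its columns and the sample values are assumptions made up for the example.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT, city TEXT)")

# Each tuple is one record (row); each position in it is one field (column).
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)

cursor = conn.execute("SELECT * FROM customer ORDER BY customer_id")
field_names = [column[0] for column in cursor.description]
records = cursor.fetchall()

print(field_names)
print(records)
```

Here field_names lists the fields of the table, and every element of records is one record.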
Form
A form is a database object designed primarily for data input and display. The data shown
is drawn from tables and queries.
Database program (Database Management System, DBMS)
A database management system is software which plays the role of interface between the
users and the database. It provides all the data definition, data manipulation, and data
control features the user needs to manage the data. A database management system and a
database together form a database system.
What is a Database Management System (DBMS)?
A database can be termed a repository of data. The collection of actual data which
constitutes the information regarding an organization is stored in a database. For example,
if there are 1000 students in a college & we have to store their personal details, marks
details etc., these details will be recorded in a database.
A collection of programs that enables you to store, modify, and extract information from
a database is known as DBMS. The primary goal of a DBMS is to provide a way to store
& retrieve database information that is both convenient & efficient.
Database systems are designed to manage large bodies of information. Management of
data involves both defining structures for storage of information & providing mechanisms for
manipulation of data. In addition, the database system must ensure the safety of data.
There are many different types of DBMS, ranging from small systems that run on personal
computers to huge systems that run on mainframes.
Good data management is an essential prerequisite to corporate success,
provided that data is:
• Complete
• Accurate
• Timely
• Easily available
Database System Applications
Databases are used in a wide range of applications. Some examples follow:
• Banking: For customer information, accounts, loans & other banking transactions
• Airlines: For reservation & schedule information
• Universities: For student information, course registration, grades etc.
• Credit card transactions: For purchases on credit cards & generation of monthly
statements.
• Telecommunication: For keeping records of calls made, generating monthly
bills etc.
• Finance: For storing information about holdings, sales & purchases of financial
instruments
• Sales: For customer, product & purchase information
• Manufacturing: For management of supply chain.
• Human Resource: For recording information about employees, salaries, tax,
benefits etc.
We can say that whenever we need a computerized system, we need a database
system.
Purpose Of Database system
A file system is one in which we keep the information in operating system files. Before
the evolution of DBMS, organizations used to store information in file systems. A typical
file processing system is supported by a conventional operating system. The system
stores permanent records in various files & needs application programs to extract records,
or to add or delete records.
We will compare both systems with the help of an example.
Consider a savings bank enterprise that keeps information about all customers & savings
accounts. The system needs the following application programs:
• A program to debit or credit an account
• A program to add a new account.
• A program to find balance of an account.
• A program to generate monthly statements.
As the need arises, new applications are added to the system over time; for example,
checking accounts may be introduced alongside savings accounts. Using a file system for
storing data has the following disadvantages:-
1. Data Redundancy & Inconsistency:-
Different programmers work on a single project over a long period, so the files are created
by different programmers at different times. As a result, the files may have different
formats & the programs may be written in different programming languages.
The same information may also be repeated in several places. For example, a customer's name
& address may appear in the savings account file as well as in the checking account file.
This redundancy results in higher storage & access cost. It also leads to data inconsistency,
which means that if we change a record in one place, the change is not reflected in all the
other places. For example, a changed customer address may be reflected in the savings record
but nowhere else.
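The address example can be sketched in a few lines of Python. The two dictionaries stand in for two separate files; the customer id, names and addresses are invented for illustration.

```python
# Two separate "files", each holding its own copy of the customer's address.
savings_file = {"C101": {"name": "Asha", "address": "12 Park Road", "balance": 500}}
checking_file = {"C101": {"name": "Asha", "address": "12 Park Road", "balance": 200}}


def change_address_in_savings(customer_id, new_address):
    # A program written only for the savings file knows nothing about the checking file.
    savings_file[customer_id]["address"] = new_address


change_address_in_savings("C101", "7 Lake View")

# The two copies of the same fact now disagree: the data is inconsistent.
consistent = savings_file["C101"]["address"] == checking_file["C101"]["address"]
print(consistent)
```

Because the address is stored twice, an update applied through one program leaves the other copy stale.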
2. Difficulty in Accessing data
Retrieving specific data is also difficult in a file system. Suppose we want to see the
records of all customers who have a balance of less than $10,000. We can either check the
list & find the names manually or write an application program. If we write an application
program & at some later time need to see the records of customers who have a balance
of less than $20,000, then a new program has to be written again. It means that file
processing systems do not allow data to be accessed in a convenient manner.
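By contrast, a DBMS lets the same request be expressed as a query whose threshold is just a parameter. The following is a minimal sketch using Python's sqlite3 module; the account table, the customer names and the helper customers_below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (customer TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("Asha", 5000), ("Ravi", 15000), ("Meena", 25000)],
)


def customers_below(limit):
    # The same query serves any threshold; only the parameter changes.
    rows = conn.execute(
        "SELECT customer FROM account WHERE balance < ? ORDER BY customer",
        (limit,),
    )
    return [row[0] for row in rows]


below_10k = customers_below(10000)
below_20k = customers_below(20000)
print(below_10k, below_20k)
```

No new program is needed when the threshold changes from $10,000 to $20,000.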
3. Data Isolation
As the data is scattered across various files, & these files may be stored in different
formats, writing new application programs to retrieve the data is difficult.
4. Integrity Problems
Sometimes the stored data must satisfy certain constraints; for example, in a bank a
minimum deposit may have to be $100. Developers enforce these constraints by writing
appropriate code in the application programs, but if a new constraint has to be added
later, it is difficult to change all the programs to enforce it.
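In a DBMS such a rule can be declared once, with the data, instead of being coded into every program. The sketch below uses a CHECK constraint in SQLite through Python's sqlite3 module; the deposit table and its columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is part of the table definition, not of any application program.
conn.execute(
    "CREATE TABLE deposit (account_no INTEGER, amount INTEGER CHECK (amount >= 100))"
)
conn.execute("INSERT INTO deposit VALUES (1, 250)")

try:
    conn.execute("INSERT INTO deposit VALUES (1, 50)")  # violates the minimum
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT amount FROM deposit").fetchall()
print(rejected, stored)
```

The deposit of $50 is refused by the database itself, so every program that inserts deposits is covered by the same rule.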
5. Atomicity Problems
Any mechanical or electrical device is subject to failure, and so is the computer system.
After a failure, we have to ensure that the data is restored to a consistent state. For
example, suppose an amount of $50 has to be transferred from Account A to Account B. If the
amount has been debited from Account A but has not yet been credited to Account B when
a failure occurs, the data is left in an inconsistent state.
So, we have to adopt a mechanism which ensures that either the full transaction is
executed or none of it is executed, i.e. the fund transfer should be atomic.
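The all-or-nothing behaviour can be sketched with a transaction in SQLite via Python's sqlite3 module. The starting balances, the transfer helper and the simulated failure are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 300)])
conn.commit()


def transfer(amount, fail_midway=False):
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
            )
    except RuntimeError:
        pass  # the debit has already been rolled back


transfer(50, fail_midway=True)
after_failure = dict(conn.execute("SELECT name, balance FROM account"))
transfer(50)
after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(after_failure, after_success)
```

After the failed attempt both balances are unchanged; only the complete transfer moves the $50.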
6. Concurrent access Problems
Many systems allow multiple users to update the data simultaneously. This can also leave
the data in an inconsistent state. Suppose a bank account contains a balance of $500 & two
customers want to withdraw $100 & $50 simultaneously. If both transactions read the
old balance & each withdraws from that old balance, the account ends up with $400 or $450
(depending on which one writes last), whereas the correct balance is $350.
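The arithmetic of this lost update can be traced step by step. The sketch below is plain Python that replays one possible interleaving by hand; it does not use real concurrency, and the variable names are illustrative.

```python
balance = 500

# Both transactions read the balance before either one writes.
read_by_t1 = balance
read_by_t2 = balance

balance = read_by_t1 - 100  # T1 writes 400
balance = read_by_t2 - 50   # T2 overwrites with 450; T1's withdrawal is lost
interleaved_result = balance

# Serial execution: T2 reads only after T1 has written.
balance = 500
balance = balance - 100
balance = balance - 50
serial_result = balance

print(interleaved_result, serial_result)
```

The interleaved run leaves $450 in the account, while running the two withdrawals one after the other leaves the correct $350.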
7. Security Problems
Not every user of the database should be able to access all the data. For example, payroll
personnel need to access only that part of the data which has information about
employees & do not need access to information about customer accounts.
POINTS TO PONDER
• A DBMS contains collection of inter-related data & collection of programs to
access the data.
• The primary goal of DBMS is to provide an environment that is both convenient
& efficient for people to use in retrieving & storing information.
• DBMS systems are ubiquitous today & most people interact either directly or
indirectly with databases many times every day.
• Database systems are designed to store large bodies of information.
• A major purpose of a DBMS is to provide users with an abstract view of data i.e.
the system hides how the data is stored & maintained.
VIEW OF DATA
A database contains a number of files & certain programs to access & modify these files.
The user is not shown how the data is actually stored; the system hides the details of how
data is stored & maintained.
DATA ABSTRACTION
Data abstraction is the process of distilling data down to its essentials. The data when
needed should be retrieved efficiently. As not all the details are of use to all users,
we hide the actual (complex) details from them. Various levels of data abstraction are
provided, which are listed below:-
Physical level:-
It is the lowest level of abstraction & specifies how the data is
actually stored. It describes the complex data structure in details.
Logical level:-
It is the next level of abstraction & describes what data are stored
in database & what relationship exists between various data. It is
less complex than physical level & specifies simple structures.
Although implementing the logical level may involve the complex
structures of the physical level, users of the logical level need
not know these complexities.
View level:-
This level contains the actual data which is shown to the users.
This is the highest level of abstraction & the user of this level need
not know the actual details (complexity) of data storage.
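One common way to provide a view level is an SQL view defined over a table of the logical level. The sketch below uses SQLite through Python's sqlite3 module; the employee table, the employee_directory view and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full description of what is stored.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, name TEXT, salary INTEGER, dept TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 40000, 'Sales')")

# View level: this view exposes only part of the data; salary stays hidden.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory")
visible_columns = [column[0] for column in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```

A user who works only with employee_directory sees names and departments without knowing that salaries exist or how any of the data is stored.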
Database Language:-
Just as a language is required to communicate anything, we need a language to create or
manipulate a database. A database language is divided into 2 main parts:-
1) DDL (Data definition language)
2) DML (Data Manipulation language)
Data Definition Language (DDL)
A DDL is used to specify a database schema as a set of definitions.
1. DDL statements are compiled, resulting in a set of tables stored in a special file called
a data dictionary or data directory.
2. The data directory contains metadata (data about data)
3. The storage structure and access methods used by the database system are specified by
a set of definitions in a special type of DDL called a data storage and definition language.
4. Basic idea: hide implementation details of the database schemas from the users.
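As a concrete illustration, SQLite keeps the result of each DDL statement in a catalog table named sqlite_master, which plays the role of a data dictionary. The sketch below uses Python's sqlite3 module; the student table and its columns are chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statement: it defines structure and stores no user data.
conn.execute(
    "CREATE TABLE student (name TEXT, student_number INTEGER, class TEXT, major TEXT)"
)

# The definition is recorded as metadata (data about data) in the catalog.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(catalog)
print(columns)
```

Other database systems expose their data dictionary differently, so the catalog name and the PRAGMA statement here are specific to SQLite.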
Data Manipulation Language (DML)
1. Data Manipulation is:
• Retrieval of information from the database
• Insertion of new information into the database
• Deletion of information in the database
• Modification of information in the database
2. A DML is a language which enables users to access and manipulate data. The goal is to
provide efficient human interaction with the system.
3. There are two types of DML:
• Procedural: the user specifies what data is needed and how to get it.
• Nonprocedural: the user only specifies what data is needed. This is easier for the
user, but may not generate code as efficient as that produced by procedural languages.
4. A query language is a portion of a DML involving information retrieval only. The
terms DML and query language are often used synonymously.
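The four kinds of data manipulation, and the difference between nonprocedural and procedural retrieval, can be sketched with SQL issued from Python's sqlite3 module. SQL is used here as an example of a nonprocedural DML; the student table, the marks and the 60-mark threshold are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, marks INTEGER)")

# Insertion
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Asha", 82), ("Ravi", 45), ("Meena", 67)],
)
# Modification
conn.execute("UPDATE student SET marks = 50 WHERE name = 'Ravi'")
# Deletion
conn.execute("DELETE FROM student WHERE name = 'Meena'")

# Nonprocedural retrieval: state what is wanted; the DBMS decides how to get it.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE marks >= 60 ORDER BY name"
    )
]

# Procedural retrieval: spell out how to scan and filter each record.
procedural = []
for name, marks in conn.execute("SELECT name, marks FROM student"):
    if marks >= 60:
        procedural.append(name)
procedural.sort()

print(declarative, procedural)
```

Both approaches return the same students; the nonprocedural form leaves the choice of access method to the DBMS.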
POINTS TO PONDER
• DBMS systems are ubiquitous today & most people interact either directly or
indirectly with databases many times every day.
• Database systems are designed to store large bodies of information.
• A major purpose of a DBMS is to provide users with an abstract view of data i.e. the
system hides how the data is stored & maintained.
• The structure of a database is defined through DDL & manipulated through DML.
• DDL statements are compiled, resulting in a set of tables stored in a special file called
a data dictionary or data directory.
• A query language is a portion of a DML involving information retrieval only. The
terms DML and query language are often used synonymously.
Advantages of DBMS
In DBMS, all files are integrated into one system thus reducing redundancies and making
data management more efficient. In addition, DBMS provides centralized control of the
operational data. Some of the advantages of data independence, integration and
centralized control are:
1. Redundancies and inconsistencies can be reduced
In conventional data systems, an organisation often builds a collection of
application programs often created by different programmers and requiring
different components of the operational data of the organisation. The data in
conventional data systems is often not centralised. Some applications may require
data to be combined from several systems. These several systems could well have
data that is redundant as well as inconsistent (that is, different copies of the same
data may have different values). Data inconsistencies are often encountered in
everyday life. For example, we have all come across situations where, after a new
address is communicated to an organisation that we deal with (e.g. a bank, or
Telecom, or a gas company), some of the communications from that
organisation are received at the new address while others continue to be mailed to
the old address. Combining all the data in a database would involve reduction in
redundancy as well as inconsistency. It also is likely to reduce the costs for
collection, storage and updating of data.
2. Better service to the Users
A DBMS is often used to provide better service to the users. In conventional
systems, availability of information is often poor since it normally is difficult to
obtain information that the existing systems were not designed for. Once several
conventional systems are combined to form one centralised database, the
availability of information and its currency are likely to improve since the
data can now be shared and the DBMS makes it easy to respond to unforeseen
information requests.
Centralizing the data in a database also often means that users can obtain new and
combined information that would have been impossible to obtain otherwise. Also,
use of a DBMS should allow users that do not know programming to interact with
the data more easily.
The ability to quickly obtain new and combined information is becoming
increasingly important in an environment where various levels of governments are
requiring organisations to provide more and more information about their
activities. An organisation running a conventional data processing system would
require new programs to be written (or the information compiled manually) to
meet every new demand.
3. Flexibility of the system is improved
Changes are often necessary to the contents of data stored in any system. These
changes are more easily made in a database than in a conventional system in that
these changes do not need to have any impact on application programs.
4. Cost of developing and maintaining systems is lower
As noted earlier, it is much easier to respond to unforeseen requests when the data
is centralized in a database than when it is stored in conventional file systems.
Although the initial cost of setting up of a database can be large, one normally
expects the overall cost of setting up a database and developing and maintaining
application programs to be lower than for similar service using conventional
systems since the productivity of programmers can be substantially higher in
using non-procedural languages that have been developed with modern DBMS
than using procedural languages.
5. Standards can be enforced
Since all access to the database must be through the DBMS, standards are easier
to enforce. Standards may relate to the naming of the data, the format of the data,
the structure of the data etc.
6. Security can be improved
In conventional systems, applications are developed in an ad hoc manner. Often
different systems of an organisation would access different components of the
operational data. In such an environment, enforcing security can be quite difficult.
Setting up of a database makes it easier to enforce security restrictions since the
data is now centralized. It is easier to control who has access to what parts of the
database. However, setting up a database can also make it easier for a determined
person to breach security. We will discuss this in the next section.
7. Integrity can be improved
Since the data of the organization using a database approach is centralized and
would be used by a number of users at a time, it is essential to enforce integrity
controls.
Integrity may be compromised in many ways. For example, someone may make a
mistake in data input and the salary of a full-time employee may be input as
$4,000 rather than $40,000. A student may be shown to have borrowed books but
has no enrolment. Salary of a staff member in one department may be coming out
of the budget of another department.
If a number of users are allowed to update the same data item at the same time,
there is a possibility that the result of the updates is not quite what was intended.
For example, in an airline DBMS we could have a situation where the number of
bookings made is larger than the capacity of the aircraft that is to be used for the
flight. Controls therefore must be introduced to prevent such errors from occurring
because of concurrent updating activities. However, since all data is stored only
once, it is often easier to maintain integrity than in conventional systems.
8. Enterprise requirements can be identified
All enterprises have sections and departments, and each of these units often
considers its own work the most important and therefore considers its own
needs the most important. Once a database has been set up with centralised
control, it will be necessary to identify enterprise requirements and to balance the
needs of competing units. It may become necessary to ignore some requests for
information if they conflict with higher priority needs of the enterprise.
9. Data model must be developed
Perhaps the most important advantage of setting up a database system is the
requirement that an overall data model for the enterprise be built. In conventional
systems, it is more likely that files will be designed as needs of particular
applications demand. The overall view is often not considered. Building an
overall view of the enterprise data, although often an expensive exercise, is
usually very cost-effective in the long term.
Limitations of Databases
In spite of the advantages of using DBS, there are certain limitations of using DBS:
• Overhead Cost:
The database approach is usually adopted when a large-scale organization needs to
manage very large amounts of data. This requires a powerful hardware platform
and software for database management, which are quite expensive. Further costs
are incurred in hiring system analysts, database designers, database
administrators, programmers and data processing personnel, and in training them.
This means that to adopt this approach, a significant extra cost has to be borne by the
organization.
• Security Problem:
Another disadvantage of this approach is that sharing of data also carries the risk
of the data being accessed by unauthorized users. Thus, the organization needs to
cope with this problem through security measures, concurrency control, recovery
and integrity controls.
• Problem of Resources:
Running an on-line, real-time system to answer on-line queries requires a large amount
of data to be stored. As a result, more terminals may be needed to put managers
and other users online. Communication devices are also required to connect the extra
terminals to the database. A DBMS may also require resources such as a multiprocessor
system and additional software to run. Therefore, a DBMS may require extra computing
resources depending upon the application.
• Ownership Problem:
In a file-based system, the programmer/user is the owner of the data and programs,
whereas a database consisting of such files is owned by the entire company. For any
change, read or insertion of data in the database, the user needs to seek permission
from the managers of the company. For a database to be successful, it must
be viewed and updated as a corporate resource, not as an individual resource.
• Concurrency Problem:
Several problems can occur when concurrent transactions execute in an
uncontrolled manner. For example, the lost update problem occurs when two transactions
that access the same database items have their operations interleaved in a way
that makes the value of some database item incorrect.
Entity:
An entity is a person, place, thing, event or concept about which information is
recorded.
In a banking environment, examples of entities are CUSTOMERS, BANK
ACCOUNTS etc.
Attributes:
Attribute gives the characteristic of the entity. In other words, every entity has
some basic attributes that characterize it e.g.
1) A house can be described by its size, color, age and surroundings.
2) A customer of a bank may be described by such attributes as name,
address and possibly a customer identification number.
3) A bank account can be represented by an account type, an account
number and an account balance.
So, in example 1) size, color, age and surroundings are attributes of the entity
house. In example 2) customer identification number, customer name and customer
address are attributes describing the entity “CUSTOMER” of a bank.
An attribute is often called a data element, a data field, a data item or an
elementary item.
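An entity and its attributes map naturally onto a record type with named fields. The following is a minimal sketch using a Python dataclass; the class name Customer and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Each field is one attribute of the CUSTOMER entity.
    customer_id: int
    name: str
    address: str


# One particular customer is an occurrence (instance) of the entity.
asha = Customer(customer_id=101, name="Asha", address="12 Park Road")
attribute_names = [f.name for f in fields(Customer)]
print(attribute_names)
print(asha)
```

The class describes the entity through its attributes, while asha holds the attribute values of one particular customer.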
Schema:
The overall logical database description is referred to as a schema. It is
sometimes also referred to as an overall model of the data, a conceptual model or
a conceptual schema.
The database schema is specified during the database design and is not expected
to change frequently. A displayed schema is called a Schema Diagram.
Student
| Name | Student Number | Class | Major |
Course
| Course Name | Course Number | Credit Hours | Department |
An example of Schema Diagram of a Database that stores Student Records.
A schema diagram displays only some aspects of a schema, such as the names of record
types and data items and some type of constraints. Other aspects are not specified in the
schema diagram. The actual data in a database may change quite frequently e.g. in the
above database the database changes every time we add a student.
The data in the database at a particular moment in time is called a Database State or
Snapshot. It is also called the current set of occurrences or instances in the database.
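The difference between the schema and the database state can be observed directly. The sketch below uses SQLite through Python's sqlite3 module and the Student record type from the schema diagram; the helper functions schema and state and the inserted row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, student_number INTEGER, class TEXT, major TEXT)"
)


def schema():
    # The stored table definition, as recorded in SQLite's catalog.
    row = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'student'")
    return row.fetchone()[0]


def state():
    # The data present in the table at this moment (the snapshot).
    return conn.execute("SELECT * FROM student").fetchall()


schema_before, state_before = schema(), state()
conn.execute("INSERT INTO student VALUES ('Asha', 17, 'Freshman', 'CS')")
schema_after, state_after = schema(), state()

print(schema_before == schema_after)
print(state_before, state_after)
```

Adding a student changes the database state from empty to one record, while the schema stays exactly the same.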