COURSE OUTCOMES
• Tounderstand the different issues involved in the design and implementation of a database system.
To study the physical and logical database designs, database modeling, relational, hierarchical, and
network models
• To understand and use data manipulation language to query, update, and manage a database
• To develop an understanding of essential DBMS concepts such as: database security, integrity,
concurrency, distributed database, and intelligent database, Client/Server (Database Server), Data
Warehousing.
• To design and build a simple database system and demonstrate competence with the fundamental
tasks involved with modeling, designing, and implementing a DBMS.
INTRODUCTION TO DATABASES
•A database is a collection of related data.1
• By data,we mean known facts that can be recorded and that have implicit meaning.
• For example, consider the names, telephone numbers, and addresses of the people
you know
• A database is designed, built, and populated with data for a specific purpose.
• It has an intended group of users and some preconceived applications in which
these users are interested.
5.
Example
Amazon has anestimated 1,000,000,000 gigabytes of data hosted across more than
1,400,000 servers.
Amazon's marketplace features over 600 million products, with approximately 12
million sold directly by Amazon
6.
Database Management Systems
•A database management system (DBMS) is a collection of programs that enables
users to create and maintain a database.
• DBMS is a general-purpose software system that facilitates the processes of defining,
constructing, manipulating, and sharing databases among various users and applications.
• The database definition or descriptive information is also stored by the DBMS in the
form of a database catalog or dictionary; it is called meta-data.
• Manipulating a database includes functions such as querying
• the database to retrieve specific data
Facts
Edgar F. Coddintroduced the relational database model in 1970. He suggested using SQL for
querying and arranging data into tables, or relations.
The hierarchical model and the network model, two early database models, both organized data trees
or graphs formats.
First adopted by the American National Standards Institute (ANSI), SQL rose to prominence as
the industry standard query language for relational databases in the 1980s.
11.
Traditional File Systemsvs DBMS
• In traditional file processing, the structure of data files is embedded in the
application programs, so any changes to the structure of a file may require
changing all programs
• DBMS access programs do not require such changes in most cases. The
structure of data files is stored in the DBMS catalog separately from the
access programs
12.
Traditional File Systemsvs DBMS
• In traditional file processing, data definition is typically part of the application
programs themselves. Hence, these programs are constrained to work with only
one specific database, whose structure is declared in the application programs.
• Whereas file-processing software can access only specific databases, DBMS
software
can access diverse databases by extracting the database definitions from the catalog
and using these definitions.
Why use DBMS?
•Avoid Redundancy
• Avoid unauthorized access
• Backup & Recovery
• Integrity Constraints
• Complex relationships between data
• Data Sharing
• Multiple Views of Data
Characteristics of DBMS
•Self-describing nature of a database system:
• A DBMS catalog stores the description of a particular database (e.g. data
structures, types, and constraints)
• The description is called meta-data
• Insulation between programs and data called program-data independence.
• Allows changing data structures and storage organization without having to
change the DBMS access programs.
19.
Characteristics of DBMS
•Data Abstraction- A data model is used to hide storage details and present
the users with a conceptual view of the database.
• Programs refer to the data model constructs rather than data storage details
• Support of multiple views of the data
• Each user may see a different view of the database, which describes only the
data of interest to that user.
20.
Database vs InformationRetrieval
• Traditionally, database technology applies to structured and formatted data
that arises in routine applications in government, business, and industry.
• Information Retrieval (IR) deals with books, manuscripts, and various
forms of library-based articles. Data is indexed, cataloged, and annotated
using keywords.
• IR is concerned with searching for material based on these keywords, and
with the many problems dealing with document processing and free-form
text processing.
21.
DBMS may notbe suitable for..
• Scenarios with High initial investment in hardware, software, and training
• Simple, well-defined database applications that are not expected to change at
• all
• Stringent, real-time requirements for some application programs that may
not be met because of DBMS overhead
22.
Common Terminology
• Client:typically designed so that it will run on a user workstation or personal computer.
• Server: typically handles data storage, access, search, and other functions
• An entity represents a real-world object or concept, such as an employee or a project
• from the miniworld that is described in the database.
• An attribute represents some property of interest that further describes an entity, such as the employee’s name or salary.
• A relationship among two or more entities represents an association among the entities, for example, a works-on
relationship between an employee and a project.
• Entity-Relationship model—a popular high-level conceptual data model. Chapter 8 describes additional abstractions
used for advanced
• Data models: widely used relational data model, as well as the so-called legacy data models—the network and
hierarchical models—that have been widely used in the past.
23.
• Schema: Thedescription of a database is called the database.
• A schema diagram displays only some aspects of a schema, such as the names of
record types and data items, and some types of constraints.
• The data in the database at a particular moment in time is called a database state
or snapshot.
• It is also called the current set of occurrences or instances
• Data independence: the capacity to change the schema at one level of a database
system without having to change the schema at the next higher level
Three schema architecture
•The internal level has an internal schema , uses a physical data model and describes the
complete details of data storage and access paths for the database.
• The conceptual level has a conceptual schema, which describes the structure of the
whole database for a community of users.
• concentrates on describing entities, data types, relationships, user operations, and
constraints.
• The external or view level includes a number of external schemas or user views.
Each external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group.
27.
• Logical dataindependence: capacity to change the conceptual schema without
having to change external schemas or application programs. Eg. expand the
database (by adding a record type or data item), to change constraints, or to
reduce the database (by removing a record type or data item).
• Physical data independence :capacity to change the internal schema without
having to change the conceptual schema. Eg. Changes to the internal schema may
be needed because some physical files were reorganized—for example, by creating
additional access structures—to improve the performance of retrieval or update.
28.
DBMS Languages
• Datadefinition language (DDL), is used by the DBA and by database designers to define both schemas.
The DBMS will have a DDL compiler whose function is to process DDL statements in order to identif
descriptions of the schema constructs and to store the schema description in the DBMS catalog
• Storage definition language (SDL), is used to specify the internal schema. The mappings between the two
schemas may be specified in either one of these languages.
• View definition language (VDL), is used to specify user views and their mappings to the conceptual schema
• DDL is used to define both conceptual and external schemas.
• Typical manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides
a set of operations or a language called the data manipulation language(DML) for these purposes.
29.
Components of DBMS
•Access to the disk is controlled primarily by the operating system (OS), which
schedules disk read/write. Many DBMSs have their own buffer management
module to schedule disk read/write, because this has a considerable effect on
performance.
• The DDL compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog
• Casual users and persons with occasional need for information from the database
interact using some form of interface, which we call the interactive query interface
30.
Components of DBMS
•The precompiler extracts DML commands from an application program written
in a host programming language.
• These commands are sent to the DML compiler for compilation into object code
for database access. The rest of the program is sent to the host language compiler.
• It is now common to have the client program that accesses the DBMS running
on a separate computer from the computer on which the database resides. The
former is called the client computer running a DBMS client software and the
latter is called the database server.
31.
Components of DBMS
•In addition to possessing the software modules just described, most DBMSs have database utilities
that help the DBA manage the database system. Eg. A loading utility is used to load existing data files
—such as text files or sequential files
• For the hierarchical DBMS ,the vendors or third-party companies are making a variety of conversion
tools available (e.g., Cincom’s SUPRA Server SQL) to transform data into the relational model.
• A backup utility creates a backup copy of the database, usually by dumping the entire database onto
tape or other mass storage medium.
• Data dictionary (or data repository) stores catalog information about schemas and constraints,
other information, such as design decisions, usage standards, application program descriptions, and
user information.
33.
Centralized DBMS vsClient/Server
Architecture
• Centralized DBMS : All DBMS functionality, application program
execution, and user interface processing are carried out on one machine.
• The client/server architecture was developed to deal with computing
environments in which a large number of PCs, workstations, file servers,
printers, database servers, Web servers, e-mail servers, and other software
and equipment are connected via a network.
34.
Client/Server architecture
The conceptof client/server architecture assumes an underlying framework that
consists of many PCs and workstations as well as a smaller number of mainframe
machines, connected via LANs and other types of computer networks. A client in
this framework is typically a user machine that provides user interface capabilities
and local processing. When a client requires access to additional functionality—
such as database access—that does not exist at that machine, it connects to a server
that provides the needed functionality. A server is a system containing both hardware
and software that can provide services to the client machines, such as file
access, printing, archiving, or database access.
35.
2-Tier Systems
• Insuch an architecture, the server is often called a query server or transaction server because it provides
these two functionalities. In an RDBMS, the server is also often called an SQL server.
• The user interface programs and application programs can run on the client side.
• When DBMS access is required, the program establishes a connection to the DBMS (which is on the server
side); once the connection is created, the client program can communicate with the DBMS.
• A standard called Open Database Connectivity (ODBC) provides an application programming interface
(API), which allows client-side programs to call the DBMS, as long as both client and server machines have
the necessary software installed. Most DBMS vendors provide ODBC drivers for their systems
• The architectures described here are called two-tier architectures because the software
• components are distributed over two systems: client and server
ALTER TABLE Customers
DROPCOLUMN Email;
ALTER TABLE Persons
ALTER COLUMN DateOfBirth year;
ALTER TABLE table_name
RENAME COLUMN old_name to new_name;
39.
ER Modelling
• Entity-Relationship(ER) model is a popular high-level conceptual data
model. This model and its variations are frequently used for the conceptual
design of database applications, and many database design tools employ its
concepts.
• The diagrammatic notation associated with the ER model is known as ER
diagrams
40.
Steps for DatabaseDesign
• Collect the functional requirements of the application. These consist of
the user-defined operations (or transactions) that will be applied to the
database, including both retrievals and updates.
• In software design, it is common to use data flow diagrams, sequence diagrams,
scenarios, and other techniques to specify functional requirements.
• Once the requirements have been collected and analyzed, the next step is to
create a conceptual schema for the database, using a high-level conceptual
data model. This step is called conceptual design.
41.
Components of ERmodels
• The basic object that the ER model represents is an entity, which is a thing in the real world with an independent
existence.
• An entity may be an object with a physical existence (for example, a particular person, car, house, or employee) or it
may be an object with a conceptual existence (for instance, a company, a job, or a university course).
• Each entity has attributes—the particular properties that describe it.
• Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent
• meanings. For example, the Address attribute of the EMPLOYEE can be subdivided into Street_address, City, State,
and Zip,3 with the values
‘2311 Kirby’, ‘Houston’, ‘Texas’, and ‘77001.’
• Attributes that are not divisible are called simple or atomic attributes.
42.
• Most attributeshave a single value for a particular entity; such attributes are
called single-valued. For example, Age is a single-valued attribute of a
person.
• Similarly, one person may not have a college degree, another person may
have one, and a third person may have two or more degrees; therefore,
different people can have different numbers of values for the College_degrees
attribute. Such attributes are called multivalued.
43.
• The Ageattribute is called a derived attribute and is said to be derivable from the
Birth_date attribute, which is called a stored attribute. Some attribute values can be
derived from
• In some cases, a particular entity may not have an applicable value for an attribute. A
College_degrees attribute applies only to people with college degrees. For such situations, a
special value called NULL is created.
• Complex attributes. For example, if a person can have more than one residence and each
residence can have a single address and multiple phones, an attribute Address_phone for a
person can be specified .Both Phone and Address are themselves composite attributes.
44.
• Key attributesare those attributes that can uniquely identify the entity in the
entity set. Example: Roll-No is the key attribute because it can uniquely
identify the student.
• This attribute can take NULL value when entity does not have value for it.
• Example –The ‘Net Banking Active Bin’ attribute gives weather particular
customer having net banking facility activated or not activated.
Total Vs PartialParticipation
• Participation Constraints in database management refer to rules that determine the
minimum and maximum participation of entities or relationships in a given
relationship set.
• While partial participation permits discretionary involvement, total participation
requires every entity in one set to take part in a relationship in another set.
• Entire participation, sometimes known as required participation, denotes the
requirement that each individual in a group participate in an activity pertaining to
another group. It's similar to saying that in order to belong to one group, you must
somehow be associated with another.
51.
TYPES OF RELATIONSHIPS
•Cardinality defines the possible number of occurrences in one entity which is
associated with the number of occurrences in another.
• A one-to-many relationship refers to the relationship between two entities X and Y in
which an instance of X may be linked to many instances of Y, but an instance of Y is
linked to only one instance of X
• A one-to-one relationship is mostly used to split an entity in two to provide information
concisely and make it more understandable
• A many-to-many relationship refers to the relationship between two entities X and Y in
which X may be linked to many instances of Y and vice versa
52.
CREATING A DATABASEFROM
RELATIONSHIPS
• For a 1: N relationship, the key is mostly migrated to the N side of the
relationship.
• For a 1: 1 relationship, the key can be migrated to any side of the relationship
• For a M:N relationship, the keys of both participating entities are combined
into a table
53.
DEGREES OF RELATIONSHIP
•Unary (Degree 1)
• In this type of relationship, both the associating entity types are the same. So,
we can say that unary relationships exist when both entity types are the same
and we call them the degree of relationship is 1
• In a Binary relationship, there are two types of entity associates. So, we can
say that a Binary relationship exists when there are two types of entity and
we call them a degree of relationship is 2
54.
• In theTernary relationship, there are three types of entity associates. So, we
can say that a Ternary relationship exists when there are three types of entity
and we call them a degree of relationship is 3.
• In the N-ary relationship, there are n types of entity that associates. So, we
can say that an N-ary relationship exists when there are n types of entities.
KEYS
• A keyis an attribute or set of attributes
• Primary key uniquely identifies tuples in a table
• Eg Aadhar card, Roll no
• Other keys are known as alternative keys
• Primary key is chosen from a set of candidate keys
58.
• A superkey is a combination of all possible attributes that uniquely identify
tuples
• A table can have many super keys
• A super key ma have additional attributes not required for unique identity
59.
Candidate Key
• Acandidate key is an attribute or set of attributes derived from super key
• Also known as minimal super key
• Defined as a distinct set of attributes from which primary key is selected
• Cannot have NULL values
• Primary key is chosen from among candidate keys by database designer
60.
Other Terms
• Adomain D is a set of atomic values. By atomic we mean that each value in the
domain is indivisible as far as the formal relational model is concerned.
• A relation schema R, denoted by R(A1, A2, ...,An), is made up of a relation name R
and a list of attributes, A1, A2, ..., An. Each attribute Ai is the name of a role played
by some domain D in the relation schema R.
• A relation (or relation state) r of the relation schema R(A1, A2, ..., An), also
denoted by r(R), is a set of n-tuples r = {t1, t2, ..., tm}. Each n-tuple t is an ordered
list of n values t =<v1, v2, ..., vn>, where each value vi, 1 ≤ i ≤ n, is an element of
dom (Ai) or is a special NULL value.
61.
• The Cartesianproduct specifies all possible combinations of values from the underlying domains. Hence, if
we denote the total number of values
• Cardinality represents the number of tuples
• The lowercase letters q, r, s denote relation states.
• The letters t, u, v denote tuples.
• In general, the name of a relation schema such as STUDENT also indicates the current set of tuples in that
relation—the current relation state—whereas STUDENT(Name, Ssn, ...) refers only to the relation schema.
• An attribute A can be qualified with the relation name R to which it belongs by using the dot notation R.A
—for example, STUDENT.Name or STUDENT.Age. This is because the same name may be used for two
attributes in different relations. However, all attribute names in a particular relation must be distinct.
62.
Constraints
• Domain constraintsspecify that within each tuple, the value of each attribute
A must be an atomic value from the domain dom(A)
• The entity integrity constraint states that no primary key value can be
NULL. This is because the primary key value is used to identify individual
tuples in a relation.
• Having NULL values for the primary key implies that we cannot identify
some tuples.
63.
Referential Integrity Constraint
•The conditions for a foreign key, given below, specify a referential integrity
constraint between the two relation schemas R1 and R2.
• A set of attributes FK in relation schema R1 is a foreign key of R1 that
references relation R2 if it satisfies the following rules:
• 1. The attributes in FK have the same domain(s) as the primary key attributes
PK of R2; the attributes FK are said to reference or refer to the relation R2.
• 2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value
of PK for some tuple t2 in the current state r2(R2) or is NULL.
Relational Algebra
• Twomathematical Query Languages form the basis for “real” languages (e.g.
SQL), and for implementation:
• Relational Algebra: More operational, very useful for representing execution
plans.
• Relational Calculus: Lets users describe what they want, rather than how to
compute it. (Nonoperational, declarative.)
66.
Relational Algebra
• Aquery is applied to relation instances, and the result of a query is also a
relation instance.
• Schemas of input relations for a query are fixed (but query will run
regardless of instance!)
• The schema for the result of a given query is also fixed
67.
Operations
• Selection -Selects a subset of rows from relation.
• Projection Deletes unwanted columns from relation.
• Cross-product Allows us to combine two relations.
• Set-difference Tuples in reln. 1, but not in reln. 2.
• Union Tuples in reln. 1 and in reln. 2.
69.
Projection
• Deletes attributesthat are not in projection list.
• Schema of result contains exactly the fields
in the projection list, with the same names
that they had in the (only) input relation.
• Projection operator has to eliminate duplicates!
70.
Selection
• Selects rowsthat satisfy selection condition. V
• No duplicates in result
• Schema of result identical to schema of
(only) input relation.
• Result relation can be the input for
another relational algebra operation!
Cross Product
• Eachrow of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1, with field names
`inherited’ if possible.
• Conflict: Both S1 and R1 have a field called sid
73.
Join
• Result schemasame as that of cross-product.
• Fewer tuples than cross-product, might be able to compute more efficiently
• Sometimes called a theta-join.
74.
Rename
• We canalso define a formal RENAME operation—which can rename either the
relation name or the attribute names, or both—as a unary operator.
• The general RENAME operation when applied to a relation R of degree n is
denoted by any of the following three forms:
ρS(B1, B2, ..., Bn)(R) or ρS(R) or ρ(B1, B2, ..., Bn)(R)
where the symbol ρ (rho) is used to denote the RENAME operator, S is the new
relation name, and B1, B2, ..., Bn are the new attribute names.
Types of Join-Theta Join & Equi Join
• A general case of join operation
• Also called conditional join
• Denoted by symbol theta
• Conditional operators such as =,<,> etc can be used in theta join
• Only = is used in equi join
77.
Natural Join
• Specialcase of Equi Join
• Denoted by
• Both tables should have same attribute
• Name and domain should be the same in both tables
• Does not use comparison operator
• Removes redundant attributes
Practice on RelationalAlgebra
• Consider the following schema: Suppliers (sid : integer, sname : string,
address : string) Parts (pid : integer, pname : string, color : string) Catalog (sid
: integer, pid : integer, cost : real) The key fields are underlined and domain
of each field is listed after the field name .
82.
Questions
• Find thename of suppliers who supply some red parts
• Find the sids of suppliers who supply some red or green parts
• Find the sids of suppliers who supply some red part or are at 221 packer Ave
• Find the sids of suppliers who supply some red part and some green part
• Find the sids of suppliers who supply every part
• Find the sids of suppliers who supply every red part
• Find the sids of suppliers who supply every red or green part
• For everyproject located in ‘Stafford’, list the project number, the controlling
department number, and the department manager’s last name, address, and
birth date.
88.
• Find thenames of employees who work on all the projects controlled by
department number 5
• Make alist of project numbers for projects that involve an employee whose
last name is ‘Smith’, either as a worker or as a manager of the department
that controls the project.