UNIT-I
Chapter-I : DATABASE SYSTEMS
Data: Data consists of raw facts, which the computer stores and reads. Data can consist of
letters, numbers, sounds or images etc. that have some meaning in the user environment. Data
are the raw material from which information is generated.
Information: When data has been processed to give it more meaning, it is called
information.
Database: An organized collection of logically related data usually designed to meet the
information needs of multiple users in an organization.
Database Management System (DBMS): A software tool used to define, create, maintain
and provide controlled access to the database.
DBMS software stores data structures, relationship between those structures and the access
paths to those structures in a central location.
Q) How is data organized within a database?
Ans: To help visualize how a database stores data, think about a typical address book.
Fields: Each field contains a specific type of information such as first name, last name, phone
number, email, etc.
Records: A record is a collection of related fields.
Ex: All the information about one person in an address book.
Tables: A complete collection of records makes a table.
Ex: A Contacts table with the columns FirstName, LastName, Company, Address, City,
State and Pincode, and one row per record (Record1, Record2, ...).
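The field/record/table analogy above can be sketched in Python: each dict is one record, each key is a field, and the list of records is the table. (The sample names and addresses below are made up for illustration.)

```python
# A minimal sketch of the address-book analogy: each dict is a record,
# each key is a field, and the list of records is the "Contacts" table.
contacts = [
    {"FirstName": "Anu", "LastName": "Rao", "Company": "Acme",
     "Address": "12 Main St", "City": "Hyderabad", "State": "TS", "Pincode": "500001"},
    {"FirstName": "Ram", "LastName": "Das", "Company": "Beta",
     "Address": "5 Park Rd", "City": "Chennai", "State": "TN", "Pincode": "600004"},
]

# Accessing one field of one record:
first_record = contacts[0]
print(first_record["FirstName"])
print(len(contacts))  # number of records in the table
```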
Q) Why is a database important?
Ans: If you keep a list of all your business customers in a database, you can
• sort the customers by pincode;
• create a simple on-screen entry form that even a technically unskilled employee
can use successfully.
You can manipulate the data in almost any way you want.
Files, File Systems and Problems with File System Data Management.
A manual filing system works well when the number of items stored is small and the items
only need to be stored and retrieved.
A manual filing system breaks down when cross-referencing and processing of the information
in the files is required.
Limitations or disadvantages of File Processing Systems.
Program-data dependence: File descriptions are stored within each program that accesses a
given file. The invoicing system program accesses both the inventory pricing file and the
customer master file, so it contains a detailed description of both files.
In the figure below, the customer master file is contained in both the order filing system and
the invoicing system. Suppose it is decided to change the customer address field length in the
customer master file records from 20 to 30 characters. Each related program then has to be
modified.
Duplication of data (redundancy): In the figure below, the order filing system contains the
inventory master file and the invoicing system contains the inventory pricing file. Both files
contain product descriptions and quantities. This duplication of data requires additional
storage space.
[Figure: separate file processing systems in the Orders Department and the Accounting Department]
Inconsistent data: The redundancy of storing the same data multiple times leads to data
inconsistency when an update is applied to some of the files but not to others.
Limited data sharing: In a file processing system, users have little opportunity to share
data outside their own applications.
Lengthy development times: Developing an application using the file system is a highly
skilled activity. The programmer has to write many programs to support file opening,
file closing and the iterative logic for operations, which is a lengthy process.
Incompatible file formats: Since the structure of the files is embedded in the applications, the
structure is dependent on the application programming language.
Ex: the structure of a file generated by a COBOL program is different from that of a ‘C’ program.
The application programmer has to develop software to convert the files to some common
format for processing, which may be time consuming and expensive.
Fixed queries: Any query or report needed by the organization has to be developed by the
application programmer.
Lack of security: All users can see all data; there is no security and authorization subsystem.
No recovery and backup system: Data can be lost in case of hardware or software failure.
All the data is stored in disk files and accessed according to access methods (sequential,
direct etc..) provided by file system and chosen by application programmer.
[Figure: file processing systems. The Order Filing System’s programs A, B and C access the
Customer Master File, the Inventory Master File and the Back Order File. The Invoicing
System’s programs A and B access the Inventory Pricing File and its own copy of the
Customer Master File.]
[Figure: a database system. The Order Programs, Accounting Programs and Payroll Programs
all access one database through the DBMS. The database holds customer master data,
inventory master data, employee master data and back order data.]
Q: What is a database system and what are the advantages of database systems?
Database System: Database and DBMS software together is called a database system.
Program data independence: DBMS allows certain types of changes to the structure of the
database without affecting the stored data and the existing application.
Improved data sharing: The DBMS helps create an environment in which end users have
better access to more data and better managed data.
Improved data security: The DBA uses security and authorization subsystem provided by
DBMS to create accounts and to specify account restrictions. The DBMS will enforce these
restrictions automatically.
Better Data Integration: DBMS promotes and enforces integrity rules, thus minimizing
data redundancy and maximizing data consistency.
Minimized data inconsistency: Data inconsistency is also reduced in a properly designed
database as such a database doesn’t allow different versions of same data in different places.
Ex: company’s sales department stores salesman name as ‘Bill Brown’ and the same person
name is stored as ‘William G Brown’ in company’s HR department.
Improved data access: A query is a question or specified request issued to the DBMS for data
manipulation. An example of a query language is SQL.
An ad hoc query is a spur-of-the-moment question. The DBMS sends back an answer (called
the query result set) to the application.
For Ex: How many of our customers have balances of Rs. 3000 or more?
The DBMS gives quick answers to ad hoc queries.
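The ad hoc query above can be sketched as a simple filter; the customer names and balances below are made up for illustration.

```python
# Sketch: how the query "How many of our customers have balances of
# Rs. 3000 or more?" produces a query result set. Data is illustrative.
customers = [
    {"name": "Swathi", "balance": 4500},
    {"name": "Dolly",  "balance": 1200},
    {"name": "Rama",   "balance": 3000},
]

# The "query result set": customers whose balance is at least 3000.
result_set = [c for c in customers if c["balance"] >= 3000]
print(len(result_set))  # the count the query asks for
```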
Improved Decision Making: Better managed data and improved data access makes it
possible to generate better quality information on which better decisions are based.
Increased end-user productivity: The availability of data, combined with tools that transform
data into information, allows end users to make quick decisions that can make the difference
between success and failure in the global economy.
Database system environment:
Database system environment is made up of five major parts. They are hardware, software,
people, procedures and data.
Hardware: Hardware refers to all of the system’s physical devices.
For ex: computers (micro computers, servers etc..), storage devices, printers, networking
devices (hubs, switches etc…) and other devices ( ATMS, ID readers etc..)
Software: Three types of software are needed to make the database system function fully.
1. Operating system software: manages all hardware components and makes it
possible for all other software to run on the computers. Ex: UNIX, Microsoft
windows.
2. DBMS Software: manages the data within the database system.
Ex: SQL server, Oracle, DB2, My SQL
3. Application programs and utility software: are most commonly used to access data
found within the database to generate reports, tabulations and other information for
decision making. For Ex: All DBMS vendors provide GUI’s to create database
structures, control database access and monitor database operations.
People: People include all users of the database system. On the basis of their job functions, five
types of users can be identified.
1. System administrators: looks after database system general operations.
2. Database administrators (DBA) manages the DBMS and ensures that the
database is functioning properly.
3. Database Designers or Database architects design database structure. The
determination of what data are to be entered into the database and how the data are to
be organized is an important part of database designer’s job.
4. System analysts and programmers design and implement the application
programs. They design and create the data entry screens, reports and procedures
through which end users access and manipulate the database’s data.
5. End users are the people who use the application programs to run the
organisation’s daily operations. For Ex: clerks, managers, supervisors and directors.
High level end users uses the information obtained from the database to make
decisions.
Procedures: Procedures play an important role in a company. They enforce the
standards by which the business is conducted within the organization and with
customers. Procedures are also used to ensure that there is an organized way to
monitor and audit both the data that enter the database and the information generated
through the use of that data.
Data: Data are the raw materials from which information is generated. The term covers the
collection of facts stored in the database.
DBMS Functions:
DBMS performs several functions that guarantee the integrity and consistency of the data in
database. They are
1. Data dictionary management: The DBMS stores definitions of the data elements
and their relationships in a data dictionary. The DBMS provides data abstraction,
removing structural and data dependency from the system.
2. Data storage management: The DBMS creates and manages the complex structures
required for data storage, so you need not define and program the physical data
characteristics. It also provides storage for on-screen definitions, report definitions,
data validation rules, etc.
3. Data transformation and presentation: The DBMS must transform entered data to
conform to the required data structures and present it in the format each user expects
(for example, date and name formats for each country). It must not allow different
versions of the same data in different places. Ex: the company’s sales department stores
a salesman’s name as ‘Bill Brown’ while the same person’s name is stored as
‘William G Brown’ in the company’s HR department.
4. Security management: The DBMS creates a security system that enforces user security
and data privacy. Security rules determine which users can access the database, which
data items each user can access, and which data operations (read, add, delete or modify)
the user can perform. This is especially important in multi-user database systems.
5. Multi user access control: The DBMS uses sophisticated algorithms to ensure that
multiple users can access the database at the same time.
6. Backup and recovery management: The DBMS provides special utilities that allow the
DBA to perform backup and restore procedures. Recovery management deals with
the recovery of the database after a failure, such as a bad disk sector or a power failure.
7. Data integrity management: DBMS promotes and enforces integrity rules, thus
minimizing data redundancy and maximizing data consistency.
8. Data access languages and application programming interfaces: The DBMS
provides data access through a query language. A query language is a non-procedural
language that lets the user specify what must be done without having to specify how it
is to be done. An example of a query language is SQL.
9. Database communication interfaces: The DBMS accepts end-user requests from
multiple, different network environments.
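The what-versus-how distinction in item 8 can be sketched by contrasting an explicit loop with a declarative one-liner; the rows and city names below are illustrative.

```python
# Procedural ("how"): spell out the iteration and accumulation yourself.
rows = [{"city": "Hyderabad", "amount": 100},
        {"city": "Chennai",   "amount": 250},
        {"city": "Hyderabad", "amount": 50}]

total = 0
for row in rows:                       # explicit access path and iteration
    if row["city"] == "Hyderabad":
        total += row["amount"]

# Declarative ("what"): state the condition; leave the iteration to the engine.
# Rough SQL equivalent: SELECT SUM(amount) FROM rows WHERE city = 'Hyderabad';
declarative_total = sum(r["amount"] for r in rows if r["city"] == "Hyderabad")

assert total == declarative_total  # both express the same query
```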
Disadvantages of DBMS
1. Increased costs: A database system requires hardware, software and highly skilled
people, and the cost of maintaining these is high.
2. Management complexity: The database system holds important data that are
accessed from multiple sources, so security issues may arise.
3. Frequent updates: One must perform frequent updates and apply the latest patches and
security measures to all components. This increases personnel training costs.
4. Vendor dependence: Owing to the heavy investment in technology and personnel training,
companies rarely change database vendors. As a result, vendors do not offer price
advantages to existing customers.
5. Frequent upgrade/replacement cycles: DBMS vendors frequently upgrade their
products by adding new functionality, i.e., new versions of the software. Some of these
versions require hardware upgrades and user training, which cost money.
UNIT-I Chapter-II
Data modeling and data models
Q: What is a data model?
Ans: A data model is a blueprint containing all the instructions to build a database that will meet
all the end-user requirements.
This blueprint contains both text descriptions in plain, unambiguous language and clear,
useful diagrams depicting the main data elements.
Q: Explain the importance of data models?
Ans: Data models are communication tools that enable interaction among the designer, the
application programmer and the end user.
Data models are used to represent real-world data, and different degrees of data
abstraction enable data modeling.
Ex: a house blueprint is an abstraction; you cannot live in a blueprint. Similarly, a data
model is an abstraction; you cannot draw the required data out of the data model. Just as you
cannot build a good house without a blueprint, you cannot create a good database without
first creating an appropriate data model.
Q:What are Business Rules?
Ans: A business rule is a description of a policy or procedure within a specific organization.
Properly written business rules are used to define entities, attributes, relationships and
constraints.
Example1: Consider 2 business rules
• A customer may generate many invoices.
• An invoice is generated by only one customer.
These business rules establish 2 entities (CUSTOMER and INVOICE) and a 1:M
relationship.
Example 2: A business rule is as follows:
• A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees.
This rule establishes a constraint (not fewer than 10 nor more than 30 employees), two entities
(EMPLOYEE and TRAINING) and a relationship between these entities.
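The constraint from Example 2 can be sketched as a check in application code; the function name and limits below simply restate the rule for illustration.

```python
# Sketch: enforcing the training-session business rule.
# A session must have at least 10 and at most 30 employees.
MIN_EMPLOYEES, MAX_EMPLOYEES = 10, 30

def can_schedule(session_size: int) -> bool:
    """Return True only when the business rule allows the session."""
    return MIN_EMPLOYEES <= session_size <= MAX_EMPLOYEES

print(can_schedule(9))    # fewer than 10 employees: not allowed
print(can_schedule(25))   # within the allowed range
print(can_schedule(31))   # more than 30 employees: not allowed
```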
Q: How to Discover Business Rules
Ans: The main sources of business rules are
• company managers,
• policy makers,
• department managers and
• written documentation such as a company’s procedures, standards or operation
manuals,
• direct interviews with end users.
The process of identifying and documenting business rules is essential to database design for
several reasons.
• They help standardize the company’s view of data
• They allow designer to develop relationship participation rules and constraints and to
create an accurate data model.
Q: Why can not all business rules be modeled?
Ans: For ex: No pilot can fly more than 10 hours within a 24-hour period.
Such a business rule can be enforced by application software but not by database design.
Q: Explain about hierarchical model?
Ans: Its structure is represented by an upside-down tree.
The hierarchical structure contains levels or segments.
Within the hierarchy, the top layer (also called the root) is the parent of the segment directly
beneath it.
Advantages:
1. It promotes data sharing.
2. Parent child relationship promotes conceptual simplicity and data integrity.
3. Database security is provided and enforced by DBMS.
4. It is efficient with 1:M relationships.
Disadvantages:
• Complex to implement and difficult to manage as it requires knowledge of physical
data storage characteristics.
• Can implement only 1:M relationships. So it has implementation limitations.
• No standards.
• No DDL and DML language in the DBMS.
• Lacks structural independence. Changes in structure require changes in all
application programs.
• No ad hoc queries
• Access paths predefined
This technology is best applied when conceptual model also resembles a tree and most data
accesses begin with the same root file.
Q: Explain about network model?
Ans: Network model allows a record to have more than one parent.
Advantages:
• It can handle M:N and multi parent relationship types.
• Data access is more flexible
• There are standards defined to implement this model.
• It includes DDL and DML commands in DBMS
Disadvantages:
• Little data independence.
• Lacks structural independence. Changes in structure require changes in all
application programs.
• No ad hoc queries
• Access paths predefined
Q: What is CODASYL and DBTG?
Ans: To help establish database standards, the Conference on Data Systems Languages
(CODASYL) created the Database Task Group (DBTG) in the late 1960s.
The final DBTG report contained specifications for 3 crucial database components.
The schema is the conceptual organization of the entire database as viewed by DBA
The subschema defines the portion of the database as seen by the application programs.
The application programs invoke the subschema required to access the appropriate database
file.
A data management language that defines the environment in which data can be managed.
Q: Explain about The Relational Model ?
Ans: Here tables are called “relations”,
rows are called “tuples”, and column names are called “attributes”.
Every attribute has a domain. A domain is set of permissible values that can be given to an
attribute.
A common attribute existing in any two tables creates a relationship between the tables.
It supports relationship types (1:1, 1: M or M: N)
The RDBMS manages all the physical details, while the user sees the relational database as
collection of tables. (it enables you to view data logically rather than physically.)
The RDBMS uses SQL to translate user queries into instructions for retrieving the required
data. The SQL engine executes all queries.
Advantages
• Promotes data and structural independence.
• Tabular view improves conceptual simplicity.
• Ad hoc query capability is based on SQL
• RDBMS isolates end user from physical level details.
Disadvantages:
• An RDBMS requires substantial hardware and software overhead.
• Conceptual simplicity gives untrained people the tools to use a good system poorly.
• It may produce “islands of information” problems, as individuals and departments can
easily develop their own applications.
Q: Explain about The Entity Relationship Model?
Ans: ER models are normally represented in an entity relationship diagram (ERD)
The ER model is based on the following components:
Entity: Entity is anything about which data are to be collected and stored
Attribute: Attributes are characteristics of entities.
Relationship:A relationship is an association between entities.
Advantages:
• Visual modeling yields conceptual simplicity.
• Visual representation makes it an effective communication tool.
• It can be integrated with dominant relational model.
Disadvantages
• There is limited constraint representation.
• There is limited relationship representation.
• There is no data manipulation language.
Q: Explain the various notations used with ERDs ?
Ans: The various notations used with ERDs are
• The Chen notation favors conceptual simplicity.
• The Crow’s Foot notation favors an implementation-oriented approach.
• The UML notation can be used for both conceptual and implementation modeling.
Q: Explain about Object Oriented model?
Ans: In this model both the data and their relationships are contained in a single structure
known as an Object.
Object includes information about relationships between facts within the object and
relationships with other objects.
The OODM is the basis of the OODBMS.
The OODM is said to be a semantic data model because “semantic” indicates meaning.
The object oriented data model is based on the following components
• An object is an abstraction of a real-world entity.
• Attributes describe the properties of an object.
• Objects that share similar characteristics are grouped in classes.
• A class is a collection of similar objects with a shared structure (attributes) and
behaviour (methods), whereas entities do not have methods.
• Classes are organized in a class hierarchy, which resembles an upside-down tree in
which each class has only one parent.
• Inheritance is the ability of an object within the class hierarchy to inherit the attributes
and methods of the classes above it.
Object oriented data models are depicted using UML diagrams.
Advantages:
• Semantic content is added
• Visual representation includes semantic content.
• Inheritance promotes data integrity.
Disadvantages:
• No widely accepted standard.
• It is a complex navigational system.
• There is a steep learning curve.
• High system overhead slows transactions.
Q) Distinguish between Logical and Physical data independence.
Logical Data Independence:
Logical data independence is the ability to modify the conceptual schema without altering the
external schemas or application programs. Alterations to the conceptual schema may include
the addition or deletion of entities, attributes or relationships, and should be possible without
altering existing external schemas or rewriting application programs.
Physical Data Independence:
Physical data independence is the ability to modify the internal schema without altering the
conceptual schema or application programs. Alterations to the internal schema might include:
* Using new storage devices.
* Using different data structures.
* Switching from one access method to another.
* Using different file organizations or storage structures.
* Modifying indexes.
Explain about the Conceptual, Internal, External and Physical Models
(Or)
Explain about different levels of data abstraction
(Or)
Explain about three schema architecture.
Ans:
The Conceptual Model
1. The conceptual model represents a global view of the organization’s data as viewed by all
end-users.
2. It describes all entities and their attributes, the relationships among these entities and the
constraints on these relationships.
3. The conceptual model forms the basis for the conceptual schema - a description of the
database structure.
4. The conceptual model is independent of both software (DBMS and OS) and hardware.
5. The E-R model is the most widely used to represent the conceptual model.
The Internal Model
1. The internal model adapts the conceptual model to a specific DBMS (e.g., hierarchical,
network, and relational).
2. The internal model is software-dependent but hardware-independent.
3. Development of the internal model is especially important to hierarchical and network
database models.
The External Model
1. The external model is the end user’s (application programmer’s) view, the local view, of the
database.
2. It is concerned with a specific business operation.
3. It is implemented through the CREATE VIEW command in SQL.
Benefits of the external model
• Application program development is simplified because the programmer does not have to
be concerned about data not relevant to his/her application.
• Communication with the end-user is simplified.
• Identification of data required to support each business operation is simplified.
• Access control and security can be easily implemented.
The Physical Model
• The physical model operates at the lowest level of abstraction, describing the way data
is stored on storage media such as disks or tapes.
• It requires the definition of
physical storage devices and
the access methods required to reach the data.
• The physical model is both software and hardware-dependent.
UNIT-I
Chapter-III The Relational Database model
Explain the characteristics of a relational table?
1. A table is perceived as a two-dimensional structure composed of rows and columns.
2. Each table row (tuple) represents a single entity occurrence within the entity set.
3. Each table column represents an attribute, and each column has a distinct name.
4. Each row/column intersection represents a single data value.
5. All values in a column must conform to the same data format.
6. Each column has a specific range of values known as the domain of that attribute.
Example: The domain for the gender attribute consists of only two possibilities: M or F.
The domain for a company’s date of hire attribute consists of all dates (from start up date to
current date)
Attributes may share a domain.
For ex: a student address and a professor address share the same domain of all possible
addresses.
7. The order of rows and columns is immaterial to the DBMS
8. Each table must have an attribute or a combination of attributes that uniquely identifies
each row. Ex: Roll_No in the STUDENT table
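Characteristic 6 (every value must lie in its column's domain) can be sketched as a per-column check; the columns and domains below are illustrative, not from the original.

```python
# Sketch: validating a row against its columns' domains.
# Each domain is expressed as a predicate on a candidate value.
domains = {
    "Roll_No": lambda v: isinstance(v, int) and v > 0,
    "Gender":  lambda v: v in {"M", "F"},   # a domain of just two values
}

def row_is_valid(row: dict) -> bool:
    """Every column's value must belong to that column's domain."""
    return all(check(row[col]) for col, check in domains.items())

print(row_is_valid({"Roll_No": 101, "Gender": "M"}))  # value in every domain
print(row_is_valid({"Roll_No": 102, "Gender": "X"}))  # 'X' is outside the Gender domain
```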
What data types are supported by most DBMSs?
Ans: The different data types are
1. Numeric: Numeric data are data on which you can perform arithmetic operations.
2. Character: Character data (text or string data) can contain any character, symbol or
digit not intended for mathematical manipulation.
3. Date: Date attributes contain calendar dates stored in a special format known as the Julian
date format.
4. Logical: Logical data can have only a true or false (yes or no) condition.
What is data dictionary?
Ans: The data dictionary provides detailed descriptions of all tables, and so contains the
attribute names, characteristics and structure of each table in the system.
What is system catalog?
Ans: It is a detailed system data dictionary that describes all objects within the database,
including data about table names, each table’s creator, etc.
The system catalog is a system-created database whose tables store the user-created
database characteristics and content. These tables can be queried just like user-defined tables.
Explain about indexes in relational database?
Ans: An index is composed of an index key and a set of pointers. An index can be used to
retrieve data more efficiently. When you define a table’s primary key, the DBMS
automatically creates a unique index on the primary key columns.
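An index as "index key plus pointers" can be sketched with a dictionary that maps key values to row positions; the table contents below are illustrative.

```python
# Sketch: an index as (index key -> row position), so a lookup avoids
# scanning the whole table. Data is illustrative.
students = [
    {"Roll_No": 103, "Name": "Anu"},
    {"Roll_No": 101, "Name": "Ram"},
    {"Roll_No": 102, "Name": "Dolly"},
]

# Build a unique index on the primary key column, as a DBMS would.
index = {row["Roll_No"]: pos for pos, row in enumerate(students)}

# Retrieval via the index: one lookup instead of a full table scan.
print(students[index[101]]["Name"])
```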
What is meant by functional dependence?
Ans: The attribute B is functionally dependent on A
if each value in column A determines one and only one value in column B.
Ex: In the STUDENT table, STU_NUM functionally determines STU_LNAME: each student
number is associated with exactly one last name.
What is composite key?
Ans: A key may be composed of more than one attribute. Such a multi-attribute key is known
as a composite key.
What is meant by fully functional dependency?
Ans: If attribute B is functionally dependent on a composite key A but not on any subset of
that composite key, the attribute B is fully functionally dependent on A.
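The test for "each A value determines one and only one B value" can be sketched as a scan over the rows; the sample rows below are illustrative.

```python
# Sketch: does the functional dependency A -> B hold over these rows?
# It fails as soon as one A value is seen with two different B values.
def is_fd(rows, a, b):
    seen = {}
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False          # one A value determined two B values
    return True

rows = [
    {"Roll_No": 1, "Name": "Anu", "City": "Hyderabad"},
    {"Roll_No": 2, "Name": "Ram", "City": "Hyderabad"},
    {"Roll_No": 1, "Name": "Anu", "City": "Chennai"},
]

print(is_fd(rows, "Roll_No", "Name"))  # Roll_No -> Name holds
print(is_fd(rows, "Roll_No", "City"))  # fails: Roll_No 1 maps to two cities
```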
Explain about various keys used in the relational database model?
Super key: An attribute (or combination of attributes) that uniquely identifies each row in a
table.
Example: In the STUDENT table, the super key could be any of the following: STU_NUM,
or (STU_NUM, STU_LNAME).
Candidate key: A minimal (irreducible) super key; a super key that does not contain a subset
of attributes that is itself a super key.
Example: (STU_NUM, STU_LNAME) is a super key, but it is not a candidate key because
STU_NUM by itself can uniquely identify each row in the STUDENT table.
Primary key: A candidate key selected as the primary key. It cannot contain NULL values.
Example: If the employee’s PAN number has been included as one of the attributes in the
EMPLOYEE table, then EMP_NUM and EMP_PAN are both candidate keys because each
uniquely identifies an employee. The selection of EMP_NUM as the primary key is the
designer’s choice.
Secondary key: An attribute or combination of attributes used strictly for data retrieval
purposes.
Example: If I often need a city-wise customer list from the CUSTOMER table, I can place a
secondary key on the CUS_CITY column to speed up retrieval.
Foreign key: An attribute in one table whose values must either match the primary key in
another table or be null.
Q: What is a constraint? Write short notes on integrity constraints/ rules with example?
Ans: A constraint is a restriction placed upon the data values that can be stored in a column
or columns of a table.
Integrity Constraint are of 2 types
1. Entity integrity constraint
2. Referential integrity constraint
Entity integrity: All primary key entries are unique, and no part of a primary key may be null.
Referential integrity: A foreign key may have either a null entry (as long as it is not part of its
table’s primary key) or an entry that matches a primary key value in the table to which it is
related. (Every non-null foreign key value must reference an existing primary key value.)
Example: Table name: AGENT
Primary key: AGENT_CODE Foreign Key: none
AGENT_CODE AGENT_FNAME AGENT_PHONE
A01 ANU 2475258
A02 RAM 2465258
Table Name: CUSTOMER
Primary Key: CUS_CODE and Foreign Key: AGENT_CODE
CUS_CODE CUS_FNAME AGENT_CODE
C01 SWATHI NULL
C02 DOLLY A01
C03 RAMA A01
Here the customer Swathi has not been assigned an agent yet, hence her agent code is NULL.
No entry in the agent code column of the CUSTOMER table is invalid: the non-null entries
reference the valid entry A01, which is Anu’s agent code.
Also, the primary keys of both tables contain no null values and contain only unique values.
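The two integrity checks on the AGENT and CUSTOMER tables above can be sketched directly:

```python
# Sketch: checking entity and referential integrity over the AGENT and
# CUSTOMER tables shown above.
agents = [
    {"AGENT_CODE": "A01", "AGENT_FNAME": "ANU"},
    {"AGENT_CODE": "A02", "AGENT_FNAME": "RAM"},
]
customers = [
    {"CUS_CODE": "C01", "CUS_FNAME": "SWATHI", "AGENT_CODE": None},
    {"CUS_CODE": "C02", "CUS_FNAME": "DOLLY",  "AGENT_CODE": "A01"},
    {"CUS_CODE": "C03", "CUS_FNAME": "RAMA",   "AGENT_CODE": "A01"},
]

# Entity integrity: primary key values are unique and never null.
pks = [c["CUS_CODE"] for c in customers]
entity_ok = None not in pks and len(pks) == len(set(pks))

# Referential integrity: every non-null foreign key references an
# existing primary key in AGENT.
agent_pks = {a["AGENT_CODE"] for a in agents}
ref_ok = all(c["AGENT_CODE"] is None or c["AGENT_CODE"] in agent_pks
             for c in customers)

print(entity_ok, ref_ok)  # both rules hold for these tables
```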
Relational set operators or relational algebra
Relational algebra is a set of basic operations used to manipulate the data in the relational
model. These operations fall into two categories:
1. Binary (set) operations: UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN
PRODUCT.
2. Relational operations: SELECT, PROJECT, JOIN and DIVISION.
When two or more tables share the same number of columns, the columns have the same
names, and the columns share the same (or compatible) domains, the tables are said to be
union-compatible.
UNION: Union combines all rows from two tables, excluding duplicate rows. The two tables
must be union-compatible.
Example: R3 = R1 U R2
R1 (Fname) = {A1, A2, A3, A4}
R2 (Fname) = {A1, A7, A2, A4}
R3 (Fname) = {A1, A2, A3, A4, A7}
INTERSECT: Intersect yields only the rows that appear in both tables.
The tables must be union-compatible to yield valid results.
Example: R3 = R1 INTERSECT R2
R1 (Fname) = {A1, A2, A3, A4}
R2 (Fname) = {A1, A7, A2, A4}
R3 (Fname) = {A1, A2, A4}
DIFFERENCE: Difference yields all rows in one table that are not found in the other table.
The tables must be union-compatible.
Example: R3 = R1 - R2
R1 (Fname) = {A1, A2, A3, A4}
R2 (Fname) = {A1, A7, A2, A4}
R3 (Fname) = {A3}
CARTESIAN PRODUCT: Yields all possible pairs of rows from two tables.
Example: R3 = R1 X R2
R1 (Course) = {C1, C2}
R2 (Fname) = {A1, A2, A3}
R3 (Course, Fname) = {(C1, A1), (C1, A2), (C1, A3), (C2, A1), (C2, A2), (C2, A3)}
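The four set operators can be sketched with Python sets, using the R1/R2 values from the examples above:

```python
# Sketch of UNION, INTERSECT, DIFFERENCE and CARTESIAN PRODUCT using
# Python sets and the example values from this section.
R1 = {"A1", "A2", "A3", "A4"}
R2 = {"A1", "A7", "A2", "A4"}

print(sorted(R1 | R2))   # UNION: all rows, duplicates removed
print(sorted(R1 & R2))   # INTERSECT: rows appearing in both tables
print(sorted(R1 - R2))   # DIFFERENCE: rows in R1 not found in R2

# CARTESIAN PRODUCT: all possible pairs of rows from two tables.
courses = ["C1", "C2"]
fnames = ["A1", "A2", "A3"]
product = [(c, f) for c in courses for f in fnames]
print(len(product))      # 2 x 3 pairs, as in the Course x Fname example
```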
SELECT: Also known as RESTRICT.
Yields values for all rows found in a table that satisfy a given condition.
Table name: PRODUCT
Pcode  Pdesc        Price
1      Flash Light  5
2      Lamp         25
3      Battery      7
4      100W Bulb    15
SELECT only Price < $10 yields:
Pcode  Pdesc        Price
1      Flash Light  5
3      Battery      7
PROJECT: Yields all values for selected attributes; it yields a vertical subset of a table.
PROJECT Price on the PRODUCT table yields:
Price
5
25
7
15
JOIN: A join is used to combine rows from multiple tables.
Natural join links tables by selecting only the rows with common values in their common
columns. A natural join is the result of a three-stage process:
1. A PRODUCT of the tables is created.
2. A SELECT is performed on the output to yield only the rows for which Acode =
Agent_code. These common columns (Acode and Agent_code) are called the join columns.
3. A PROJECT is performed on the result to include only one copy of the join column.
Table name: CUSTOMER
Cus_code  Name  Agent_code
C01       ANU   A01
C02       RANI  A02
Table name: AGENT
Acode  Name
A01    RAJ
A02    TAJ
STEP 1: the product of the two tables yields
Cus_code  Name  Agent_code  Acode  Name
C01       ANU   A01         A01    RAJ
C01       ANU   A01         A02    TAJ
C02       RANI  A02         A01    RAJ
C02       RANI  A02         A02    TAJ
STEP 2: SELECT the rows for which Acode = Agent_code
Cus_code  Name  Agent_code  Acode  Name
C01       ANU   A01         A01    RAJ
C02       RANI  A02         A02    TAJ
STEP 3: PROJECT to remove the Acode column from the result
Cus_code  Name  Agent_code  Name
C01       ANU   A01         RAJ
C02       RANI  A02         TAJ
The column on which the join was made occurs only once in the new table.
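The three-stage natural join can be sketched over the CUSTOMER and AGENT tables; the agent's name column is renamed `Agent_name` here only to avoid clashing dictionary keys.

```python
# Sketch of the three-stage natural join: PRODUCT, then SELECT on
# Agent_code = Acode, then PROJECT away the duplicate join column.
customer = [
    {"Cus_code": "C01", "Name": "ANU",  "Agent_code": "A01"},
    {"Cus_code": "C02", "Name": "RANI", "Agent_code": "A02"},
]
agent = [
    {"Acode": "A01", "Agent_name": "RAJ"},  # Name renamed to avoid key clash
    {"Acode": "A02", "Agent_name": "TAJ"},
]

# Step 1: PRODUCT - every customer row paired with every agent row.
product = [{**c, **a} for c in customer for a in agent]

# Step 2: SELECT - keep only the rows where the join columns match.
selected = [r for r in product if r["Agent_code"] == r["Acode"]]

# Step 3: PROJECT - drop the redundant join column Acode.
joined = [{k: v for k, v in r.items() if k != "Acode"} for r in selected]

print(len(product), len(selected))  # product rows, then matching rows
print(joined[0]["Agent_name"])      # agent matched to customer C01
```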
Equi join:
1. Links tables on the basis of an equality condition.
2. Does not eliminate duplicate columns.
Theta join: if any comparison operator other than equality is used, the join is called a
theta join.
Left outer join: yields all of the rows in the CUSTOMER table, including those that do not
have a matching value in the AGENT table.
Right outer join: yields all of the rows in the AGENT table, including those that do not have a
matching value in the CUSTOMER table.
DIVIDE: This operation uses a single-column table as the divisor and a two-column table as the dividend. The tables must have a common column.

Dividend:       Divisor:       DIVIDE yields:
CODE  LOC       LOC            CODE
A     5         5              A
A     9                        B
B     5
B     3
C     6

(Codes A and B are returned because each of them is paired with every LOC value in the divisor.)
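Standard SQL has no DIVIDE operator; it is commonly emulated with nested NOT EXISTS. A sketch assuming the dividend and divisor are stored as tables DIVIDEND(CODE, LOC) and DIVISOR(LOC):

```sql
-- Return each CODE that is paired with every LOC value in DIVISOR.
SELECT DISTINCT d.CODE
FROM DIVIDEND d
WHERE NOT EXISTS (
        SELECT * FROM DIVISOR v
        WHERE NOT EXISTS (
                SELECT * FROM DIVIDEND d2
                WHERE d2.CODE = d.CODE
                  AND d2.LOC  = v.LOC));
```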
UNIT-II Chapter –I Entity Relationship modeling
Q: What are E-R Model Components or modules?
Ans: Three components: Entities, Attributes, and Relationships.
Entity: Entity is anything about which data are to be collected and
stored.
An entity may be concrete (a person or a book, for example) or abstract (like a holiday or a
concept).
An entity is represented by a rectangle containing the entity's name.
The entity name, a noun, is usually written in all capital letters.
Attribute:
Attributes are characteristics of entities.
For ex: STUDENT entity has the attributes
STU_FNAME,STU_PHONE etc.
Attributes are represented by ovals and are connected to the
entity rectangle with a line.
Each oval contains the name of the attribute it represents.
Attributes may share a domain.
Primary keys are underlined. (here RollNo is the primary key.)
Relationship
A relationship is an association between entities.
Relationships are described as verbs.
Relationships are represented by diamond-shaped symbols
Q: What are the different types of attributes?
Ans:
1. Required and Optional Attributes:
A required attribute is an attribute that must have a value; it cannot be left empty.
Ex: STU_FNAME, STU_LNAME
An optional attribute is an attribute that does not require a value; it can be left empty.
Ex: STU_PHONE (all students may or may not have a phone at home).
2. Composite and Simple Attributes:
A simple attribute cannot be subdivided.
Examples: Age, Sex, and Marital status
A composite attribute can be further subdivided to yield additional attributes.
Examples: ADDRESS into Street, City, State, Zip
PHONE NUMBER into Area code, Exchange number
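In table design, a composite attribute is usually stored as its component columns. A sketch (table and column names are illustrative, not from the text):

```sql
-- The composite attribute ADDRESS decomposed into its components.
CREATE TABLE STUDENT_ADDRESS (
  RollNo NUMBER(5) PRIMARY KEY,
  Street VARCHAR2(30),
  City   VARCHAR2(15),
  State  VARCHAR2(15),
  Zip    CHAR(6));
```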
3. Single-Valued and Multivalued Attributes:
A single-valued attribute can have only a single value.
Examples: A manufactured part can have only one serial number.
A multivalued attribute can have many values.
Multivalued attributes are shown by a double line connecting to the entity
Examples: i) A person may have several college degrees.
ii) A household may have several phones with different numbers.
4. Derived Attribute and Stored Attribute
A derived attribute is not physically stored within the database; its value is computed from
other attributes.
It is indicated using a dotted line connecting the attribute with the entity.
Example: AGE can be derived from DOB and current date.
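For instance, rather than storing AGE, it can be computed whenever needed. A sketch in Oracle-style SQL against the EMP table defined later in the lab work (exact date functions vary by DBMS):

```sql
-- AGE derived from DOB and the current date; nothing extra is stored.
SELECT e_fname, TRUNC((SYSDATE - dob) / 365) AS age
FROM emp;
```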
What is Cardinality?
Ans: Cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity.
In the ERD, cardinality is indicated by placing appropriate numbers beside the entities, using the format (x,y). The first value represents the minimum number of associated entities, while the second value represents the maximum number of associated entities.
These constraints are implemented by the application software or by triggers.
Q:When can you say an entity is Existence dependent/ independent?
Ans: An entity is said to be existence dependent if it can exist in the database only when it is
associated with another related entity occurrence.
Existence independence: if an entity can exist independently of other entities, then it is said to be existence
independent.
Q:What is relationship strength? Explain about strong and weak relationships.
Ans: Relationship Strength is based on how the primary key of a related entity is defined.
They are of 2 types.
Weak (Non-identifying) relationship: a weak relationship, also known as a non-identifying relationship, exists if the entity has a primary key that is not partially or totally derived from the parent entity in the relationship.
Strong relationship, also known as an identifying relationship, exists if the entity has a primary key that is partially or totally derived from the parent entity in the relationship.
What is a Weak entity?
Ans: A weak entity is one that meets two conditions:
1. The entity is existence-dependent.
2. The entity has a primary key that is partially or totally derived from the parent entity in the relationship, i.e., a strong relationship.
A weak entity is identified by using a double-walled entity rectangle.
Ex: DEPENDENT is the weak entity in the relationship EMPLOYEE has DEPENDENT.
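A sketch of how such a weak entity might be declared (an EMPLOYEE table keyed on emp_no is assumed; all names here are illustrative):

```sql
-- DEPENDENT borrows emp_no from EMPLOYEE as part of its primary key,
-- so a dependent cannot exist without its employee.
CREATE TABLE DEPENDENT (
  emp_no   NUMBER(4),
  dep_name VARCHAR2(20),
  dep_dob  DATE,
  PRIMARY KEY (emp_no, dep_name),
  FOREIGN KEY (emp_no) REFERENCES EMPLOYEE);
```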
What is meant by relationship participation?
Ans: Participation in an entity relationship is either optional or mandatory.
Optional participation means that one entity occurrence does not require a corresponding
entity occurrence in a particular relationship.
For Ex: in the “COURSE generates CLASS” relationship, there are some courses that do not
generate a class. Therefore, the CLASS entity is considered to be optional to the COURSE
entity.
Mandatory participation means that one entity occurrence requires a corresponding entity
occurrence in a particular relationship.
If every COURSE must generate a CLASS then the CLASS entity is considered to be
mandatory to the COURSE entity.
Types of Relationships
A relationship’s degree indicates the number of entities that participate in the relationship.
Different types of relationship degrees are :
1. Unary relationship: If a relationship is maintained within a single entity, then such a
relationship is called a unary relationship.
Example: an employee within the EMPLOYEE entity is the manager for one or more
employees within that entity.
when an entity has a relationship with itself then such relationship is called as recursive
relationship.
2. Binary Relationship: Binary Relationship exists when two entities are associated in a
relationship. Ex: the relationship “a PROFESSOR teaches one or more CLASSes”
What is a recursive relationship?
Ans: when an entity has a relationship with itself then such relationship is called as recursive
relationship.
What is an associative or composite or bridge entity?
Ans: When there is an M:N relationship between two entities, we create a new entity called a bridge/composite entity that contains the primary keys of both the entities participating in the M:N relationship.
Ex:
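A hypothetical sketch for the M:N relationship "STUDENT takes CLASS" (table and column names are illustrative, assuming STUDENT and CLASS tables keyed on RollNo and class_no):

```sql
-- ENROLL is the bridge entity: its primary key combines the
-- primary keys of both participating entities.
CREATE TABLE ENROLL (
  RollNo   NUMBER(5),
  class_no NUMBER(4),
  PRIMARY KEY (RollNo, class_no),
  FOREIGN KEY (RollNo)   REFERENCES STUDENT,
  FOREIGN KEY (class_no) REFERENCES CLASS);
```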
Explain database design challenges?
Ans:
1. Design Standards: Standards guide the designer in developing logical structures that reduce
data redundancies. Without design standards, it is not possible to produce a proper
design or to evaluate an existing one.
2. Processing Speed: High processing speed is a top priority in database design, as it is
necessary for many organizations.
For example: a perfect design might use a 1:1 relationship to avoid nulls, while a higher
transaction-speed design might combine the two tables to avoid the use of an
additional relationship, using dummy entries to avoid nulls.
If the focus is on data-retrieval speed, one must include derived attributes in design.
3. Information requirements: a design that meets all logical requirements is an
important goal. The designer should consider end-user requirements such as
performance, security, shared access. He must also verify that all update, retrieval and
deletion options are available and also all query and reporting requirements.
UNIT-III
Chapter-I Introduction to SQL
Q: What is SQL and what does SQL do?
SQL stands for Structured Query Language.
SQL is a non-procedural language; therefore you specify what is to be done rather than how it
is to be done.
The American National Standards Institute (ANSI) prescribed a standard SQL.
SQL functions fit into two broad categories:
• It is a data definition language (DDL): SQL can create database objects such as
tables, indexes and views. SQL can also define access rights to these database objects.
• It is a data manipulation language (DML): SQL can be used to insert, update, delete
and retrieve data from the database.
SQL is easy to learn.
SQL can retrieve data from a database.
SQL can execute queries.
SQL queries are used to answer questions and also to perform actions such as adding and deleting
table rows.
Q: Explain various datatypes available in SQL?
Ans: The following are some common SQL datatypes.

Numeric:
• NUMBER(L,D): Ex: NUMBER(7,2) indicates the number will be stored with two decimal places and may be up to 7 digits long, including the sign and decimal places.
• INTEGER (or INT): Cannot be used if you want to store numbers that require decimal places.
• SMALLINT: Limited to integer values up to six digits.
• DECIMAL(L,D): Greater lengths are acceptable, but smaller ones are not.

Character:
• CHAR(L): Fixed-length character data for up to 255 characters. If you store strings that are not as long as the CHAR parameter value, the remaining spaces are left unused.
• VARCHAR(L) or VARCHAR2(L): Variable-length character data; will not leave unused spaces.

Date:
• DATE: Stores dates in the Julian date format.
Q: Explain how to create table using SQL?
Ans: The CREATE TABLE command is used to create a new table in the user's database schema.
Syntax:
CREATE TABLE tablename (
Column1 datatype(column width) [constraints],
Column2 datatype(column width) [constraints],
……………
);
Example:
CREATE TABLE VENDOR(
vno number(3) PRIMARY KEY,
vname varchar2(35) NOT NULL,
vcity varchar2(15));
If the above command is executed successfully, the message "table created" is displayed.
The following are the rules for naming a table.
1. Table names should start with an alphabet.
2. Underscores, numbers and letters are allowed, but not blank spaces.
3. Maximum length of a table name is 30 characters.
4. Reserved words of ORACLE cannot be used as table names.
5. Two different tables should not have the same name.
6. Unique column names should be specified.
7. Proper data types and sizes should be specified.
Q: What are SQL constraints? Explain?
Ans: Entity integrity is enforced automatically when the primary key is specified in CREATE
TABLE command.
For Ex:
CREATE TABLE PRODUCT(
pno char(3),
pdesc varchar2(35) NOT NULL UNIQUE,
p_indate date,
qoh number(5),
price number(5),
vno number(3),
PRIMARY KEY(pno),
FOREIGN KEY(vno) REFERENCES VENDOR ON UPDATE CASCADE);
The primary key attribute contains both a NOT NULL and a UNIQUE specification.
The foreign key constraint definition ensures that
• You cannot delete a vendor from the VENDOR table if at least one PRODUCT row
references that vendor.
• ON UPDATE CASCADE (not supported by ORACLE) ensures that when a change is
made in VENDOR table, that change will be reflected automatically in PRODUCT
table.
Besides the primary key and foreign key constraints, the ANSI SQL standard defines the
following constraints.
• NOT NULL ensures that a column will not have null values.
• UNIQUE ensures that a column will not have duplicate values.
• DEFAULT defines a default value for a column (when no value is given).
• CHECK validates data in an attribute and sees that a specified condition exists.
Ex1: The minimum order must be at least 10.
Ex2: The date must be after APRIL 15, 2011.
The CREATE TABLE command lets you define constraints in two different places.
• When you create the column definition (known as column constraint)
• When you use CONSTRAINT keyword (known as table constraint)
A column constraint applies to just one column.
A table constraint may apply to many columns.
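A sketch showing both placements, using a hypothetical ORDERS table (the CHECK constraint here enforces the minimum-order rule of Ex1):

```sql
CREATE TABLE ORDERS (
  ord_no   NUMBER(4),
  ord_date DATE DEFAULT SYSDATE,            -- column constraints
  qty      NUMBER(5) NOT NULL,
  CONSTRAINT pk_ord  PRIMARY KEY (ord_no),  -- table constraints
  CONSTRAINT chk_qty CHECK (qty >= 10));
```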
Q:Explain important data manipulation commands (DML) of SQL?
Ans:
INSERT: Used to enter data into a table.
Syntax:
INSERT INTO tablename VALUES (value1,value2,…..valuen)
Example:
INSERT INTO VENDOR VALUES (100,’RADHA’,’VJA’);
Observe that:
Character and date values must be entered between apostrophes (').
Numerical entries are not enclosed in apostrophes (').
Attribute entries are separated by commas.
Inserting Rows with NULL attribute
INSERT INTO product VALUES ('P02','PENCIL','02-AUG-2011', 25, 3, NULL);
Note that the NULL entry is accepted only because the vno attribute is optional in the
PRODUCT table: the NOT NULL declaration is not used in the CREATE TABLE statement for this
attribute.
Inserting Rows with OPTIONAL attributes:
If the data is not available for all columns, then column list must be included following table
name.
INSERT INTO product(pno,pdesc) VALUES('P03','MOUSE');
COPYING PARTS OF A TABLE
A new table can be created based on selected columns and rows of an existing table. In this
case, the new table copies the attribute names, data characteristics and rows of the
original table.
CREATE TABLE part AS
SELECT pno,pdesc,vno FROM product;
Note that no entity integrity(primary key) or referential integrity (foreign key) rules are
automatically applied to the new table.
Saving the table changes or COMMIT:
The COMMIT command permanently saves all changes such as rows added, attributes
modified and rows deleted made to any table in the database.
Syntax:
COMMIT;
Any changes made to table contents are not saved on disk until you close the database, close
the program you are using, or use the COMMIT command.
UPDATE Command:
The UPDATE command modifies an attribute value in one or more table rows.
Allows you to make data entries in an existing row’s columns.
Syntax:
UPDATE tablename
SET columnname = expression [,columnname = expression]
WHERE conditionlist;
Ex: To change the p_indate of the product with pno P01 to 02-AUG-2011:
UPDATE PRODUCT
SET p_indate ='02-AUG-2011'
WHERE pno='P01';
Restoring table contents or ROLLBACK:
ROLLBACK-undoes any changes since the last COMMIT command and brings the data
back to the values that existed before the changes were made.
Syntax: ROLLBACK;
Ex:
1. Create table called sales.
2. Insert 10 rows in sales table.
3. Execute the ROLLBACK command
ROLLBACK will undo the results of DML commands such as INSERT, UPDATE and DELETE.
All data definition commands (CREATE TABLE) are automatically committed to the data
dictionary and cannot be rolled back.
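The three steps above might look like this in a session (the sales columns are illustrative):

```sql
CREATE TABLE sales (sno NUMBER(3), amount NUMBER(7,2)); -- DDL: auto-committed
INSERT INTO sales VALUES (1, 500.00);  -- ... repeated for 10 rows
ROLLBACK;  -- the 10 inserted rows are undone; the empty table remains
```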
DELETE Command
DELETE -deletes one or more rows from a table
If you do not specify a WHERE condition, all rows from the table will be deleted.
REMOVAL OF SPECIFIED ROW(S):
Syntax: DELETE FROM tablename [WHERE conditionlist];
REMOVAL OF ALL ROWS:
Syntax: DELETE FROM tablename;
Viewing data in tables or SELECT
SELECT-lists the contents of a table.
Syntax:
SELECT columnlist
FROM tablename
[WHERE conditionlist];
The columnlist represents one or more attributes separated by commas.
You can use the * wildcard character to list all attributes.
Ex1: SELECT * FROM PRODUCT;
Ex2: SELECT pdesc,p_indate FROM product WHERE pno='P01';
Ex3: SELECT * FROM product WHERE p_indate>'01-AUG-2011';
The SELECT statement retrieves all rows that match the specified condition.
WHERE clause adds conditional restrictions to SELECT statement.
The condition list is represented by one or more conditional expressions separated by logical
operators.
Comparison operators can be used to restrict output.
Comparison operators:
Symbol    Meaning                    Example
=         Equal to                   SELECT * FROM product WHERE pno='P01';
<         Less than                  SELECT * FROM product WHERE price<10;
<=        Less than or equal to
>         Greater than               SELECT * FROM product WHERE price>10;
>=        Greater than or equal to
<> or !=  Not equal to               SELECT * FROM product WHERE vno <> 100;
Using Computed Columns
Oracle uses actual formula text as the label for the computed
column.
Ex: SELECT pno,qoh*price FROM PRODUCT;
The result lists pno together with a computed column labeled qoh*price.
Using Column aliases
An alias is an alternative name given to a column or table in any SQL
statement.
Ex2: SELECT pno,qoh*price AS total FROM PRODUCT;
Using Date arithmetic
SYSDATE is a special function that returns today’s date.
Ex:1 SELECT pno,p_indate,p_indate+90 AS ExpiryDate FROM product;
Ex:2 SELECT pno,p_indate,SYSDATE-90 AS CutDate FROM product
WHERE p_indate<=SYSDATE-90
The output would change based on today’s date
Arithmetic Operators:
Symbol Meaning Example
+ Addition
- Subtraction
* Multiply SELECT qoh, price*qoh FROM
product;
/ Division
^ Raised to power (some applications use **
instead of ^)
Rules of Precedence: Perform operations within parentheses then perform ^ then *,/ then +,-
Logical Operators:
SQL allows you to have multiple conditions in a query through the use of logical operators.
Symbol Meaning Example
AND Both conditions must match SELECT * FROM product
WHERE price > 10 AND price < 100;
OR Either condition must match SELECT * FROM product
WHERE vno = 100 OR vno = 101
NOT Do not match a certain
condition
SELECT * FROM product
WHERE NOT(vno = 100)
AND displays the result when all of the conditions joined by the AND operator are satisfied.
OR displays the result when either of the conditions joined by the OR operator is satisfied.
The NOT operator is used to find rows that do not match a certain condition; it negates the result
of a conditional expression.
Ex: SELECT * FROM product WHERE ( price < 50 AND p_indate > '01-AUG-2011') OR
vno = 100;
The rows with vno=100 are included regardless of the p_indate and price of those rows.
Special Operators
BETWEEN operator:
Used to check whether an attribute value is within a range
Ex: To see list of products whose price is between $10 and $100, use the command:
SELECT * FROM product WHERE price BETWEEN 10 AND 100;
IS NULL operator:
Used to check whether an attribute value is null.
Ex: To list all the products that do not have a vendor assigned, use the command:
SELECT * FROM product WHERE vno IS NULL;
LIKE operator: Used only with CHAR and VARCHAR2.
Matches a string pattern.
Used in conjunction with wildcards to find patterns within string attributes.
Ex1: To find all vendors whose name starts with R:
SELECT * FROM vendor WHERE vname LIKE 'R%';
Ex2: To find all vendors whose name has 'a' as the second letter:
SELECT * FROM vendor WHERE vname LIKE '_a%';
SQL allows you to use the percent sign (%) and underscore (_) wildcard characters to make
matches when the entire string is not known.
Wildcard Meaning
%        Matches any string of characters (including none)
_        Matches exactly one character
Without wildcards, a match is made only when the query entry is written exactly like the table entry.
IN operator: matches any value within a value list.
It uses an equality operator; i.e., it selects only those rows that match (are equal to) at least
one of the values in the list.
Ex:
SELECT * FROM product
WHERE vno IN(100 , 101);
All of the values in the list must be of same data type.
Each of the values in the value list is compared to the attribute.
IN operator is valuable when it is used in subqueries.
SELECT * FROM vendor
WHERE vno IN(SELECT vno FROM product );
Subquery (SELECT vno FROM product) will list all vendors who supply products.
IN operator will compare the values generated by subquery to vno values in VENDOR table.
EXISTS operator: checks whether a subquery returns any row.
If the subquery returns at least one row, the main query runs; otherwise it does not.
Ex:
SELECT * FROM vendor
WHERE EXISTS (SELECT * FROM product WHERE qoh<=10);
Modifying structure of table:
ALTER Command: All changes to table structure are made using the ALTER command.
Syntax:
ALTER TABLE tablename
{ADD|MODIFY} (columnname datatype [{ADD|MODIFY} columnname datatype]);
To change a column's datatype:
To change the vname datatype from varchar2 to char:
ALTER TABLE vendor MODIFY (vname char(35));
To change a column's data characteristics:
To increase the width of the vname column to 55 characters:
ALTER TABLE vendor MODIFY (vname char(55));
To add a column
ALTER TABLE product ADD (pmin number(5));
If the table already has some data, we cannot add a new column with NOT NULL, because existing
rows would default to NULL for the new column.
TO ADD TABLE CONSTRAINTS:
Syntax: ALTER TABLE tablename ADD constraint [ADD constraint];
To add primary key:
ALTER TABLE part ADD PRIMARY KEY(part_no);
To add foreign key:
ALTER TABLE part ADD FOREIGN KEY(vno) REFERENCES vendor;
(OR)
ALTER TABLE part ADD PRIMARY KEY(part_no)
ADD FOREIGN KEY(vno) REFERENCES vendor;
To add primary and foreign key using the keyword CONSTRAINT:
ALTER TABLE part ADD CONSTRAINT pk_partno PRIMARY KEY(part_no)
ADD CONSTRAINT fk_vno FOREIGN KEY(vno) REFERENCES vendor;
TO REMOVE A COLUMN OR TABLE CONSTRAINT
Syntax: ALTER TABLE tablename
DROP {PRIMARY KEY | COLUMN columnname | CONSTRAINT constraintname};
Dropping a column: deleting a column
ALTER TABLE product DROP COLUMN pmin;
DELETING A TABLE FROM DATABASE:
A table can be deleted from the database using the DROP TABLE command.
Syntax: DROP TABLE tablename;
Ex: DROP TABLE part;
Advanced select queries
ORDER BY clause: Orders the selected rows based on one or more attributes
• Used in the last portion of select statement
• By using this, rows can be sorted
• By default it takes ascending order
• DESC is used for sorting in descending order
• Sorting by column which is not in select list is possible.
• Sorting by column aliases
Example: To produce a list of products sorted in descending order of their prices.
SELECT pno,pdesc,p_indate,price
FROM product
ORDER BY price DESC;
A multilevel ordered sequence is known as cascading order sequence and it can be created
easily by listing several attributes, separated by commas, after the ORDER BY clause.
SELECT * FROM employee ORDER BY e_lname,e_fname,e_initial;
DISTINCT clause: Used to eliminate duplicate rows.
Ex:How many different vendors are currently represented in the PRODUCT table?
SELECT DISTINCT vno FROM product;
Explain Aggregate functions?
Ans: Some of the aggregate functions are COUNT,MIN,MAX,AVG.
COUNT: Uses one parameter within parentheses.
COUNT(columnname)-Used to count the number of non-null values of an attribute
COUNT(*) aggregate function is used to count number of rows returned by query, including
the rows that contain nulls.
Ex1: How many rows in PRODUCT table have a price value less than or equal to $500?
SELECT COUNT(*) FROM product WHERE price<=500;
Ex2: How many vendors referenced in the PRODUCT table have supplied products with
prices that are less than or equal to $10?
SELECT COUNT(DISTINCT vno) FROM product WHERE price<=10;
MAX and MIN
Ex1: Which product has highest price?
SELECT * FROM product WHERE price = (SELECT MAX(price) FROM product);
(Here we cannot use SELECT * FROM product WHERE price = MAX(price); because the
aggregate functions can be used only in the column list of a SELECT statement.)
Ex2:Highest price in PRODUCT table?
SELECT MAX(price) FROM product;
Ex:3Lowest price in PRODUCT table?
SELECT MIN(price) FROM product;
Ex4: To find the product that has the oldest date:
SELECT * FROM product WHERE p_indate = (SELECT MIN(p_indate) FROM product);
Ex5: To find the most recent product:
SELECT * FROM product WHERE p_indate = (SELECT MAX(p_indate) FROM product);
SUM: Computes total sum of any specified attribute.
Ex:To find the total value of all items
SELECT SUM(qoh*price) AS TOTALVALUE
FROM product;
AVG
Ex1: To find the products whose prices exceed the average product price.
SELECT * FROM product
WHERE price > (SELECT AVG(price) FROM product)
ORDER BY price desc;
Explain about GROUP BY clause?
• Used to group rows on basis of certain common attribute value such as employees of
a department, products of a vendor.
• WHERE clause can be used, if needed.
• The only attributes that can be put in select clause are the aggregated functions and
the attributes that have been used for grouping the information.
Ex1:How many products are supplied by each vendor?
SELECT vno, COUNT(pno)
FROM product
GROUP BY vno;
Having clause:
Extension of the GROUP BY feature is the HAVING clause.
HAVING clause is applied to the output of GROUP BY operation.
Ex: How many products are supplied by each vendor? List only the vendors whose average
product price is below $10.
SELECT vno, COUNT(pno), AVG(price)
FROM product
GROUP BY vno
HAVING AVG(price) < 10;
Q: Explain about index in SQL
Ans:
Indexes are used to quickly access the data.
Syntax: CREATE INDEX <index name> ON <tablename>(column name);
An index can be created on one or more columns.
Based on the number of columns included in index, an index can be of 2 types.
1. Simple index 2.Composite Index.
To create Simple index:
An index created on a single column is called simple index.
Ex: CREATE INDEX p_in ON product(p_indate)
To create composite index:
An index created on a more than one column is called composite index.
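For example, an index could be created on both vno and price of the PRODUCT table (the index name is illustrative):

```sql
CREATE INDEX prod_vno_price ON product(vno, price);
```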
Dropping indexes or deleting an index: Use the DROP INDEX command.
Ex: DROP INDEX p_in;
Q: What is a database schema?
Ans: A schema is a group of database objects, such as tables and indexes, that are related to
each other.
Syntax: CREATE SCHEMA AUTHORIZATION {creator};
When a user is created, the DBMS automatically assigns a schema to that user.
Schemas are useful to group tables by owner and enforce a first level of security by allowing
each user to see only the tables that belong to that user.
Labwork:
CREATE TABLE VENDOR(
vno number(3) PRIMARY KEY,
vname varchar2(35) NOT NULL,
vcity varchar2(15));

VENDOR
vno  vname  vcity
100  RADHA  VJA
101  ALIYA  NULL
102  SIRI   VJA
103  LAK    GNT
CREATE TABLE PRODUCT(
pno char(3),
pdesc varchar2(35) NOT NULL UNIQUE,
qoh number(5),
price number(5),
vno number(3),
PRIMARY KEY(pno),
FOREIGN KEY(vno) REFERENCES VENDOR);
CREATE TABLE CUSTOMER(
cno number(3) PRIMARY KEY,
cname varchar2(35) ,
city varchar2(5),
baldue number(5));
CREATE TABLE INVOICE(
invno number(3),
cno number(3),
invdate date,
PRIMARY KEY(invno),
FOREIGN KEY(cno) REFERENCES CUSTOMER);
PRODUCT
pno  pdesc   qoh   price  vno
P01  PEN     2     10     100
P02  CD      20    12     101
P03  PENCIL  200   3      NULL
P04  DVD     2000  350    101

CUSTOMER
cno  cname  city  baldue
201  ANU    VJA   100
202  ASHA   GNT   500
203  RAJ    VJA

INVOICE
invno  cno  invdate
301    201  20-AUG-2011
302    202  20-AUG-2011
303    203  21-AUG-2011
304    201  21-AUG-2011
All products sold are stored in LINE table
CREATE TABLE LINE(
invno number(3),
lineno char(3),
pno char(3),
line_units number(5),
line_price number(5),
PRIMARY KEY(invno,lineno),
FOREIGN KEY(pno) REFERENCES PRODUCT,
FOREIGN KEY(invno) REFERENCES INVOICE);
CREATE TABLE EMP(
e_lname varchar2(20),
e_fname varchar2(20),
e_initial varchar2(2),
dob date,
sal number(8,2));
LINE
invno  lineno  pno  line_units  line_price
301    L01     P01  10          10
301    L02     P02  10          12
301    L03     P03  20          3
302    L01     P01  30          10
302    L02     P02  20          12
303    L01     P01  35          10
303    L02     P02  15          12

EMP
e_lname  e_fname  e_initial  dob          sal
REDDY    SAM      A          14-NOV-1994  15000.25
NAIDU    ANU      S          14-OCT-1992  16234.50
JAIN     NEHA     K          28-NOV-1993  15623.48
REDDY    RAM      T          14-SEP-1994  1623.89
Unit-III Chapter-II ADVANCED SQL
SQL data manipulation commands operate over an entire table (ex: the SELECT command lists all
rows from the table you specify in the FROM clause) and are said to be set-oriented
commands.
UNION statement: combines rows from two or more queries without including duplicate
rows.
Syntax: query UNION query
Query: SELECT cname,city FROM customer UNION SELECT cname,city FROM customer3;
UNION combines the output of two or more SELECT queries (the SELECT statements must be
union-compatible; that is, they must return the same attribute names and similar data types)
without including duplicate rows.
UNION ALL combines the output of two or more SELECT queries (again, the SELECT
statements must be union-compatible) and retains duplicate rows.
SELECT cname,city FROM customer UNION ALL SELECT cname,city FROM customer3;
INTERSECT statement: used to combine rows from two queries,
returning only the rows that appear in both sets.
SELECT cname,city FROM customer INTERSECT SELECT cname,city FROM customer3;
CUSTOMER3
cno  cname  city  baldue
401  JAY    GNT   200
402  RAJ    VJA   300

UNION result:
cname  city
ANU    VJA
ASHA   GNT
RAJ    VJA
JAY    GNT

UNION ALL result:
cname  city
ANU    VJA
ASHA   GNT
RAJ    VJA
JAY    GNT
RAJ    VJA

INTERSECT result:
cname  city
RAJ    VJA
MINUS statement: combines rows from two queries and returns only
rows that appear in first set but not in the second.
SELECT cname,city FROM customer MINUS SELECT cname,city FROM customer3
SQL JOIN OPERATORS:
A join is used to combine rows from multiple tables.
Join operations can be classified as inner joins and outer joins.
The inner join is the traditional join, in which only the rows that meet a given criterion are
selected.
The join criterion can be an equality condition (also called a natural join or an equijoin) or an
inequality condition (also called a theta join).
Generally, a join condition will be an equality comparison of the primary key of one table and the
foreign key of a related table.
An outer join returns not only the matching rows but also the unmatched attribute values from
one table or both tables to be joined.
Join summary:

CROSS JOIN
  Old style: SELECT * FROM T1, T2;
  New style: SELECT * FROM T1 CROSS JOIN T2;
  Returns the Cartesian product of T1 and T2.

INNER JOIN
  Old-style JOIN: SELECT * FROM T1, T2 WHERE T1.C1=T2.C1;
    Returns only the rows that meet the join condition in the WHERE clause.
  NATURAL JOIN: SELECT * FROM T1 NATURAL JOIN T2;
    Returns only the rows with matching values in the matching columns. The matching columns must have the same names and similar datatypes.
  JOIN USING: SELECT * FROM T1 JOIN T2 USING(C1);
    Returns only the rows with matching values in the columns indicated in the USING clause.
  JOIN ON: SELECT * FROM T1 JOIN T2 ON T1.C1=T2.C1;
    Returns only the rows that meet the join condition in the ON clause.

OUTER JOIN
  LEFT JOIN: SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.C1=T2.C1;
    Returns rows with matching values and includes all rows from the left table (T1) with unmatched values.
  RIGHT JOIN: SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.C1=T2.C1;
    Returns rows with matching values and includes all rows from the right table (T2) with unmatched values.
  FULL JOIN: SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.C1=T2.C1;
    Returns rows with matching values and includes all rows from both tables (T1 and T2) with unmatched values.
RECURSIVE JOIN (OR) SELF JOIN:
An alias is an alternative name given to a column or table in any SQL statement.
An alias is especially useful when a table must be joined to itself in a recursive query.
Ex (assuming an EMP table with Eno, Ename and Mgr columns, where Mgr holds the manager's Eno):
SELECT E.Eno, E.Ename, M.Ename
FROM EMP E, EMP M
WHERE E.Mgr = M.Eno;
Cross Join (also known as Cartesian product). Example:
SELECT * FROM invoice CROSS JOIN line;
The above query generates 4*7=28 rows (4 rows in the invoice table and 7 rows in the line table).
Natural Join:
SELECT cno,cname,invno,invdate FROM customer NATURAL JOIN invoice;
You are not limited to two tables when performing a natural join.
It does not require a table qualifier for the common attribute.
SELECT invno,pno,pdesc,line_units,line_price
FROM invoice NATURAL JOIN line NATURAL JOIN product;
JOIN USING clause
It does not require a table qualifier for the common attribute.
SELECT invno,pno,pdesc,line_units,line_price
FROM invoice JOIN line USING(invno) JOIN product USING(pno);
JOIN ON clause
Does not require common attribute names in the joining tables.
Requires a table qualifier for the common attribute.
Lets you perform a join even when the tables do not share a common attribute name.
SELECT invoice.invno,pno,pdesc,line_units,line_price
FROM invoice JOIN line ON invoice.invno=line.invno
JOIN product ON line.pno=product.pno;
OUTER JOINS
SELECT pno,vendor.vno,vname FROM vendor LEFT JOIN product ON
vendor.vno=product.vno;
SELECT pno,vendor.vno,vname FROM vendor RIGHT JOIN product ON
vendor.vno=product.vno;
SELECT pno,vendor.vno,vname FROM vendor FULL JOIN product ON
vendor.vno=product.vno;
SUBQUERIES: used when it is required to process data based on other processed data.
Characteristics of subqueries:
A subquery (also called a nested query or inner query) is a query inside another query.
A subquery is normally expressed inside parentheses.
The output of the inner query is used as input for the outer (high-level) query.
So the inner query is executed first and then the outer query.
A subquery is based on the use of the SELECT statement to return one or more values to
another query. If the table into which you are inserting rows has one date attribute and one
number attribute, the SELECT subquery should return rows in which the first column has date
values and the second column has number values.
Inserting table rows with a select subquery or Copying parts of tables:
It add multiple rows to a table, using another table as source of the data.
CREATE TABLE PART(
part_no char(3) PRIMARY KEY,
part_desc varchar2(35),
vno number(3));
Syntax:
INSERT INTO target_tablename SELECT source_columnlist FROM source_tablename;
Example: INSERT INTO part SELECT * FROM product;
Both tables (PART and PRODUCT) must have the same attributes. The SELECT subquery returns all
rows from table PRODUCT, and they are inserted into PART.
SELECT subquery examples:

Ex(2): UPDATE product
SET price=(SELECT AVG(price) FROM product)
WHERE vno='100';
Updates the price of the products provided by vendor 100 to the average product price.

Ex(3): DELETE FROM product
WHERE vno IN (SELECT vno FROM vendor WHERE vcity='VJA');
Deletes the PRODUCT table rows that are provided by vendors with vcity='VJA'.
A subquery can return
i. One value as in Ex(2) ( the select subquery returns avg(price) which is one value).
ii. A list of values as in Ex(3) (the select subquery returns a list of vendors from ‘VJA’)
iii. A virtual table
iv. No value at all, i.e., NULL; in this case the outer query might produce an error or a
null (empty) set.
WHERE subqueries
Example: To find all products with a price greater than or equal to the average product price, you
write the following query.
SELECT pno,price FROM product
WHERE price>=(SELECT AVG(price) FROM product);
Note that this type of query, when used in a >, <, =, >= or <= conditional expression, requires
a subquery that returns only one single value. If the subquery returns more than a single value, the
DBMS will generate an error.
IN subqueries: see Ex(3) above, where the WHERE clause uses IN with the list of values returned by the subquery.
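A standalone IN-subquery sketch (same vendor and product tables as above), listing the products supplied by vendors located in 'VJA':

```sql
SELECT pno, pdesc FROM product
WHERE vno IN (SELECT vno FROM vendor WHERE vcity = 'VJA');
```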
HAVING subqueries
Example:To list all products with the total quantity sold greater than the average quantity sold
SELECT pno,SUM(line_units) FROM line
GROUP BY pno HAVING SUM(line_units)>(SELECT AVG(line_units) FROM line);
MULTIROW subquery operators: ANY and ALL
1. ALL: used to do an inequality comparison (> or <) of one value to a list of values.
Example: What products have a product cost that is greater than all individual product costs
for products provided by the vendor with vno 101?
SELECT pno, qoh*price FROM product
WHERE qoh*price> ALL(SELECT qoh*price FROM product WHERE vno = 101);
In the above query the ALL operator allows you to compare a single value(qoh*price) with a
list of values returned by the subquery.
2. ANY: the ANY operator allows you to compare a single value with a list of values,
selecting the rows whose qoh*price is greater than at least one value in the list.
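Mirroring the ALL example above, an ANY version would be:

```sql
SELECT pno, qoh*price FROM product
WHERE qoh*price > ANY (SELECT qoh*price FROM product WHERE vno = 101);
```

This returns the products whose total cost exceeds at least one (not necessarily every) total cost of vendor 101's products.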
FROM subqueries
FROM clause specifies the table from which data will be drawn.
Example: To find all customers who purchased both products 'PEN' and 'PENCIL'
SELECT DISTINCT cno, cname FROM customer,
( SELECT invoice.cno FROM invoice NATURAL JOIN line WHERE pdesc='PEN') cp1,
( SELECT invoice.cno FROM invoice NATURAL JOIN line WHERE pdesc='PENCIL') cp2
WHERE customer.cno = cp1.cno AND cp1.cno=cp2.cno;
(OR)
CREATE VIEW cp1 AS
SELECT invoice.cno FROM invoice NATURAL JOIN line WHERE pdesc='PEN';
CREATE VIEW cp2 AS
SELECT invoice.cno FROM invoice NATURAL JOIN line WHERE pdesc='PENCIL';
SELECT DISTINCT cno, cname FROM customer NATURAL JOIN cp1 NATURAL JOIN cp2;
Attribute List Subqueries or inline subqueries
The attribute list can also include a subquery expression, known as an inline subquery.
The inline subquery must return one single value; otherwise an error is raised.
SELECT pno, price ,(SELECT AVG(price) FROM product) AS AVGPRICE ,
price - (SELECT AVG(price) FROM product) AS DIFF
FROM product;
The query uses the full expression instead of the column alias when computing DIFF,
because a column alias cannot be used in computations in the attribute list when the alias is
defined in that same attribute list.
We can use Attribute List Subqueries to include data from other tables that are not
directly related to main table or tables in the query.
SELECT pno,SUM(line_units*line_price) AS sales,
(SELECT COUNT(*) FROM employee) AS ecount,
SUM(line_units*line_price)/ (SELECT COUNT(*) FROM employee) AS contrib
FROM line
GROUP BY pno;
CORRELATED SUBQUERIES
To process a correlated subquery, the DBMS:
i. Initiates the outer query.
ii. For each row of the outer query result set, executes the inner query by passing
the outer row to it (the inner query references a column of the outer query).
Example: To find all product sales whose units sold are greater than the average units sold for that product.
SELECT invno, pno, line_units FROM line LS
WHERE LS.line_units>(SELECT AVG(line_units) FROM line LA WHERE LA.pno =
LS.pno);
For each pno in the outer LINE table, the inner query computes and returns the average
sale for that product.
CORRELATED subqueries can also be used with EXISTS special operator
Example: To know the vendor code and name for the products having qoh<10
SELECT vno, vname FROM vendor
WHERE EXISTS(SELECT * FROM product WHERE qoh<10 AND
vendor.vno=product.vno);
SQL functions: used to generate information from data.
DUAL: an Oracle pseudo-table used in cases when a real table is not needed.
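For instance, DUAL lets you evaluate an expression or function that needs no table at all:

```sql
SELECT SYSDATE FROM DUAL;
SELECT 2 + 2 FROM DUAL;
```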
DATE/TIME FUNCTIONS
LAST_DAY: returns the last day of the month based on a date value.
Syntax: LAST_DAY(date_value)
SELECT last_day(to_date('2003/03/15', 'yyyy/mm/dd')) FROM DUAL; would return Mar 31, 2003
SELECT last_day(to_date('2003/02/03', 'yyyy/mm/dd')) FROM DUAL; would return Feb 28, 2003
List employees born in the last seven days of a month:
SELECT * FROM emp WHERE dob >= LAST_DAY(dob)-7;
TO_CHAR function: converts a number or date to a string.
Syntax: TO_CHAR(date_value, fmt)
fmt = format, which can be:
MONTH - name of month
MON - three-letter month name
MM - two-digit month number
D - day of week (1-7)
DAY - name of day
DD - day of month (1-31)
YYYY - four-digit year
YY - two-digit year
SELECT to_char(sysdate, 'yyyy/mm/dd') FROM DUAL; would return '2003/07/09'
List all employees born in 1994:
SELECT * FROM emp WHERE TO_CHAR(dob,'YYYY')='1994';
List all employees born in the month of November:
SELECT * FROM emp WHERE TO_CHAR(dob,'MM')='11';
List all employees born on the 14th of a month:
SELECT * FROM emp WHERE TO_CHAR(dob,'DD')='14';
TO_DATE function: converts a string to a date; also used to translate a date between formats.
Syntax: TO_DATE(char_value, fmt)
fmt = format, as above
SELECT to_date('2003/07/09', 'yyyy/mm/dd') FROM DUAL;
would return a date value of July 9, 2003.
Find the age of employees as on 12-31-2012:
SELECT e_lname, (TO_DATE('12/31/2012','MM/DD/YYYY') - dob)/365 AS YEARS FROM emp;
NOTE: '12/31/2012' is a text string, not a date;
TO_DATE translates the text string to a valid Oracle
date that can be used in date arithmetic.
How many days are there between 6/25/2011 and 10/27/2011?
SELECT TO_DATE('OCTOBER 27,2011','MONTH DD,YYYY') -
TO_DATE('2011/06/25','YYYY/MM/DD') FROM DUAL;
SYSDATE: returns today's date.
SELECT TO_DATE('25-DEC-2011','DD-MON-YYYY') - SYSDATE FROM dual;
ADD_MONTHS: adds months to a date.
Syntax:
add_months( date_value, n )
date_value is the starting date (before the n months have been added);
n is the number of months to add to date_value.
SELECT add_months('01-Aug-03', 3) FROM DUAL;
would return '01-Nov-03'
SELECT pno, p_indate, ADD_MONTHS(p_indate,24)
FROM product;
NUMERIC FUNCTIONS
Aggregate functions operate over a set of values(multiple rows) while numeric functions operate
over a single row.
Function Example
ABS
Returns absolute value of a number.
Syntax: ABS(numeric_value)
SELECT ABS(1.95),ABS(-1.93) FROM
DUAL;
Would return 1.95 1.93
ROUND function returns a number rounded
to a certain number of decimal places.
Syntax:ROUND(numeric_value,p)
p=precision
SELECT round(125.315) FROM DUAL;
would return 125
SELECT ROUND(sal) AS sal1, ROUND(sal, 2) AS
sal2 FROM emp;
CEIL function returns the smallest integer
value that is greater than or equal to a
number.
Syntax: ceil( number )
SELECT ceil(-32.65) FROM DUAL;
would return -32.
SELECT ceil(32.65) FROM DUAL; would
return 33.
SELECT CEIL(sal) ,FLOOR(sal) FROM emp;
FLOOR function returns the largest integer
value that is equal to or less than a number.
Syntax: floor( number )
SELECT floor(5.9) FROM DUAL; would
return 5
SELECT floor(-5.9) FROM DUAL; would
return -6
The sqrt function returns the square root of
n.
Synatx: sqrt( n )
n is a positive number.
sqrt(9) would return 3
mod function returns the remainder of m
divided by n
mod(15, 4) would return 3
power function returns m raised to the nth
power.
Syntax : power( m, n )
m is the base. n is the exponent.
If m is negative, then n must be an integer.
power(3, 2) would return 9
exp function returns e raised to the nth
power, where e = 2.71828183.
exp(3) would return 20.0855369231877
trunc function returns a number truncated to
a certain number of decimal places.
trunc(125.815, 0) would return 125
trunc(125.815, 1) would return 125.8
ln function returns the natural logarithm of a
number.
ln(20) would return 2.99573227355399
log function returns the logarithm of n base
m.
Syntax: log( m, n )
m must be a positive number other than 0 or 1.
n must be a positive number.
log(100, 1) would return 0
String Functions: are useful to concatenate strings of characters, printing names in upper case or
knowing the length of a given attribute.
Function Example
UPPER function converts all
letters in the specified string to
uppercase.
Syntax: UPPER(string)
upper('Tech on') would return 'TECH ON'
List all employee names in upper case.
SELECT UPPER(e_initial) || '.' || UPPER(e_fname) ||
UPPER(e_lname) FROM EMP;
LOWER function converts all
letters in the specified string to
lowercase.
Syntax: LOWER(string)
List all employee names in lower case.
SELECT LOWER(e_initial) || '.' || LOWER(e_fname) ||
LOWER(e_lname) FROM EMP;
SUBSTR function allows you to
extract a substring from a string.
Syntax:substr( string, p, l )
string is the source string.
p is the position for extraction.
l is optional. It is the number of
characters to extract.
substr('This is a test', 6, 2) would return 'is'
substr('This is a test', 6) would return 'is a test'
substr('This is a test', -3, 3) would return 'est'
List the first 3 characters of all employee last names.
Ex: SELECT SUBSTR(e_lname,1,3) AS prefix FROM EMP;
LENGTH function returns the
number of characters in the
specified string.
Syntax:
length( string)
length(NULL) would return NULL.
length('') would return NULL. 
length('Tech on the Net') would return 15.
List all employees last names and length of their last names.
SELECT e_lname, LENGTH(e_lname) FROM EMP;
Concatenation
The || operator allows you to
concatenate data from two
different character columns and
returns a single column.
Syntax: string1 || string2
'a' || 'b' || 'c' || 'd' would return 'abcd'.
List all employee names (concatenated)
SELECT e_initial || '.' || e_fname || e_lname AS NAME
FROM EMP;
CONVERSION FUNCTIONS:allows you take a value of given data type and convert it to the
equivalent value in another data type.
Functions Example
TO_CHAR: returns a
character string from a
numeric value.
Syntax:
TO_CHAR(numeric_value, fmt)
SELECT eno, TO_CHAR(sal, '9,999.99') AS SALARY FROM
EMP;
TO_NUMBER : returns a
formatted number from a
character string.
Syntxa:TO_NUMBER
(char_value, fmt)
fmt= format used can be:
9 - displays a digit
0 – displays a leading zero
, - displays the comma
. – displays the decimal point
$ - displays the dollar sign
B – leading blank
S – leading sign
MI – trailing minus sign
SELECT TO_NUMBER('-123.99','S9999.99'),
TO_NUMBER('99.78-','B999.99MI') FROM DUAL;
DECODE: compares an
attribute or expression with a
series of values and returns
an associated value or a
default value if no match is
found
Syntax: DECODE(e,x,y,d)
e – attribute or expression
x – value with which to
compare e
y – value to return if e = x
d – default value to return if e
is not equal to x
The following example returns the sales tax for specified
cities.
Compares vcity to 'VJA'; if the value matches, it returns .08
Compares vcity to 'GNT'; if the value matches, it returns .05
If there is no match, it returns 0.00 (the default value)
SELECT vno, vcity,
DECODE(vcity,'VJA',.08,'GNT',.05,0.00) AS TAX FROM
VENDOR;
Oracle sequences: generate a numeric value that can be assigned to any column in any
table.
Use of sequences is optional; you can enter the values manually.
Oracle sequences have a name and can be used anywhere a value is expected.
Sequences can be created and deleted at any time.
The table attribute to which you assign a value based on a sequence can be edited and
modified.
Oracle sequences are
• Independent objects in the database.
• Not a data type
• Not tied to a table or column
Syntax:
CREATE SEQUENCE name [START WITH n] [INCREMENT BY n] [CACHE |
NOCACHE]
where name is the name of the sequence
n is an integer that can be positive or negative.
START WITH specifies initial sequence value( the default value is 1)
INCREMENT BY determines the value by which the sequence is incremented.
The CACHE or NOCACHE clause indicates whether Oracle will preallocate sequence numbers in
memory. (Oracle preallocates 20 values by default.)
Example: CREATE SEQUENCE CSEQ1 START WITH 204 INCREMENT BY 1
NOCACHE;
To check all the sequences you have created.
SELECT * FROM USER_SEQUENCES;
To use sequences during data entry,
you must use two special pseudo-columns: NEXTVAL and CURRVAL.
NEXTVAL retrieves the next available value from a sequence. Each time you use
NEXTVAL , the sequence is incremented.
CURRVAL retrieves the current value of sequence.
Example
INSERT INTO CUSTOMER VALUES (CSEQ1.NEXTVAL, 'RAVI', 'NELLORE', 500);
INSERT INTO INVOICE VALUES ('I05', CSEQ1.CURRVAL, '22-AUG-2011');
You cannot use CURRVAL unless a NEXTVAL was issued previously in the same session.
NEXTVAL retrieves the next available sequence number (here 204) and assigns it to cno in the
CUSTOMER table.
CSEQ1.CURRVAL refers to the last used CSEQ1.NEXTVAL sequence number (204).
In this way the relationship between INVOICE and CUSTOMER is established.
COMMIT; statement must be issued to make the changes permanent.
You can also issue a ROLLBACK statement, in which case the rows you inserted into
INVOICE and CUSTOMER will be rolled back (but the sequence number will not). That is, if
you use the sequence again you might expect 204, but you will get 205 even though the row
for 204 was rolled back.
DROPPING a SEQUENCE does not delete the values you already assigned to table attributes.
Syntax: DROP SEQUENCE CSEQ1;
VIEWS
A view is a virtual table based on a SELECT query.
The tables on which view is based are called base tables.
Syntax:
CREATE VIEW viewname AS SELECT query
Characteristics:
A relational view has several special characteristics
• We can use a view instead of a table in an SQL statement.
• Views are dynamically updated when the base table is updated.
• Views provide a level of security in the database. The view can restrict users to only
specified columns and specified rows in a table.
• View may also be used as the basis for reports
Example: CREATE VIEW PROD_STATS AS SELECT vno, SUM(qoh * price) AS TotalCost
FROM PRODUCT GROUP BY vno;
To drop a view
Syntax: DROP VIEW <view name>
Example:
DROP VIEW PROD_STATS
UPDATABLE VIEWS: used in batch update routines to update a master table attribute with
transaction data.
To demonstrate a batch update routine, consider two tables, ProdMaster and ProdSales.
NOTE: There is a 1:1 relationship between the two tables.
To update the qoh attribute (qoh - qty, as that much quantity has been sold):
1. We have to join the two tables.
2. Update qoh for each row of the ProdMaster table with matching pno values in the ProdSales
table.
We use an updatable view to do that.
Updatable view is a view that can be used to update attributes in the base tables that are used
in the view.
Not all views are updatable.
The most common updatable view restrictions are as follows:
1. GROUP BY and aggregate functions cannot be used.
2. Cannot use SET operators.
3. The PK columns of the base table you want to update must have unique values in the
view. That is, the two tables must have a 1:1 relationship; only then can the view be
used to update a base table.
Example: CREATE VIEW QUP AS ( SELECT ProdMaster.pno, qoh, qty FROM ProdMaster,
ProdSales
WHERE ProdMaster.pno=ProdSales.pno);
ProdMaster
pno   pdesc    qoh
P01   SCREWS   60
P02   NUTS     37
P03   BOLTS    50

ProdSales
pno   qty
P01   7
P02   3
UPDATE QUP SET qoh=qoh-qty;
After this update, ProdMaster holds qoh = 53 for P01 (60-7) and qoh = 34 for P02 (37-3); P03 is unchanged because it has no matching ProdSales row and therefore does not appear in the view.
Q: What is PSM (Persistent Stored Module)?
Ans: A Persistent Stored Module is a block of code containing standard SQL statements and
procedural extensions that is stored and executed at the DBMS server. The PSM represents
business logic that can be encapsulated, stored and shared among multiple database users. A
PSM lets an administrator assign specific access rights to a stored module to ensure that only
authorized users can use it. Oracle implements PSMs through its procedural SQL language
(PL/SQL).
Q: What is PL/SQL? Explain?
Ans: PL/SQL is a language that makes it possible to use and store procedural code and SQL
statements within the database.
It is also used to merge SQL and traditional programming constructs, such as
• Variables,
• conditional processing (IF-THEN-ELSE),
• basic loops (FOR and WHILE loops) and
• Error trapping.
The procedural code is executed as a unit by the DBMS when it is invoked by the end user.
End users can use PL/SQL to create
• Anonymous PL/SQL blocks.
• Triggers
• Stored Procedures
• PL/SQL functions
You can write a PL/SQL code block by enclosing the commands between the BEGIN and END
clauses.
Ex:
BEGIN
INSERT INTO vendor VALUES (105, 'SITA', 'TNL');
END;
/
This is an example of an anonymous PL/SQL block because it has not been given a specific name.
The above PL/SQL block executes as soon as you press the ENTER key after typing /.
You will see the message "PL/SQL procedure successfully completed".
If you want a more specific message, such as "new vendor added", you must type as follows:
SQL> SET SERVEROUTPUT ON
This SQL*Plus command enables the client console (SQL*Plus) to receive messages from
the server side (the Oracle DBMS). To send messages from the PL/SQL block to the SQL*Plus
console, use the DBMS_OUTPUT.PUT_LINE function.
Like standard SQL, PL/SQL code is executed at the server side, not at the client side. To stop
receiving messages from the server, enter SET SERVEROUTPUT OFF.
In Oracle, you can use the SQL*Plus command SHOW ERRORS to help you diagnose
errors found in PL/SQL blocks.
Q: Write anonymous PL/SQL program to insert rows into VENDOR table and display
the message “New vendor added”.
Ans:
BEGIN
INSERT INTO vendor VALUES (106, 'GITA', 'VJA');
DBMS_OUTPUT.PUT_LINE('New vendor added');
END;
/
PL/SQL Basic data types
Data Type Description
CHAR character values of a fixed length
VARCHAR2 variable length character values
NUMBER numeric values
DATE Date values
%TYPE inherits the datatype from a variable that you declared previously or from an
attribute of a database table. Ex: price1 PRODUCT.price%TYPE;
assigns price1 the same data type as the price column in the PRODUCT table.
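A minimal anonymous block using %TYPE (a sketch against the PRODUCT table used throughout; the product code 'P01' is an assumed value):

```sql
DECLARE
  price1 PRODUCT.price%TYPE;  -- inherits the data type of PRODUCT.price
BEGIN
  SELECT price INTO price1 FROM product WHERE pno = 'P01';  -- 'P01' is a hypothetical product code
  DBMS_OUTPUT.PUT_LINE('Price of P01: ' || price1);
END;
/
```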
Q: Write anonymous PL/SQL program to display the number of products in the price ranges
0-10, 11-60, 61-110, etc.
Ans:
DECLARE
P1 NUMBER(3) := 0;
P2 NUMBER(3) := 10;
NUM NUMBER(2) := 0;
BEGIN
WHILE P2<5000 LOOP
SELECT COUNT(pno) INTO NUM FROM product WHERE price BETWEEN P1 AND P2;
DBMS_OUTPUT.PUT_LINE('There are ' || NUM || ' products with price between ' || P1 || ' and
' || P2);
P1 := P2+1;
P2 := P2+50;
END LOOP;
END;
/
The PL/SQL block shown above has following characteristics.
1. Each statement inside the PL/SQL code must end with a semicolon
2. The PL/SQL block starts with the DECLARE section, in which you declare the
variable names, the data types and an initial value (optional).
3. A WHILE loop is used.
4. Uses the string concatenation symbol.
5. SELECT statement uses the INTO keyword to assign output of the query to a PL/SQL
variable
The most useful feature of PL/SQL blocks is that they let you create code that can be named,
stored and executed either implicitly or explicitly by the DBMS.
What is Trigger ? Explain.
Ans: A trigger is procedural SQL code that is fired when a DML statement (INSERT,
DELETE or UPDATE) is executed on a database table.
The syntax to create a trigger in Oracle is:
CREATE OR REPLACE TRIGGER trigger_name
[BEFORE / AFTER] [DELETE /INSERT/UPDATE OF column_name ] ON table_name
[FOR EACH ROW]
[DECLARE]
[variable_name data-type [:= initial_value]]
BEGIN
PL/SQL instructions;
……
END;
A trigger definition contains the following parts:
1. The triggering timing: BEFORE or AFTER. This timing indicates at what time the
trigger should fire (before or after the triggering statement is completed).
2. The triggering statement/event: the statement that causes the trigger to execute
(INSERT, UPDATE or DELETE).
3. The triggering level: there are two types of triggers: statement-level triggers and row-level
triggers.
• Statement-level triggers: this type of trigger is executed once, before or after the
triggering statement is completed.
• Row-level triggers: require the use of the FOR EACH ROW keywords. This type
of trigger is executed once for each row affected (if you update 10 rows, the trigger
executes 10 times).
4. The triggering action: the PL/SQL code enclosed between the BEGIN and END keywords.
You can use a trigger to update an attribute in a table other than the one being
modified.
CREATE OR REPLACE TRIGGER TLP
AFTER INSERT ON line
FOR EACH ROW
BEGIN
UPDATE product
SET qoh = qoh - :NEW.LINE_UNITS
WHERE product.pno = :NEW.pno;
END;
/
TLP is a row level trigger that executes after inserting a new LINE row and reduces quantity
on hand (in PRODUCT table) of recently sold product by the number of units sold.
CREATE OR REPLACE TRIGGER trigger_name => creates a trigger with the given name
or overwrites an existing trigger with the same name.
OF column_name => used with update triggers when you want to fire the trigger only when a
specific column is updated.
ON table_name => the name of the table or view with which the trigger is associated.
Example of a statement-level trigger that is executed after an update of the qoh or pmin
attributes of an existing row, or after an insert of a new row, in the product table.
CREATE or REPLACE TRIGGER TPR
AFTER INSERT OR UPDATE OF QOH,PMIN ON PRODUCT
BEGIN
UPDATE PRODUCT
SET REORDER =1
WHERE QOH <= PMIN;
END;
/
Q: When does a trigger fire?
Ans: A trigger is triggered automatically when an associated DML statement is executed.
• A trigger is invoked before or after a data row is inserted, updated or deleted.
• A trigger is associated with a database table.
• Each database table may have one or more triggers.
• A trigger is executed as part of the transaction that triggered it.
Q: How to delete a trigger?
Ans: When you delete a table, all its trigger objects are deleted with it.
If you want to delete a trigger without deleting the table, use the following command:
DROP TRIGGER triggername;
Q: Write a program to update the customer balance in the CUSTOMER table after
inserting every new LINE row.
CREATE OR REPLACE TRIGGER TLC
AFTER INSERT ON line
FOR EACH ROW
DECLARE
cus CHAR(5);
tot NUMBER := 0; --to compute total cost
BEGIN
SELECT cno INTO cus FROM invoice --1) get the customer code
WHERE invoice.invno = :NEW.invno;
tot := :NEW.line_price * :NEW.line_units; --2) compute the total of the current line
UPDATE customer SET baldue = baldue + tot WHERE cno = cus;
DBMS_OUTPUT.PUT_LINE(' *** Balance updated for customer : ' || cus);
END;
/
The trigger is a row level trigger that executes for each new LINE row inserted.
The SELECT statement returns only one attribute (cno) from INVOICE table and that
attribute returns only one value.
You use the INTO clause to assign a value from a SELECT statement to a variable (cus) used
within a trigger.
Double dashes (--) are used to indicate comments within the PL/SQL block.
Trigger action based on conditional DML predicates
You can create a trigger that executes after an insert, an update or a delete on the PRODUCT
table; to know which one of the three statements caused the trigger to execute, use the
following syntax:
IF INSERTING THEN ……END IF;
IF UPDATING THEN ……END IF;
IF DELETING THEN……END IF;
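A sketch combining the three predicates in one audit-style trigger (the prod_audit table and its columns are hypothetical; note that :OLD, not :NEW, carries the values of a deleted row):

```sql
CREATE OR REPLACE TRIGGER TPA
AFTER INSERT OR UPDATE OR DELETE ON product
FOR EACH ROW
BEGIN
  IF INSERTING THEN
    INSERT INTO prod_audit VALUES (:NEW.pno, 'INSERT', SYSDATE);
  END IF;
  IF UPDATING THEN
    INSERT INTO prod_audit VALUES (:NEW.pno, 'UPDATE', SYSDATE);
  END IF;
  IF DELETING THEN
    INSERT INTO prod_audit VALUES (:OLD.pno, 'DELETE', SYSDATE);
  END IF;
END;
/
```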
Triggers can be used to:
• Enforce constraints that cannot be enforced at the DBMS design and
implementation levels.
• Facilitate enforcement of referential integrity.
• Update table values, insert records in tables and call other stored procedures.
• Add functionality by automating critical actions and providing appropriate
warnings and suggestions.
• Add processing power to the RDBMS and to the database system as a whole.
Oracle recommends triggers for
• Auditing purposes (creating audit logs)
• Automating generation of derived column values.
• Enforcement of business or security constraints.
• Creation of replica tables for back up purposes.
Q:What are the various type of triggers?
Statement-level triggers: this type of trigger is executed once, before or after the
triggering statement is completed.
Example of a statement-level trigger that is executed after an update of the qoh or pmin
attributes of an existing row, or after an insert of a new row, in the product table.
CREATE or REPLACE TRIGGER TPR
AFTER INSERT OR UPDATE OF QOH,PMIN ON PRODUCT
BEGIN
UPDATE PRODUCT
SET REORDER =1
WHERE QOH <= PMIN;
END;
/
Row-level triggers: require the use of the FOR EACH ROW keywords. This type of
trigger is executed once for each row affected (if you update 10 rows, the trigger executes
10 times).
Example: CREATE or REPLACE TRIGGER TPR
BEFORE INSERT OR UPDATE OF QOH, PMIN ON PRODUCT
FOR EACH ROW
BEGIN
IF :NEW.QOH <= :NEW.PMIN THEN
:NEW.REORDER :=1;
ELSE
:NEW.REORDER :=0;
END IF;
END;
/
What are Stored Procedures? Explain?
A stored procedure is a named group of SQL statements that have been previously created
and stored in the server database.
Advantages:
• Stored procedures accept input parameters so that a single procedure can be
used over the network by several clients using different input data.
• Stored procedures reduce network traffic and improve performance.
• Stored procedures can be used to help ensure the integrity of the database.
• Stored procedures help reduce code duplication by means of code isolation and code
sharing, there by minimizing the chance of errors and the cost of application
development and maintenance.
• Stored procedures are useful to encapsulate shared code to represent business
transactions i.e, you need not know the name of newly added attribute and would need
to add new parameter to the procedure call.
Syntax to create procedure:
CREATE OR REPLACE PROCEDURE procedure_name [(argument [in/out] data-type,….)]
[IS / AS] [variable_name data-type [:= initial_value]]
BEGIN
PL/SQL or SQL statements;
…
END;
Syntax to execute a stored procedure
EXEC procedure_name[(parameter_list)];
Ex: Write a stored procedure to assign an additional 5% discount to all products when
QOH >= 2*PMIN.
CREATE OR REPLACE PROCEDURE prod_discount AS
BEGIN
UPDATE product
SET discount = discount + .05
WHERE qoh >= pmin * 2;
DBMS_OUTPUT.PUT_LINE('*** Update Finished ***');
END;
/
1. argument specifies the parameters that are passed to the stored procedures. A stored
procedure could have zero or more arguments.
2. IN/OUT indicates whether the parameter is for input, output or both.
3. Variables can be declared between the keywords IS and BEGIN.
To make the percentage increase an input parameter in the above procedure:
CREATE OR REPLACE PROCEDURE prod_discount ( pd IN NUMBER)
AS BEGIN
IF ((pd <= 0) OR (pd >= 1)) THEN
DBMS_OUTPUT.PUT_LINE('Error: value must be greater than 0 and less than 1');
ELSE
UPDATE product
SET discount = discount + pd
WHERE qoh >= pmin * 2;
DBMS_OUTPUT.PUT_LINE('*** Update Finished ***');
END IF;
END;
/
To execute the above procedure---
EXEC prod_discount(.05);
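The IN/OUT note above mentions output parameters; a minimal OUT-parameter sketch (the procedure name and logic are illustrative, built on the existing PRODUCT table):

```sql
CREATE OR REPLACE PROCEDURE prod_count (v IN NUMBER, n OUT NUMBER)
AS
BEGIN
  -- returns the number of products supplied by vendor v through the OUT parameter n
  SELECT COUNT(*) INTO n FROM product WHERE vno = v;
END;
/
-- An OUT parameter is read into a variable, so the caller is a PL/SQL block:
DECLARE
  c NUMBER;
BEGIN
  prod_count(101, c);
  DBMS_OUTPUT.PUT_LINE('Vendor 101 supplies ' || c || ' products');
END;
/
```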
Q: write a stored procedure to add new customer.
CREATE OR REPLACE PROCEDURE cadd (w_cname IN VARCHAR2, w_city IN
VARCHAR2, w_baldue IN NUMBER)
AS
BEGIN
INSERT INTO customer (cno, cname, city, baldue) VALUES (CSEQ1.NEXTVAL, w_cname,
w_city, w_baldue);
DBMS_OUTPUT.PUT_LINE('Customer added');
END;
/
The procedure uses
• several parameters one for each required attribute in the CUSTOMER table.
• CSEQ1 sequence to generate a new customer code.
The parameters can be null only when the table specifications permit null for that parameter.
To execute:
EXEC cadd('KALA', 'VJA', NULL);
Q: Write procedures to add new invoice and line row.
Ans:
CREATE OR REPLACE PROCEDURE invadd(w_cno IN NUMBER, w_date IN DATE)
AS BEGIN
INSERT INTO invoice
VALUES(ISEQ.NEXTVAL, w_cno, w_date);
DBMS_OUTPUT.PUT_LINE('Invoice Added');
END;
/
CREATE OR REPLACE PROCEDURE lineadd (ln IN CHAR, pn IN CHAR, lu IN
NUMBER)
AS
lp NUMBER := 0;
BEGIN
SELECT price INTO lp
FROM product
WHERE pno = pn ;
INSERT INTO line VALUES(ISEQ.CURRVAL, ln, pn, lu, lp);
DBMS_OUTPUT.PUT_LINE('Invoice Line Added');
END;
/
Q: What is a cursor? How many types of cursors are there? How to handle cursors?
Ans: A cursor is a reserved area in memory in which the output of a query is stored,
like an array holding rows and columns.
There are two types of cursors: implicit and explicit.
An implicit cursor is automatically created in PL/SQL when the SQL statement returns only
one value.
An explicit cursor is created to hold the output of an SQL statement that may return two or
more rows (but could return zero or only one row).
To create an explicit cursor, use the following syntax inside PL/SQL DECLARE section.
CURSOR cursor_name IS select-query;
The cursor declaration section only reserves a named memory area for the cursor.
Once you declared a cursor, you can use cursor processing commands anywhere between the
BEGIN and END keywords of the PL/SQL block.
Cursor Processing Commands
Cursor Command Explanation
OPEN Executes the SQL command and populates the cursor with data.
Before you can use a cursor, you need to open it. Ex: OPEN
cursor_name;
FETCH To retrieve data from the cursor and copy it to the PL/SQL variables.
The syntax is : FETCH cursor_name INTO variable1 [,variable2,
…..]
CLOSE The CLOSE command closes the cursor for processing
Cursor style processing involves retrieving data from the cursor one row at a time.
The set of rows the cursor holds is called the active set.
The data set contains a current row pointer; therefore, after opening a cursor, the current row
is the first row of the cursor.
When you fetch a row from the cursor, the data from the current row in the cursor is copied to
the PL/SQL variables. After the fetch, the current row pointer moves to the next row in the set,
and fetching continues until it reaches the end of the cursor.
Cursor attributes determine when you have reached the end of the cursor data set, the number of
rows in the cursor, etc.
Attribute Description
%ROWCOUNT Returns the number of rows fetched so far.
If the cursor is not OPEN, it returns an ERROR.
If no fetch has been done but the cursor is OPEN, it returns 0.
%FOUND Returns TRUE if the last FETCH returned a row and FALSE if not.
If the cursor is not OPEN, it returns an ERROR.
If no fetch has been done, it contains NULL.
%NOTFOUND Returns TRUE if the last FETCH did not return any row and FALSE if it
did.
If the cursor is not OPEN, it returns an ERROR.
If no fetch has been done, it contains NULL.
%ISOPEN Returns TRUE if the cursor is OPEN or FALSE if the cursor is CLOSED.
CREATE OR REPLACE PROCEDURE pce IS
p product.pno%TYPE;
pdes product.pdesc%TYPE; -- DESC is a reserved word, so it is not used as the variable name
tot NUMBER(3);
CURSOR pc IS
SELECT pno, pdesc FROM product
WHERE qoh > (SELECT AVG(qoh) FROM product);
BEGIN
DBMS_OUTPUT.PUT_LINE('PRODUCTS WITH QOH > AVG(QOH)');
OPEN pc;
LOOP
FETCH pc INTO p, pdes;
EXIT WHEN pc%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(p || ' => ' || pdes);
END LOOP;
DBMS_OUTPUT.PUT_LINE('TOTAL PRODUCTS PROCESSED ' || pc%ROWCOUNT);
CLOSE pc;
END;
/
/
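As an aside, Oracle also provides a cursor FOR loop that opens, fetches and closes the cursor implicitly; the same report could be sketched as:

```sql
BEGIN
  DBMS_OUTPUT.PUT_LINE('PRODUCTS WITH QOH > AVG(QOH)');
  -- the loop variable r is declared implicitly as a record matching the query's columns
  FOR r IN (SELECT pno, pdesc FROM product
            WHERE qoh > (SELECT AVG(qoh) FROM product)) LOOP
    DBMS_OUTPUT.PUT_LINE(r.pno || ' => ' || r.pdesc);
  END LOOP;
END;
/
```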
Unit-III Chapter-III Database Design
Q: What is an information system?
Ans: A complete information system is composed of people, hardware, software, the
databases, application programs and procedures.
The process of creating an information system is known as system development.
Q: The system development life cycle (SDLC)
The SDLC is an iterative rather than a sequential process.
1. Planning:
The SDLC planning phase yields a general overview of the company and its objectives.
An initial assessment of the information flow-and-extent requirements must be made to
answer questions like
Should the existing system be continued?
Should the existing system be modified?
Should the existing system be replaced?
If it is decided that a new system is necessary, then it is checked whether the new system is
feasible or not. The feasibility study includes
1. Technical feasibility: Can the development of the new system be done with current
equipment, existing software technology, and available personnel? Does it require
new technology?
2. Economic feasibility: Can we afford it? Is it a million dollar solution for a thousand
dollar problem?
3. Operational feasibility: Does the company possess the human, technical and
financial resources to keep the system operational? Will there be resistance from
users?
2. Analysis:
The analysis phase produces a thorough audit of user requirements and an understanding of the system's functional areas, actual and potential problems, and opportunities.
The logical design must specify the appropriate conceptual data model, inputs, processes and expected output requirements, using tools such as DFDs and ER diagrams.
All data transformations (processes) are described and documented using such system analysis tools.
3. Detailed System Design
The design includes all the necessary technical specifications for the screens, menus, reports and other devices that might be used to help make the system a more efficient information generator.
4. Implementation
The hardware, DBMS software and application programs are installed and the
database design is implemented.
During the initial stages of the implementation phase, the system enters a cycle of coding, testing and debugging until it is ready to be delivered.
The system will be in full operation by the end of this phase but will be continuously
evaluated and fine-tuned.
5. Maintenance
Maintenance includes all the activity after the installation of software that is performed to
keep the system operational.
The major forms of maintenance activity are:
• Corrective maintenance: fixing errors.
• Adaptive maintenance: adjusting to changes in the business environment.
• Perfective maintenance: enhancing the system.
UNIT-IV Chapter-I
Transaction management and concurrency control
What is a transaction?
A transaction is any action that reads from and / or writes to a database.
A transaction is a single, indivisible, logical unit of work.
All transactions are controlled and executed by the DBMS to guarantee database integrity.
Q: Transaction properties or ACID test
Each individual transaction must display Atomicity, Consistency, Isolation and Durability.
Atomicity requires that all operations (SQL requests) of a transaction be completed; if they are not, the transaction is aborted.
If a transaction T1 has four SQL requests, all four requests must be successfully completed
otherwise the entire transaction is aborted.
Consistency: A transaction takes a database from one consistent state to another.
Isolation: means that the data used during the execution of a transaction cannot be used by a
second transaction until the first one is completed.
If transaction T1 is being executed and is using data item X, that data item cannot be accessed
by any other transaction until T1 ends.
Durability ensures that once transaction changes are done (committed), they cannot be
undone or lost, even in the event of a system failure. COMMITTED TRANSACTIONS ARE
NOT ROLLED BACK.
When executing multiple transactions, the DBMS must schedule the concurrent execution of
the transactions' operations. The schedule of those operations must exhibit the property of
serializability.
Serializability ensures that the schedule for the concurrent execution of the transactions
yields consistent results.
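As a rough sketch of what serializability means, the following Python snippet (invented names, not DBMS code) checks whether one interleaving of two single-step transactions produces the same final state as some serial order:

```python
# Sketch: a schedule is serializable if its result matches the result of some
# serial ordering of the same transactions.
def run(schedule, db):
    db = dict(db)                    # work on a copy of the starting state
    for op in schedule:
        op(db)
    return db

t1 = [lambda db: db.update(qoh=db["qoh"] + 100)]   # T1: purchase 100 units
t2 = [lambda db: db.update(qoh=db["qoh"] - 30)]    # T2: sell 30 units

start = {"qoh": 35}
serial_results = [run(t1 + t2, start), run(t2 + t1, start)]
interleaved = t1 + t2                # one possible schedule of the operations
serializable = run(interleaved, start) in serial_results
print(serializable)                  # True: same result as T1-then-T2
```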
Q: Transaction management with SQL
Transaction support is provided by two SQL statements: COMMIT and ROLLBACK.
A transaction sequence must continue through all succeeding SQL statements until one of the
following four events occurs:
• A COMMIT statement is reached, in which case all changes are permanently
recorded within the database.
The COMMIT statement automatically ends the SQL transaction.
Ex: UPDATE product SET qoh = qoh - 2 WHERE pno = 'P01';
UPDATE customer SET baldue = baldue + 20 WHERE cno = 201;
COMMIT;
• A ROLLBACK is reached, in which case all changes are aborted and the database is
rolled back to its previous consistent state.
• The end of a program is successfully reached, in which case all changes are
permanently recorded within the database. This action is equivalent to COMMIT
• A program is abnormally terminated, in which case changes made in the database
are aborted and the database is rolled back to its previous consistent state. This action
is equivalent to ROLLBACK.
A transaction begins implicitly when the first SQL statement is encountered.
SQL Server uses transaction management statements such as BEGIN TRANSACTION to
indicate the beginning of a new transaction.
The Oracle RDBMS uses the SET TRANSACTION statement to declare a new transaction
start and its properties.
Q: The Transaction Log:
A DBMS uses a transaction log to keep track of all transactions that update the database.
The information stored in this log is used by the DBMS when recovery is required.
The transaction log stores:
• A record for the beginning of the transaction.
• For each transaction component (SQL statement):
   - The type of operation being performed (update, insert, delete)
   - The names of the objects affected by the transaction (the name of the table)
   - The before and after values for the fields being updated
   - Pointers to the previous and next transaction log entries for the same transaction
• The ending (COMMIT) of the transaction.
A transaction log:

TRL_ID  TRX_NUM  Prev PTR  Next PTR  Operation  Table     RowID  Attribute  Before  After
341     101      Null      352       START      ** Start Transaction
352     101      341       363       UPDATE     PRODUCT   P01    qoh        20      18
363     101      352       365       UPDATE     CUSTOMER  201    baldue     100     120
365     101      363       Null      COMMIT     ** End of Transaction
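The log records above can be modeled as plain tuples; the sketch below (an invented data structure, for illustration only) shows how the before values in such a log support a rollback:

```python
# Sketch of transaction-log records; field order follows the table above.
log = [
    # (trl_id, trx, prev, next, op, table, row_id, attr, before, after)
    (341, 101, None, 352,  "START",  None,       None,  None,     None, None),
    (352, 101, 341,  363,  "UPDATE", "PRODUCT",  "P01", "qoh",    20,   18),
    (363, 101, 352,  365,  "UPDATE", "CUSTOMER", 201,   "baldue", 100,  120),
    (365, 101, 363,  None, "COMMIT", None,       None,  None,     None, None),
]

def rollback(trx_num, db):
    """Undo a transaction by restoring 'before' values, newest first."""
    for rec in reversed(log):
        if rec[1] == trx_num and rec[4] == "UPDATE":
            table, row, attr, before = rec[5], rec[6], rec[7], rec[8]
            db[(table, row)][attr] = before
    return db

# Database state after the two updates were applied:
db = {("PRODUCT", "P01"): {"qoh": 18}, ("CUSTOMER", 201): {"baldue": 120}}
rollback(101, db)
print(db[("PRODUCT", "P01")]["qoh"])   # restored to 20
```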
Concurrency Control
The Coordination of the simultaneous execution of transactions in a multi user database
system is known as concurrency control.
The problems caused by concurrent transactions in the absence of concurrency control are:
1. Lost updates
2. Uncommitted data
3. Inconsistent retrievals
A lost update occurs when two concurrent transactions, T1 and T2, update the same data
element and one of the updates is lost (overwritten by the other transaction).
Ex: consider two concurrent transactions to update qoh
Transaction                  Computation
T1: Purchase of 100 units    qoh = qoh + 100
T2: Sell 30 units            qoh = qoh - 30
Serial execution of the above transactions:
Time Transaction Step Value
1 T1 Read qoh 35
2 T1 qoh = 35+ 100
3 T1 Write qoh 135
4 T2 Read qoh 135
5 T2 qoh = 135-30
6 T2 Write qoh 105
The table below shows how the lost update problem can arise:

Time  Transaction  Step                     Value
1     T1           Read qoh                 35
2     T2           Read qoh                 35
3     T1           qoh = 35 + 100
4     T2           qoh = 35 - 30
5     T1           Write qoh (lost update)  135
6     T2           Write qoh                5
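The interleaving in this table can be reproduced in a few lines of Python (an illustrative sketch, with the two transactions' private workspaces as local variables):

```python
# Sketch of the lost-update interleaving: both transactions read qoh = 35
# before either writes, so T1's write is overwritten and lost.
qoh = 35

t1_local = qoh            # T1: Read qoh (35)
t2_local = qoh            # T2: Read qoh (35)  <- reads before T1 writes
t1_local += 100           # T1: qoh = 35 + 100
t2_local -= 30            # T2: qoh = 35 - 30
qoh = t1_local            # T1: Write qoh (135) -- this update is lost
qoh = t2_local            # T2: Write qoh (5)   -- overwrites T1's write
print(qoh)                # 5, instead of the correct 105
```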
Uncommitted data (a dirty read) occurs when two transactions, T1 and T2, are executed
concurrently and the first transaction (T1) is rolled back after the second transaction (T2)
has already accessed the uncommitted data, thus violating the isolation property of
transactions.
Correct execution (T1 rolls back before T2 reads qoh):

Time  Transaction  Step             Value
1     T1           Read qoh         35
2     T1           qoh = 35 + 100
3     T1           Write qoh        135
4     T1           ***ROLLBACK***   35
5     T2           Read qoh         35
6     T2           qoh = 35 - 30
7     T2           Write qoh        5
Execution with the uncommitted-data (dirty read) problem:

Time  Transaction  Step             Value
1     T1           Read qoh         35
2     T1           qoh = 35 + 100
3     T1           Write qoh        135
4     T2           Read qoh         135  (dirty read)
5     T2           qoh = 135 - 30
6     T1           ***ROLLBACK***   35
7     T2           Write qoh        105
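The dirty-read scenario in the second table can likewise be sketched (illustrative only; `committed` and `working` are invented names for the stable and in-flight values):

```python
# Sketch of a dirty read: T2 reads T1's uncommitted value 135, T1 then
# rolls back to 35, and T2 writes a result based on stale data.
committed = 35
working = committed

working = committed + 100   # T1 writes 135 (uncommitted)
t2_local = working          # T2 reads 135 -- a dirty read
working = committed         # T1 ROLLBACK: database returns to 35
qoh = t2_local - 30         # T2 computes 135 - 30
print(qoh)                  # 105, but the correct result is 35 - 30 = 5
```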
Inconsistent retrievals occur when a transaction accesses data before and after another
transaction finishes working with that data.
For example, T1 calculates a summary (aggregate) function over a set of data while another
transaction, T2, is updating the same data.
Transaction T1: SELECT SUM(qoh) FROM product WHERE pno < 'P04';
Transaction T2: UPDATE product SET qoh = qoh + 10 WHERE pno = 'P01';
Time  Transaction  Action                     Value  Total
1     T1           Read qoh for pno = 'P01'   10     10
2     T2           Read qoh for pno = 'P01'   10
3     T2           qoh = 10 + 10
4     T2           Write qoh for pno = 'P01'  20
5     T1           Read qoh for pno = 'P02'   3      13
6     T2           ***COMMIT***
7     T1           Read qoh for pno = 'P03'   5000   5013
The computed answer of 5013 is wrong; the correct answer is 5023 (20 + 3 + 5000).
Unless the DBMS exercises concurrency control, a multi user database environment can
create havoc within the information system.
The Scheduler
If two transactions access unrelated data:
• there is no conflict, and
• the order of execution is irrelevant.
If the transactions operate on related data:
• conflict is possible, and
• the selection of one execution order over another might have undesirable consequences.
The correct order is determined by the DBMS's built-in scheduler.
Q: What is a scheduler?
The scheduler is a special DBMS process that establishes the order in which the operations
within concurrent transactions are executed.
The scheduler bases its actions on concurrency control algorithms, such as locking or time
stamping methods.
NOT ALL TRANSACTIONS ARE SERIALIZABLE.
If the transactions are not serializable, they are executed by the DBMS on a first-come,
first-served basis.
If transactions are executed in serial order, one after another, CPU time is wasted and
response times within the multiuser DBMS environment become unacceptable.
Q: What is Serializable Schedule?
Ans: A serializable schedule is a schedule of transactions' operations in which the interleaved
execution of the transactions (T1, T2, T3, etc.) yields the same result as if the transactions
were executed in serial order (one after another).
Q: What are conflicting database operations?
Ans: The scheduler facilitates data isolation to ensure that two transactions do not update the
same data element at the same time.
Two operations are in conflict when they access the same data item and at least one of them
is a WRITE operation.
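That rule is simple enough to state as code (a Python sketch with invented names; operations are modeled as (mode, item) pairs):

```python
# Sketch of the conflict rule: two operations conflict when they touch the
# same data item and at least one of them is a WRITE.
def conflicts(op1, op2):
    (mode1, item1), (mode2, item2) = op1, op2
    return item1 == item2 and ("W" in (mode1, mode2))

print(conflicts(("R", "X"), ("R", "X")))   # False: read/read never conflicts
print(conflicts(("R", "X"), ("W", "X")))   # True: read/write on the same item
print(conflicts(("W", "X"), ("W", "Y")))   # False: different data items
```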
The table below shows conflicting database operations:

T1     T2     Result
Read   Read   No conflict
Read   Write  Conflict
Write  Read   Conflict
Write  Write  Conflict

Concurrency control with locking methods
A lock guarantees exclusive use of a data item to a current transaction.
A transaction acquires a lock prior to data access; the lock is released when the transaction is
completed so that another transaction can lock the data item for its exclusive use.
The database might be in a temporary inconsistent state while several updates are executed.
Therefore, locks are required to prevent another transaction from reading inconsistent data.
Most multiuser DBMSs automatically initiate and enforce locking procedures.
All lock information is managed by a lock manager.
Q: What is lock granularity? Explain?
Ans: Lock granularity indicates the level of lock use.
Locking can take place at the following levels: database, table, page, row or even field.
Database level
In a database level lock, the entire database is locked.
This level of locking is good for batch processes, but it is unsuitable for multiuser DBMSs.
Table Level
In a table level lock, the entire table is locked.
It is unsuitable for multiuser DBMSs.
Page Level
In a page-level lock, the DBMS locks an entire disk page.
A page is equivalent to a disk block, the smallest unit of data transfer between the hard
disk and primary memory.
Row level
The DBMS allows concurrent transactions to access different rows of the same table.
The row-level locking approach improves the availability of data, but its management
requires high overhead because a lock exists for each row.
Modern DBMSs automatically escalate a lock from row level to page level when the
application session requests multiple locks on the same page.
Field Level
The DBMS allows concurrent transactions to access different fields within a row.
Field-level locking yields the most flexible multiuser data access, but it is rarely
implemented in DBMSs.
LOCK TYPES : DBMS may use different lock types: Binary locks and Shared/Exclusive
locks
Binary Locks:
A binary lock has only two states: locked (1) or unlocked (0).
If an object is locked by a transaction, no other transaction can use that object.
Shared/Exclusive Locks :A shared lock is issued when a transaction wants to read data
from the database.
An exclusive lock is issued when a transaction wants to update (write) a data item.
Using Shared/Exclusive Locks concept, a lock can have three states: unlocked, shared
(read) and exclusive (write).
Lock-compatibility matrix:

            Shared   Exclusive
Shared      Yes      No
Exclusive   No       No

• Any number of transactions can hold shared locks on an item, but if any transaction holds
an exclusive lock on the item, no other transaction may hold any lock on it.
• If a lock cannot be granted, the requesting transaction is made to wait until all
incompatible locks held by other transactions have been released; the lock is then granted.
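The compatibility rules can be sketched as a grant-check function (Python, invented names, not a real lock manager):

```python
# Sketch of shared/exclusive lock compatibility: any number of shared locks
# may coexist, but an exclusive lock is incompatible with everything.
def can_grant(requested, held):
    """held is the list of lock modes currently granted on the item."""
    if not held:
        return True                         # unlocked: grant anything
    if requested == "S":
        return all(m == "S" for m in held)  # shared OK with shared only
    return False                            # exclusive needs the item unlocked

print(can_grant("S", ["S", "S"]))  # True: shared locks are compatible
print(can_grant("X", ["S"]))       # False: must wait for the readers
print(can_grant("S", ["X"]))       # False: a writer holds the item
```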
Although the use of shared locks makes data access more efficient, a shared/exclusive
lock schema increases the lock manager's overhead for several reasons:
• The type of lock must be known before a lock can be granted
• Three lock operations exist:
READ_LOCK (to check the type of lock)
WRITE_LOCK (to issue the lock) and
UNLOCK (to release the lock)
• The schema has been enhanced to allow a lock upgrade (from shared to exclusive)
and a lock downgrade (from exclusive to shared)
Locks prevent serious data inconsistencies, but they lead to two major problems:
• The resulting transaction schedule might not be serializable.
• The schedule might create deadlocks.
Serializability is guaranteed through a locking protocol known as two-phase locking.
TWO-PHASE LOCKING (2PL): defines how transactions acquire and release locks.
• ENSURES SERIALIZABILITY
• Does not prevent deadlocks
The two phases are:
Growing Phase:
• Acquires all the locks it needs
• Does not unlock any data
Once all locks have been acquired, the transaction is in its locked state.
Shrinking Phase:
• Releases all locks
• Cannot obtain any lock
The two-phase locking protocol is governed by the following rules:
1. Two transactions cannot have conflicting locks.
2. No unlock operation can precede a lock operation in the same transaction.
3. No data are affected until all locks are obtained, that is, until the transaction
reaches its locked point.
The transaction acquires all of the locks it needs until it reaches its locked point.
When the locked point is reached, the data are modified.
Finally the transaction is completed as it releases all of the locks it acquired in the first phase.
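The growing/shrinking discipline can be sketched as follows (a minimal Python model; the class and error message are invented):

```python
# Sketch of two-phase locking: a transaction only acquires locks in its
# growing phase; the first unlock ends that phase, and any later lock
# request is a protocol violation.
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True      # entering the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("X"); t.lock("Y")          # growing phase: acquire all locks
t.unlock("X")                     # shrinking phase begins
try:
    t.lock("Z")                   # rule 2: no lock after an unlock
except RuntimeError as e:
    print(e)
```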
Deadlocks
A deadlock occurs when two transactions wait indefinitely for each other to unlock data.
For example: T1 has locked data item X and waiting for data item Y which is held by T2 and
T2 has locked data item Y and waiting for data item X which is held by T1.
T1 and T2 wait for each other to unlock the required data item. Such a deadlock is also
known as deadly embrace.
The figure demonstrates how a deadlock is created.
Time  Action        Reply  Data X    Data Y
1     T1: LOCK(X)   OK     Locked    Unlocked
2     T2: LOCK(Y)   OK     Locked    Locked
3     T1: LOCK(Y)   WAIT   Locked    Locked
4     T2: LOCK(X)   WAIT   Locked    Locked
T1 and T2 now wait for each other indefinitely: deadlock.
The three basic techniques to control deadlocks are:
Deadlock prevention:
A transaction requesting a new lock is aborted when there is the possibility that a
deadlock can occur.
If a transaction is aborted, all changes made by this transaction are rolled back and all locks
obtained are released.
The transaction is rescheduled for execution.
Deadlock prevention works because it avoids the conditions that lead to deadlocking.
Page 64
Deadlock detection:
The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of
the transactions (the "victim") is aborted (rolled back and restarted) and the other
transaction continues.
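Deadlock detection is commonly described in terms of a wait-for graph; the sketch below (Python, invented names) aborts nothing, it only reports whether a cycle exists:

```python
# Sketch of deadlock detection: build a wait-for graph (edge T1 -> T2 means
# T1 waits for a lock held by T2) and look for a cycle.
def has_cycle(graph):
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph.get(node, []))
    return any(visit(n, set()) for n in graph)

# T1 waits for T2 (wants Y) and T2 waits for T1 (wants X): deadly embrace.
wait_for = {"T1": ["T2"], "T2": ["T1"]}
print(has_cycle(wait_for))                  # True -> abort a victim
print(has_cycle({"T1": ["T2"], "T2": []}))  # False: T2 can finish
```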
Deadlock avoidance: The transaction must obtain all locks it needs before it can be
executed. This technique avoids rollback of conflicting transactions.
The choice of deadlock control method to use depends on the database environment.
For example, if the probability of deadlock is
Low, deadlock detection is recommended.
High, deadlock prevention is recommended.
If the response time is not high on the system’s priority list, deadlock avoidance is employed.
DBMSs use a blend of prevention and avoidance for other types of data, such as XML data or
data warehouses.
All current DBMSs support deadlock detection in transactional databases.
Concurrency control with time stamping methods
The time stamping approach assigns a global, unique time stamp to each transaction.
Time stamps must have two properties: uniqueness and monotonicity.
Uniqueness ensures that no equal timestamp values can exist.
Monotonicity ensures that timestamp values always increase.
All database operations (READ and WRITE) within the same transaction must have the same
timestamp.
The DBMS executes conflicting operations in timestamp order.
If two transactions conflict, one is stopped, rolled back, rescheduled and assigned a new
timestamp value.
For each data item Q, two timestamp values are maintained:
• W-timestamp(Q): the largest timestamp of any transaction that executed write(Q)
successfully.
• R-timestamp(Q): the largest timestamp of any transaction that executed read(Q)
successfully.
Thus timestamping increases memory needs and database processing overhead and uses a lot
of system resources.
WAIT / DIE AND WOUND/WAIT SCHEMES
Assume you have two conflicting transactions, T1 and T2, each with a unique timestamp.
Suppose T1 has a timestamp of 115 and T2 has a timestamp of 195; that is, T1 is the older
transaction and T2 is the newer one.
Requesting T1 (115), owning T2 (195):
• Wait/Die: T1 waits until T2 is completed and releases its lock.
• Wound/Wait: T1 preempts (rolls back) T2; T2 is rescheduled using the same timestamp.
Requesting T2 (195), owning T1 (115):
• Wait/Die: T2 dies (is rolled back); T2 is rescheduled using the same timestamp.
• Wound/Wait: T2 waits until T1 is completed and releases its lock.
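The two schemes reduce to a simple timestamp comparison, sketched here in Python (invented function names; the returned strings are just labels):

```python
# Sketch of the two schemes: a smaller timestamp means an older transaction.
# Each function returns what happens to the REQUESTING transaction.
def wait_die(req_ts, owner_ts):
    return "wait" if req_ts < owner_ts else "die (rollback, same timestamp)"

def wound_wait(req_ts, owner_ts):
    return "wound owner (preempt it)" if req_ts < owner_ts else "wait"

print(wait_die(115, 195))    # older requester waits
print(wait_die(195, 115))    # younger requester dies
print(wound_wait(115, 195))  # older requester preempts the younger owner
print(wound_wait(195, 115))  # younger requester waits
```

In both schemes the older transaction is favored, which guarantees that every transaction eventually completes.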
For a transaction that requests multiple locks, how long must it wait for each lock request?
To prevent that type of deadlock, each lock request has an associated time-out value.
If the lock is not granted before the time-out expires, the transaction is rolled back.
Concurrency control with optimistic methods
The optimistic approach is based on the assumption that the majority of the database
operations do not conflict.
The optimistic approach has three phases:
Read phase: the transaction T reads the data items from the database into its private
workspace.
All the updates of the transaction can only change the local copies of the data in a private
workspace.
Validate phase:
A check is performed to determine whether the values read have been changed by another
transaction while this transaction was updating its local copies. This is done by comparing
the current database values with the values that were read into the private workspace.
If the values have changed, the local copies are discarded and the transaction is aborted.
Write phase: The changes are permanently applied to the database.
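The three phases can be sketched with a per-item version number standing in for the validation check (Python, invented names; real DBMSs validate differently):

```python
# Sketch of the optimistic approach: read into a private workspace while
# remembering a version, validate that version, then write.
db = {"qoh": {"value": 35, "version": 7}}

def read_phase(item):
    """Copy the item into a private workspace, remembering its version."""
    rec = db[item]
    return {"item": item, "value": rec["value"], "version": rec["version"]}

def validate_and_write(ws, new_value):
    """Validate phase then write phase: abort if the item has changed."""
    rec = db[ws["item"]]
    if rec["version"] != ws["version"]:
        return False                  # values changed: discard local copies
    rec["value"] = new_value          # write phase: apply permanently
    rec["version"] += 1
    return True

ws = read_phase("qoh")
db["qoh"]["version"] += 1             # a concurrent transaction intervenes
print(validate_and_write(ws, ws["value"] + 100))   # False: must abort
```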
Database recovery management
Transaction recovery reverses all the changes that a transaction made to the database before
the transaction was aborted; database recovery restores the database to a consistent state
after some type of critical error has occurred.
Examples of critical events are:
1. Hardware/software failures: failures of this type include hard disk media failure, a bad
capacitor on a motherboard, a failing memory bank, or application program or O.S. errors
that cause data to be overwritten, deleted or lost.
2. Human-caused incidents, which are of two types: unintentional and intentional.
• Unintentional failures are caused by carelessness of end users. Such errors include
deleting the wrong rows from a table or shutting down the database server by accident.
• Intentional events are security threats caused by hackers and virus attacks.
3. Natural disasters include fires, earthquakes, floods and power failures.
A critical error can render the database in an inconsistent state.
Transaction recovery
Database transaction recovery uses data in transaction log to bring a database to a
consistent state after a failure.
Four important concepts that affect the recovery process.
• The write-ahead-log protocol ensures that transaction logs are always written
before any database data are actually updated. In case of failure, recovery is done using
the data in the transaction log.
• Redundant transaction logs ensure that a physical disk failure will not affect the
recovery.
• When a transaction updates data, it actually updates a copy of the data in a buffer (a
temporary storage area in primary memory). This is much faster than accessing the
physical disk every time. Later, all buffers are written to the physical disk in a
single operation.
• Database checkpoints are operations in which the DBMS writes all of its updated
buffers to disk.
While this is happening, the DBMS does not execute any other requests.
Checkpoints are automatically scheduled by the DBMS several times per hour.
Checkpoint operation is also registered in the transaction log.
Q: Write about different types of recovery techniques?
Ans: Transaction recovery procedures generally make use of:
• Deferred-write (also called deferred update), and
• Write-through (also called immediate update).
In the deferred-write technique, the transaction operations do not immediately update the
physical database; only the transaction log is updated.
The database is physically updated only after the transaction is committed.
The recovery process using Deferred-write follows these steps:
1. Identify the last checkpoint in the transaction log.
(This is the last time transaction data was physically saved to disk.)
2. For a transaction that started and committed before the last checkpoint, nothing
needs to be done
3. For a transaction that performed a commit operation after last checkpoint, the
DBMS redoes the transaction using the after values in transaction log.
The changes are made in ascending order from oldest to newest.
4. For a transaction that had a ROLLBACK operation after the last checkpoint or that
was left active (with neither a COMMIT nor a ROLLBACK) before failure , nothing is done
because no changes were written to disk
In the write-through technique, the database is immediately updated by transaction operations
during the transaction's execution, even before the transaction reaches its commit point.
If a transaction aborts, an undo operation must be performed.
The recovery process using write-through follows these steps:
1. Steps 1 through 3 are the same as for the deferred-write technique.
2. For a transaction that had a ROLLBACK operation after the last checkpoint, or that
was left active (with neither a COMMIT nor a ROLLBACK) before the failure, the DBMS
undoes the transaction using the before values in the transaction log.
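Both techniques can be sketched together: redo committed work with after values, undo uncommitted work with before values (a Python sketch over an invented log layout):

```python
# Sketch of log-based recovery: redo committed transactions using 'after'
# values; undo uncommitted ones using 'before' values (write-through case).
log = [
    {"trx": 101, "op": "UPDATE", "key": ("PRODUCT", "P01"),
     "attr": "qoh", "before": 20, "after": 18},
    {"trx": 101, "op": "COMMIT"},
    {"trx": 155, "op": "UPDATE", "key": ("PRODUCT", "P08"),
     "attr": "qoh", "before": 100, "after": 80},
    # crash: transaction 155 never committed
]

def recover(db):
    committed = {r["trx"] for r in log if r["op"] == "COMMIT"}
    for r in log:                      # redo, oldest to newest
        if r["op"] == "UPDATE" and r["trx"] in committed:
            db[r["key"]][r["attr"]] = r["after"]
    for r in reversed(log):            # undo, newest to oldest
        if r["op"] == "UPDATE" and r["trx"] not in committed:
            db[r["key"]][r["attr"]] = r["before"]
    return db

db = {("PRODUCT", "P01"): {"qoh": 20}, ("PRODUCT", "P08"): {"qoh": 80}}
recover(db)
print(db)  # P01 redone to 18, P08 undone to 100
```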
TRL_ID  TRX_NUM  Prev PTR  Next PTR  Operation   Table     RowID    Attribute  Before  After
341     101      Null      352       START       ** Start Transaction
352     101      341       363       UPDATE      PRODUCT   P01      qoh        20      18
363     101      352       365       UPDATE      CUSTOMER  201      baldue     100     120
365     101      363       Null      COMMIT      ** End of Transaction
397     106      Null      405       START       ** Start Transaction
405     106      397       415       INSERT      INVOICE   305                         305
415     106      405       419       INSERT      LINE      305,L01                     305,L01,P05
419     106      415       427       UPDATE      PRODUCT   P05      qoh        120     119
423     CHECKPOINT
427     106      419       431       UPDATE      CUSTOMER  202      baldue     500     1050
431     106      427       Null      COMMIT      ** End of Transaction
521     155      Null      525       START       ** Start Transaction
525     155      521       528       UPDATE      PRODUCT   P08      qoh        100     80
528     155      525       Null      COMMIT      ** End of Transaction
***C*R*A*S*H***
Transaction 101:
UPDATE product SET qoh = qoh - 2 WHERE pno = 'P01';
UPDATE customer SET baldue = baldue + 20 WHERE cno = 201;
COMMIT;
Transaction 106:
INSERT INTO invoice VALUES (305, 202, SYSDATE);
INSERT INTO line VALUES (305, 'L01', 'P05', 1, 550);
UPDATE product SET qoh = qoh - 1 WHERE pno = 'P05';
UPDATE customer SET baldue = baldue + 550 WHERE cno = 202;
COMMIT;
Transaction 155:
UPDATE product SET qoh = qoh - 20 WHERE pno = 'P08';
COMMIT;
The database recovery process for a DBMS using the deferred-update method is as follows:
1. Identify the last checkpoint, in this case TRL ID 423. This was the last time
database buffers were physically written to disk.
2. Transaction 101 committed before the last checkpoint, so all of its changes were
already written to disk and no action needs to be taken.
3. For each transaction committed after the last checkpoint, the DBMS redoes the
transaction. For example, for transaction 106:
1. Find the COMMIT record (TRL ID 431).
2. Use the previous-pointer values to locate the start of the transaction (TRL ID 397).
3. Use the next-pointer values to locate each DML statement and apply the changes to
disk using the after values (start with TRL ID 405, then 415, 419 and 427).
Repeat the process for transaction 155.
4. For transactions that ended with ROLLBACK or that were still active at the time of the
crash, nothing is done because no changes were written to disk.
Unit –IV Chapter-II
Distributed Database Management Systems
Q: What are the disadvantages of centralized database management system?
• Performance degradation due to growing number of remote locations over greater
distances.
• High costs associated with maintaining and operating large central (mainframe) database
systems
• Reliability problems created by dependence on a central site(single point of failure
syndrome) and the need for data replication.
• Scalability problems associated with the physical limits imposed by a single
location(temperature conditioning and power consumption)
• Organizational rigidity imposed by the database might not support the flexibility and
agility required by modern global organizations.
Q: What is a distributed database?
Ans: A distributed database is a logically related database whose data and processing
functions are distributed (stored and executed) over several interconnected computer sites.
Q: What is a DDBMS?
Ans: A distributed database management system (DDBMS) governs the storage and
processing of logically related data over interconnected computer systems in which both data
and processing functions are distributed among several sites. It is the software system that
permits the management of the distributed database and makes the distribution transparent
to users.
Q: What are the advantages and disadvantages of DDBMS
Ans: Advantages
1. Data are located near the greatest demand site.
2. Faster data access as end users work with only a locally stored subset of the company data.
3. Faster data processing as the workload is distributed at several sites.
4. Growth facilitation as new sites can be added without affecting other sites
5. Improved communication as local sites foster better communication between customer and
staff
6. Reduced operating costs as development work is done more cheaply and more quickly on
low-cost PCs than on mainframes
7. User friendly interface as the GUI simplifies training and use for end users
8. Less danger of single point failure: when one computer fails, the workload is picked up by
other workstations as data are distributed at multiple sites.
9. Processor independence: the end user is able to access any available copy of the data, and
an end user's request is processed by any processor at the data location.
Disadvantages
1. Complexity of management and control: Applications must recognize data location and
they must be able to stitch together data from various sites.
2. Technological difficulty: Data integrity, transaction management, concurrency control,
security, backup and recovery, query optimization, access path selection, etc., must all be
addressed and resolved.
3. Security : The probability of security lapses increases when data are located at multiple
sites. The responsibility of data management will be shared by different people at several sites.
4. Lack of standards: There are no standard communication protocols at the database level.
For example, different database vendors employ different often incompatible techniques to
manage the distribution of data and processing in a DDBMS environment.
5. Increased storage and infrastructure requirements: multiple copies of data are required
at different sites, thus requiring additional disk storage space.
6. Increased training cost
7. Costs: Distributed databases require duplicated infrastructure to operate (physical location,
environment, personnel, software, licensing etc)
Distributed processing with a centralized database
Although the database resides at only one site, each site can access the data and update the
database; that is, the sites share the database processing chores. These sites are connected
through a communication network. (Refer to the book for the diagram.)
A distributed database requires distributed processing, whereas distributed processing may
be based on either a centralized database or a distributed database. Both distributed
processing and distributed databases require a network to connect all components.
Distributed database system
In a distributed database system, the database is composed of several parts known as
database fragments. The database fragments are located at different sites and can be
replicated among various sites. A distributed database requires distributed processing.
(Refer to the book for the diagram.)
Each database fragment is managed by a local database process (distributed processing).
For the management of distributed data to occur, copies or parts of the database processing
functions must be distributed to all data storage sites.
Characteristics of DDBMS
The DBMS must have the following functions to be classified as distributed:
• Application interface to interact with the end user, application programs and other DBMSs
within the DDB.
• Validation to analyze data requests for syntax correctness.
• Transformation to decompose complex requests into atomic data request components
• Query optimization to find the best access strategy
• Mapping to determine the data location of local and remote fragments.
• I/O interface to read or write data from or to permanent local storage.
• Formatting to prepare the data for presentation to the end user or to an application program
• Security to provide data privacy at both local and remote databases.
• Back up and recovery to ensure the availability and recoverability of the database in case
of a failure.
• DB administration features for the database administrator
• Concurrency control to manage simultaneous data access and to ensure data consistency
across data fragments in the DDBMS.
• Transaction management to ensure that the data moves from one consistent state to
another. This activity includes the synchronization of local and remote transactions as well as
transactions across multiple distributed segments.
Centralised DBMS functions
• Receive an application's (or an end user's) request.
• Validate, analyze and decompose the request. The request might include mathematical
and/or logical operations.
• Ex: SELECT all customers with a balance > $1000.
The request might require data from only a single table or it might require access to several
tables.
• Map the request’s logical -to -physical data components.
• Decompose the request into several disk I/O operations
• Search for, locate, read and validate the data.
• Ensure database consistency, security and integrity.
• Validate the data for the conditions, if any, specified by the request.
• Present the selected data in the required format
DDBMS components
1. Computer workstations (sites or nodes). The DDBMS must be independent of the
computer system hardware.
2. Network hardware and software components that reside in each workstation to allow all
sites to interact and exchange data. Because the components (computers, operating systems,
network hardware, etc.) are likely to be supplied by different vendors, it is best to ensure
that DDB functions can be run on multiple platforms.
3. Communication media that carry the data from one workstation to another.
The DDBMS must be communications media-independent
4. The transaction processor (TP), also known as the application processor (AP) or the
transaction manager (TM): the software component, found in each computer that requests
data, that receives and processes the application's data requests (remote and local).
5. The data processor (DP), also known as the data manager (DM), which is the software
component residing on each computer that stores and retrieves data located at the site.
A DP may even be a centralized DBMS.
The communication between TPs and DPs is made possible through protocols used by the
DDBMS. The protocol determines how the DDB system will:
1. Interface with the network to transfer data and commands between DPs and TPs.
2. Synchronize all data received from DPs (TP side).
3. Route retrieved data to the appropriate TPs (DP side).
4. Ensure database functions such as security, concurrency control, and backup and
recovery in the DDB.
DPs and TPs can be added to the system without affecting the operations of the other
components. DPs and TPs can also reside on the same computer.
Levels of data and process distribution
Current database systems can be classified on the basis of how process distribution and data
distribution are supported.
For ex: a DBMS may store data at a single site or in multiple sites and
may support data processing at a single site or at multiple sites.
Single-Site Processing, Single-Site Data (SPSD)(Centralized)
- all processing is done on a single host computer
- all data are stored on the host computer’s local disk system.
Ex: mainframe systems, single-processor servers and multiple-processor server systems.
The functions of TP and DP are embedded within the DBMS located on single computer.
The DBMS usually runs under a time sharing, multitasking O.S, which allows several
processes to run concurrently on a host computer
Multiple-Site Processing, Single-Site Data (MPSD): multiple processes run on different
computers sharing a single data repository.
This scenario requires a network file server running applications that are accessed through a
network
1. The TP on each workstation routes all network data requests to the file server.
2. Only the data storage input/output (I/O) is handled by the file server, so it offers only
limited distributed processing capability.
3. The end user must make direct reference to the file server to access remote data.
4. All record- and file-locking activities are done at the end-user location.
5. All data selection, search and update functions take place at the workstation,
thus requiring that the entire file travel through the network for processing at the
workstation. Such a requirement increases network traffic, slows response time and
increases communication costs.
For ex: A file server stores a CUSTOMER table containing 10,000 data rows, 50 of which
have balances greater than $1000. Suppose site A issues the query:
SELECT * FROM customer WHERE cus_balance > 1000;
All 10,000 CUSTOMER rows must travel through the network to be evaluated at site A.
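The traffic difference in this example can be sketched with a small Python simulation. The row counts mirror the CUSTOMER example above; the functions are illustrative, not a real file-server API:

```python
# Hypothetical sketch: rows that must cross the network under MPSD (file
# server) versus client/server processing, for the CUSTOMER query above.

def rows_transferred_mpsd(table):
    # The file server ships every row; the workstation evaluates the WHERE clause.
    return len(table)

def rows_transferred_client_server(table, predicate):
    # The server evaluates the predicate and ships only matching rows.
    return sum(1 for row in table if predicate(row))

def over_1000(row):
    return row["cus_balance"] > 1000

# 10,000 customers, 50 of which have balances over $1000
customers = [{"cno": i, "cus_balance": 1500 if i < 50 else 200}
             for i in range(10_000)]

print(rows_transferred_mpsd(customers))                      # 10000 rows travel
print(rows_transferred_client_server(customers, over_1000))  # only 50 rows travel
```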
Client/Server architecture compared with MPSD:
Client/Server architecture: All database processing is done at the server site, which
reduces network traffic. It is capable of supporting data at multiple sites, processing is
distributed, and it performs multiple-site processing.
MPSD (file server): All database processing is done at the client site, which increases
network traffic. It requires the database to be located at a single site, and processing is
not distributed.
Multiple-Site Processing, Multiple-Site Data(MPMD) describes a fully DDB with support
for multiple data processors and transaction processors at multiple sites
Types of DDBMS:
Depending on the level of support for various types of centralized DBMSs, DDBMSs are
classified as
Homogeneous DDBMS: integrates only one type of centralized DBMS over a network. The
same DBMS will be running on different server platforms (single-processor server,
multiprocessor server).
Heterogeneous DDBMS: integrates different types of centralized DBMSs over a network. A
fully heterogeneous DDBMS will support different DBMSs that may even support different
data models (relational, hierarchical or network) running on different computer systems,
such as mainframes and PCs.
Some DDBMS implementations support several platforms, operating systems and networks
and allow remote access to data in another DBMS, but subject to certain restrictions.
Remote access is provided on a read-only basis and does not support write privileges.
Restrictions are placed on the
• number of remote tables that may be accessed in a single transaction.
• number of distinct databases that may be accessed
• database model that may be accessed. Access may be provided to relational databases
but not to network or hierarchical databases.
Transparency features or functional characteristics
Transparency allows the end user to feel like the database's only user. The user believes
that he or she is working with a centralized DBMS; all of the system's complexities are
hidden, or transparent, to the user.
1. Distribution transparency: it makes a dispersed database look like a single database
to the end user.
2. Transaction transparency: Allows a transaction to update data at more than one
network site.
Transaction transparency ensures that the transaction will be either entirely completed or
aborted, thus maintaining data integrity
3. Failure transparency ensures that the system will continue to operate in the event of a
node failure.
Functions that were lost because of failure will be picked up by another network node
4. Performance transparency: The system will not suffer any performance degradation
due to its use on a network or due to the network's platform differences.
It ensures that the system will find the most cost-effective path to access remote data.
5. Heterogeneity transparency :Allows integration of several different local DBMSs
(relational, hierarchical and network) under a common or global schema
The DBMS is responsible for translating the data requests from the global schema to the local
DBMS schema
Distribution transparency
Three levels of distribution transparency are
1. Fragmentation transparency: is the highest level of transparency.
Neither fragment names nor fragment locations are specified prior to data access.
2. Location transparency: end user must specify the database fragment names but not
their locations
3. Location mapping transparency: the end user must specify both the fragment names
and their locations.
Ex: The CUSTOMER table contains cno,cname,city attributes
The CUSTOMER data are distributed over 3 different locations: NewYork, Atlanta and
Miami
The table is divided by location, i.e., New York customers' data are stored in fragment C1,
Atlanta customers' data are stored in fragment C2 and
Miami customers' data are stored in fragment C3,
and each fragment is unique, i.e., each row appears in exactly one fragment.
No portion of the table is replicated at any other site.
Case 1: The database supports Fragmentation Transparency
SELECT * FROM customer; (no fragment names or location specified)
Case 2: The database supports Location Transparency
SELECT * FROM C1
UNION
SELECT * FROM C2
UNION
SELECT * FROM C3;
(fragment names are specified, but locations are not)
Case 3: The database supports Location Mapping Transparency
SELECT * FROM C1 NODE NY
UNION
SELECT * FROM C2 NODE ATL
UNION
SELECT * FROM C3 NODE MIA;
(both fragment names and locations are specified)
Distribution transparency is supported by a distributed data dictionary (DDD) or distributed
data catalog (DDC). The DDC contains the distributed global schema, i.e., the entire
database description. The DDC is itself distributed and replicated; therefore, updates to it
must be propagated to all sites to maintain consistency.
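The DDC's role in distribution transparency can be sketched as a simple lookup table: the catalog maps a global table name to its fragments and their node locations, so a TP can rewrite a fragmentation-transparent query into located fragment queries. The `resolve` helper and its query-rewriting behaviour are invented for illustration:

```python
# Minimal sketch of a distributed data catalog (DDC). Fragment and node
# names follow the CUSTOMER example above; the API itself is hypothetical.

DDC = {
    "CUSTOMER": [("C1", "NY"), ("C2", "ATL"), ("C3", "MIA")],
}

def resolve(table):
    """Rewrite SELECT * FROM <table> into per-fragment, per-node queries."""
    return [f"SELECT * FROM {frag} NODE {node}" for frag, node in DDC[table]]

# The end user writes only SELECT * FROM CUSTOMER; the TP consults the DDC:
for query in resolve("CUSTOMER"):
    print(query)
```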
Transaction transparency
Remote request: lets a single SQL statement (or request) reference data at only
one remote site or DP.
A remote transaction contains one or more remote requests, all of which reference one
remote site or DP.
consider a transaction at site A
BEGIN WORK
UPDATE product SET qoh = qoh-1 WHERE pno = ‘P01’;
INSERT INTO invoice (invno,cno,invdate) VALUES (305, 202,SYSDATE);
COMMIT WORK;
Note the following remote transaction features:
• The transaction updates the PRODUCT and INVOICE tables (located at site C).
• The remote transaction is sent to and executed at the remote site C.
• The entire transaction can reference and be executed at only one remote DP.
A distributed transaction contains one or more requests, and each request can access only
one remote site at a time; i.e., it allows a transaction to reference several different local
or remote DP sites.
Consider a transaction at site A
BEGIN WORK
UPDATE product SET qoh = qoh-1 WHERE pno = ‘P01’;
INSERT INTO invoice (invno,cno,invdate) VALUES (305, 202,SYSDATE);
UPDATE customer SET baldue = baldue + 10 WHERE cno = 202;
COMMIT WORK;
Note the following features:
1. The transaction references two remote sites (B and C)
2. The first two requests (UPDATE PRODUCT and INSERT INTO INVOICE) are
processed by the DP at remote site C, and the last request is processed by DP at the
remote site B.
3. Each request can access only one remote site at a time.
The third characteristic may create problems.
Suppose the table PRODUCT is divided into PROD1 and PROD2, located at sites B and C
respectively. Then a distributed transaction cannot execute the request SELECT * FROM
product; because this request cannot access data from more than one remote site at a time.
So the DBMS must support a distributed request.
A distributed request lets a single SQL statement reference data located at several different
local or remote DP sites. The ability to execute a distributed request provides fully
distributed database processing capability because of the ability to:
• Partition a database table into several fragments.
• Reference one or more of those fragments with only one request.
Distributed Concurrency control
Multisite, multiple-process operations are much more likely to create data inconsistencies and
deadlocked transactions than are single-site systems
Q: Explain Two Phase Commit Protocol :
Distributed databases make it possible for a transaction to access data at several sites.
A final commit must not be issued until all sites have committed their parts of the transaction.
Each DP maintains its own transaction log.
The two phase commit protocol requires
DO-UNDO-REDO protocol
Write-Ahead Protocol
DO-UNDO-REDO protocol
DO performs the operation and records the before and after values in the transaction log.
UNDO reverses an operation, using the log entries written by the DO portion
REDO redoes an operation.
To ensure that the DO, UNDO, REDO operations can survive a system crash while they were
being executed, a write-ahead protocol is used.
The write-ahead protocol forces the log entries to be written to permanent storage before the
actual operation takes place.
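A minimal sketch of DO-UNDO-REDO with write-ahead logging follows, using a dictionary as a stand-in for the stored data and a list as the permanent transaction log. All names and structures are illustrative, not a real DBMS implementation:

```python
# Sketch of DO-UNDO-REDO with a write-ahead log: the log record (with before
# and after values) is appended *before* the data page is changed, so UNDO
# and REDO can survive a crash mid-operation.

log = []                       # stand-in for the permanent transaction log
db = {"P01": {"qoh": 10}}      # stand-in for the stored data

def do(key, field, new_value):
    before = db[key][field]
    # write-ahead: the log entry exists before the actual write happens
    log.append({"key": key, "field": field, "before": before, "after": new_value})
    db[key][field] = new_value

def undo():
    # reverse the last operation using the logged before-value
    entry = log.pop()
    db[entry["key"]][entry["field"]] = entry["before"]

def redo(entry):
    # reapply an operation using the logged after-value
    db[entry["key"]][entry["field"]] = entry["after"]

do("P01", "qoh", 9)      # UPDATE product SET qoh = qoh - 1 WHERE pno = 'P01'
undo()                   # roll it back from the log
print(db["P01"]["qoh"])  # 10
```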
There are two types of nodes: the coordinator node and subordinates or cohorts.
The coordinator role is assigned to the node that initiates the transaction.
The protocol is implemented in two phases
Phase-1 Preparation
The coordinator sends a PREPARE TO COMMIT message to all subordinates.
The subordinate nodes receive the message, write the transaction log using the write-ahead
protocol and
send an acknowledgement (YES/PREPARED TO COMMIT or NO/NOT PREPARED)
message to the coordinator.
If all nodes are PREPARED TO COMMIT, the transaction goes to phase 2.
If one or more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT
message to all subordinates.
Phase-2 The Final Commit
The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies.
Each subordinate receives the COMMIT message and then updates the database using the
DO protocol.
The subordinates reply with a COMMITTED or NOT COMMITTED message to the
coordinator.
If one or more subordinates did not commit, the coordinator sends an ABORT message,
thereby forcing all subordinates to UNDO all changes.
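The two phases can be sketched as a toy simulation: the vote list stands in for the subordinates' PREPARED-TO-COMMIT acknowledgements, and the function name and return values are invented for illustration:

```python
# Toy simulation of the two-phase commit protocol. The coordinator commits
# only on a unanimous YES vote; any NO vote forces a global abort.

def two_phase_commit(subordinate_votes):
    # Phase 1 (preparation): collect PREPARED TO COMMIT / NOT PREPARED votes.
    if not all(subordinate_votes):
        return "ABORT"    # one NO vote -> broadcast ABORT to all subordinates
    # Phase 2 (final commit): broadcast COMMIT; subordinates apply via DO.
    return "COMMIT"

print(two_phase_commit([True, True, True]))   # COMMIT
print(two_phase_commit([True, False, True]))  # ABORT
```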
Performance transparency and query optimization
The DDBMS uses query optimization to decide which copy of the data to access.
The objective of query optimization routine is to minimize the total cost associated with the
execution of a request like
1. Access time (I/O) cost involved in accessing the physical data stored on disk
2. Communication cost associated with the transmission of data among nodes in DDB
systems
3. CPU time cost associated with the processing overhead of managing distributed
transactions
To carry out a query, the TP must receive data from the DPs, synchronize it,
assemble the answer and present it to the end user or application.
Most of the algorithms proposed for query optimization are based on two principles:
1. The selection of the optimum execution order.
2. The selection of sites to be accessed to minimize communication costs.
Query optimization algorithms can be evaluated on the basis of their operation mode or the
timing of their optimization.
Operation modes can be classified as manual or automatic: in manual query optimization,
the cost-effective path is found and scheduled by the end user or programmer; in automatic
query optimization, it is found and scheduled by the DDBMS.
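The cost model described above (access + communication + CPU cost) can be sketched as an automatic plan chooser. The candidate plans and cost figures are invented for illustration:

```python
# Sketch of automatic, cost-based query optimization: the total cost of each
# candidate execution plan is modeled as I/O + communication + CPU cost, and
# the cheapest plan is scheduled.

plans = [
    {"name": "ship CUSTOMER to site A",          "io": 40, "comm": 900, "cpu": 10},
    {"name": "filter at remote DP, ship result", "io": 40, "comm": 15,  "cpu": 12},
]

def total_cost(plan):
    # access time (I/O) + communication cost + CPU time cost
    return plan["io"] + plan["comm"] + plan["cpu"]

best = min(plans, key=total_cost)
print(best["name"])   # filter at remote DP, ship result
```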
Query optimization classification
• On the basis of operation mode: manual or automatic query optimization.
• On the basis of when the optimization is done: static or dynamic query optimization.
• On the basis of the type of information used: statistically based or rule-based query
optimization.
Classification according to when the optimization is done
1. Static query optimization takes place at compilation time. When the program is submitted
to the DBMS for compilation, it creates the plan to access the database.
2. Dynamic query optimization takes place at execution time. Its cost is measured by
run-time processing overhead.
Classification according to type of information that is used to optimize the query
1. Statistically based query optimization algorithm uses statistical information like size,
number of records, average access time of database.
The statistical information is managed by DDBMS and is generated in dynamic mode or in
manual mode.
In dynamic statistical generation mode, the DDBMS automatically evaluates and updates the
statistics after each access.
2. Rule- based query optimization algorithm is based on a set of user-defined rules to
determine the best access strategy. The rules are entered by the end user or database
administrator.
Distributed database design
The design of a distributed database introduces three new issues:
1. How to partition the database into fragments.
2. Which fragments to replicate.
3. Where to locate those fragments and replicas.
Data fragmentation :
It allows you to break a single object (database, table, etc.) into two or more segments or
fragments. Each fragment can be stored at any site.
Information about data fragmentation is stored in the distributed data catalog (DDC), from
which it is accessed by the TP to process user requests.
Fragmented tables can be recreated from their fragments by using joins and unions.
Horizontal fragmentation:
It refers to the division of a table into subsets (fragments) of rows.
Each fragment is stored at a different node, and
each fragment has unique rows.
Each fragment represents the equivalent of a SELECT statement, with the WHERE clause
on a single attribute.
Ex:
cno cname cstate climit baldue
1 ANU TN 3500 2700
2 RAMA AP 6000 1200
3 RADHA TN 4000 3500
4 GOPI AP 1200 550
Suppose the XYZ company requires information about its customers at two sites (AP, TN),
and each state requires data regarding local customers only.
So, the data are distributed by state, i.e., a horizontal fragmentation by state is defined:
Fragment Name Location Condition Node Name Customer Numbers No. of nodes
C1 TN cstate = TN CHE 1,3 2
C2 AP cstate=AP VJA 2,4 2
Table fragments in 2 states
Table name: C1 location : Tamil Nadu Node:CHE
cno cname cstate climit baldue
1 ANU TN 3500 2700
3 RADHA TN 4000 3500
Table name: C2 location : AndhraPradesh Node:VJA
cno cname cstate climit baldue
2 RAMA AP 6000 1200
4 GOPI AP 1200 550
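The C1/C2 split above can be sketched in a few lines of Python: each horizontal fragment is the result of filtering rows on a single attribute (cstate). The data mirror the example table; the helper function is illustrative:

```python
# Sketch of horizontal fragmentation: each fragment is the equivalent of a
# SELECT with a WHERE clause on one attribute, per the CUSTOMER example.

customers = [
    {"cno": 1, "cname": "ANU",   "cstate": "TN", "climit": 3500, "baldue": 2700},
    {"cno": 2, "cname": "RAMA",  "cstate": "AP", "climit": 6000, "baldue": 1200},
    {"cno": 3, "cname": "RADHA", "cstate": "TN", "climit": 4000, "baldue": 3500},
    {"cno": 4, "cname": "GOPI",  "cstate": "AP", "climit": 1200, "baldue": 550},
]

def horizontal_fragment(table, attribute, value):
    # equivalent of: SELECT * FROM table WHERE attribute = value
    return [row for row in table if row[attribute] == value]

c1 = horizontal_fragment(customers, "cstate", "TN")   # stored at node CHE
c2 = horizontal_fragment(customers, "cstate", "AP")   # stored at node VJA
print([r["cno"] for r in c1])  # [1, 3]
print([r["cno"] for r in c2])  # [2, 4]
```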
Vertical fragmentation: It refers to the division of a table into attribute (column) subsets.
Each subset is stored at a different node, and
each fragment has unique columns, with the exception of the key column, which is common
to all fragments.
For ex: Suppose a company is divided into two departments, service and collection.
Each department is in a separate building and has an interest in only some of the
CUSTOMER attributes.
Vertical fragmentation of CUSTOMER table
Fragment Name Location Node Name Attribute Names
V1 Service building SVC cno, cname, cstate
V2 Collection building ARC cno, climit, baldue
Vertically fragmented table contents
Table name: V1 location : Service building Node:SVC
cno cname cstate
1 ANU TN
2 RAMA AP
3 RADHA TN
4 GOPI AP
Table name: V2 location : collection building Node:ARC
cno climit baldue
1 3500 2700
2 6000 1200
3 4000 3500
4 1200 550
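The V1/V2 split above can likewise be sketched as a column projection that always keeps the key column (cno). The function name and structure are illustrative:

```python
# Sketch of vertical fragmentation: each fragment keeps a subset of columns
# plus the key column, matching the V1/V2 example above.

customers = [
    {"cno": 1, "cname": "ANU",  "cstate": "TN", "climit": 3500, "baldue": 2700},
    {"cno": 2, "cname": "RAMA", "cstate": "AP", "climit": 6000, "baldue": 1200},
]

def vertical_fragment(table, columns, key="cno"):
    # the key column is common to all fragments so the table can be rejoined
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in table]

v1 = vertical_fragment(customers, ["cname", "cstate"])   # node SVC
v2 = vertical_fragment(customers, ["climit", "baldue"])  # node ARC
print(v1[0])  # {'cno': 1, 'cname': 'ANU', 'cstate': 'TN'}
```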
Mixed fragmentation: It refers to a combination of the horizontal and vertical strategies.
It requires a two-step process:
1. Horizontal fragmentation is introduced.
2. Vertical fragmentation is used within each horizontal fragment.
The XYZ company's structure requires
CUSTOMER data to be fragmented horizontally across the two company locations (TN, AP),
and within each location the data must be fragmented vertically for the two departments
(Service and Collection).
Mixed Fragmentation of CUSTOMER table
Fragment Name  Location  Horizontal Criteria  Node Name  Resulting Rows at Site  Vertical Criteria (attributes at each fragment)
M1             TN        cstate = TN          CHES       1,3                     cno, cname, cstate
M2             TN        cstate = TN          CHEC       1,3                     cno, climit, baldue
M3             AP        cstate = AP          VJAS       2,4                     cno, cname, cstate
M4             AP        cstate = AP          VJAC       2,4                     cno, climit, baldue
Table fragmentation after mixed fragmentation process
Data replication: It refers to the storage of data copies at multiple sites.
Replicated data are subject to the mutual consistency rule, which requires that:
1. All copies of data fragments be identical.
2. The DDBMS ensure that a database update is performed at all sites where replicas exist.
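The mutual consistency rule can be sketched as an update that is pushed to every site holding a copy. The replica registry and update mechanics below are invented for illustration:

```python
# Sketch of the mutual consistency rule: an update to a replicated fragment
# must be applied at every site that holds a copy, so all copies stay identical.

replicas = {
    "CHE": {"cno": 1, "baldue": 2700},
    "VJA": {"cno": 1, "baldue": 2700},
}

def replicated_update(field, value):
    # the DDBMS propagates the update to all sites with a replica
    for site in replicas:
        replicas[site][field] = value

replicated_update("baldue", 3000)
assert all(copy["baldue"] == 3000 for copy in replicas.values())
print("all replicas consistent")
```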
Benefits of replication
Fragment copies can be stored at several sites to serve specific information requirements.
This can:
• Increase data availability and response time.
• Reduce communication and query costs.
• Provide better load distribution.
• Improve data failure tolerance.
Disadvantages
• It imposes additional DDBMS processing overhead, because each copy must be
maintained by the system, and the system must also decide which replicated copy to use.
• Increased transaction time, as data must be updated at several sites.
• Increased storage cost.
Replication Conditions
A fully replicated database stores multiple copies of all database fragments at multiple sites.
A partially replicated database stores multiple copies of some database fragments at multiple
sites.
An unreplicated database stores each database fragment at a single site.
Table name: M1 location : Tamil Nadu
Node:CHES
cno cname cstate
1 ANU TN
3 RADHA TN
Table name: M2 location : Tamil Nadu
Node:CHEC
cno climit baldue
1 3500 2700
3 4000 3500
Table name: M3 location : AndhraPradesh
Node:VJAS
cno cname cstate
2 RAMA AP
4 GOPI AP
Table name: M4 location : AndhraPradesh
Node:VJAC
cno climit baldue
2 6000 1200
4 1200 550
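The earlier remark that fragmented tables can be recreated using joins and unions applies directly to the M1–M4 fragments above: join the vertical fragments at each location on the key (cno), then union the horizontal pieces. The helper below is an illustrative sketch:

```python
# Reconstructing the original CUSTOMER table from its mixed fragments:
# joins (on the shared key) within each location, then a union across locations.

m1 = [{"cno": 1, "cname": "ANU",   "cstate": "TN"},
      {"cno": 3, "cname": "RADHA", "cstate": "TN"}]
m2 = [{"cno": 1, "climit": 3500, "baldue": 2700},
      {"cno": 3, "climit": 4000, "baldue": 3500}]
m3 = [{"cno": 2, "cname": "RAMA", "cstate": "AP"},
      {"cno": 4, "cname": "GOPI", "cstate": "AP"}]
m4 = [{"cno": 2, "climit": 6000, "baldue": 1200},
      {"cno": 4, "climit": 1200, "baldue": 550}]

def join_on_key(left, right, key="cno"):
    # equi-join of two vertical fragments on the common key column
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]

customer = join_on_key(m1, m2) + join_on_key(m3, m4)   # joins, then union
print(sorted(r["cno"] for r in customer))  # [1, 2, 3, 4]
```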
Factors for Data Replication Decision
1. Database Size: Replicating a large amount of data has an impact on storage
requirements, data transmission costs and network bandwidth.
2. Usage Frequency: How frequently the data need to be updated and how big the database
is. Frequently used data need to be updated more often.
Data allocation - deciding where to locate data.
Data allocation strategies are as follows:
With the centralized data allocation, the entire database is stored at one site.
With partitioned data allocation, the database is divided into two or more disjoint parts and
stored at two or more sites.
With replicated data allocation, copies of one or more database fragments are stored at several
sites.
Data allocation algorithms take into consideration a variety of factors:
1. Performance and data availability goals
2. Size, number of rows, the number of relations that an entity maintains with other entities.
3. Types of transactions to be applied to the database, the attributes accessed by each of those
transactions.
Explain about Client/Server Architecture
Client/server architecture refers to the way in which computers interact to form a system.
It features a user of resources or a client and a provider of resources or a server .
The architecture can be used to implement a DBMS in which the client is the transaction
processor (TP) and the server is the data processor (DP).
Client/Server Architecture
Client/Server Advantages
1. Client/server solutions are less expensive and
allow the end user to use the microcomputer’s graphical user interface (GUI),
thereby improving functionality and simplicity.
2. There are more people with PC skills than with mainframe skills.
3. Numerous data analysis and query tools exist to allow interaction with many of the
DBMSs.
4. It is cheaper to develop an application for PCs than for mainframes.
Client/Server Disadvantages
1. The client/server architecture creates a more complex environment with different
platforms.
2. An increase in the number of users and processing sites often paves the way for security
problems.
3. The burden of training increases the cost of maintaining the environment.
Unit-5 Chapter-1: Business Intelligence and Data Warehouses
Why is there a need for data analysis?
Or
What are Decision Support Systems and what role do they play in a business environment?
Ans:
A Decision Support System (DSS) is an arrangement of computerized tools used to assist
managerial decision making within a business.
Organizations tend to grow and prosper as they gain better understanding of their
environment.
Data analysis can provide information for short-term tactical evaluations, such as:
Are our sales promotions working?
What market percentage are we controlling?
Are we attracting new customers?
Tactical and strategic decisions are also shaped by constant pressure from external and
internal forces, including globalization, the cultural and legal environment and technology.
The business climate is dynamic and mandates prompt reaction to change in order to
remain competitive.
Different managerial levels require different decision support needs.
For ex: TPS based on operational databases are tailored to serve the information needs of
people who deal with short term inventory, accounts payable and purchasing.
Middle level managers, general managers, vice presidents and presidents focus on strategic
and tactical decision making that require a DSS.
Differences between Operational and Decision Support Data characteristics
Characteristic        Operational Data                  Decision Support Data
Data                  Current operations;               Historic data; snapshot of company data;
                      real-time data                    time component (week/month/year)
Granularity           Atomic, detailed data             Summarized data
Summarization level   Low; some aggregate yields        High; many aggregation levels
Data model            Highly normalized;                Non-normalized;
                      mostly relational DBMS            complex structures
Transaction type      Mostly updates                    Mostly queries
Transaction volumes   High update volumes               Periodic loads and summary calculations
Transaction speed     Updates are critical              Retrievals are critical
Query activity        Low to medium                     High
Query scope           Narrow range                      Broad range
Query complexity      Simple to medium                  Very complex
Data volumes          Hundreds of megabytes             Hundreds of gigabytes
                      up to gigabytes                   up to terabytes
The many differences between operational data and decision support data are good indicators
of the requirements of the decision support database.
Decision Support Database Requirements
There are four main requirements for a decision support database.
1. The Database Schema: must support complex (non-normalized) data representations.
2. Data Extraction and Filtering: The data extraction capabilities should support
different data sources and multiple vendors. Using data from multiple external sources
also usually means having to solve data-formatting conflicts. Finally, the system must
be able to filter and integrate the operational data into the decision support database.
3. End- User Analytical Interface: The decision support DBMS must generate the
necessary queries to retrieve the appropriate data from decision support database.
4. Database Size: To support very large databases (VLDBs), the DBMS might be
required to use advanced hardware, such as multiple disk arrays, multiple-processor
technologies such as symmetric multiprocessor(SMP) or a massively parallel
processor (MPP).
What is a data warehouse? Discuss the properties of a data warehouse.
Ans: A data warehouse is an integrated, subject-oriented, time-variant, nonvolatile
collection of data that provides support for decision making.
The following are important properties of a data warehouse.
Integrated:
The data warehouse is a centralized, consolidated database that integrates data derived from
the entire organization and from multiple sources with diverse formats.
Data integration implies that all business entities, data elements, data characteristics and
business metrics are described in the same way throughout the enterprise.
Subject oriented:
Data warehouse data are arranged and optimized to provide answer to questions coming from
diverse functional areas within a company.
Data warehouse data are organized and summarized by topic.
Instead of storing an INVOICE table, the data warehouse stores its "sales by product" and
"sales by customer" components.
Time variant:
Warehouse data represent the flow of data through time.
Once data are periodically uploaded to the data warehouse, all time dependent aggregations
are recomputed.
Non Volatile:
Once data enter the data warehouse, they are never removed. Because data are never deleted
and new data are continually added, the data warehouse is always growing.
The ETL process in the creation of data warehouse
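The extract-transform-load (ETL) process can be sketched as follows: extract rows from heterogeneous operational sources, transform them into one integrated format, and load them (append-only, per the nonvolatile property) into the warehouse store. All source names and field layouts are invented for illustration:

```python
# Illustrative ETL sketch: two operational sources with conflicting formats
# are extracted, integrated into a common format, and loaded into the warehouse.

source_a = [{"cust": "ANU", "amt_usd": 100}]          # one operational format
source_b = [{"customer": "RAMA", "amount": "250"}]    # a conflicting format

def extract():
    return source_a, source_b

def transform(a_rows, b_rows):
    # resolve the formatting conflicts into one integrated representation
    unified = [{"customer": r["cust"], "amount": float(r["amt_usd"])} for r in a_rows]
    unified += [{"customer": r["customer"], "amount": float(r["amount"])} for r in b_rows]
    return unified

warehouse = []

def load(rows):
    warehouse.extend(rows)    # nonvolatile: rows are only ever appended

load(transform(*extract()))
print(len(warehouse))  # 2
```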
What is Business Intelligence?
Business Intelligence is a framework that allows a business to transform data into
information, information into knowledge and knowledge into wisdom.
The following are the BI architectural components.
1. Data extraction, transformation and loading (ETL) tools : this component is in charge
of collecting, filtering, integrating and aggregating operational data to be saved into a
data store.
2. Data Store: the data store is optimized for decision support and generally represented
by a data warehouse or a data mart.
3. Data Query and Analysis Tools: this component performs data retrieval, data analysis
and data mining tasks using the data in the data store represented in the form of an
OLAP tool.
4. Data presentation and visualization tools: this component is in charge of presenting
the data to the end user.
What are datamarts?
Ans: A data mart is a small, single-subject data warehouse subset that provides decision
support to a small group of people. Instead of creating a data warehouse for the entire
organization, manageable data sets that are targeted to meet the special needs of small
groups within the organization are created. These smaller data stores are called data marts.
What is OLAP? What are the four main characteristics of OLAP systems?
Ans: Online analytical processing (OLAP) creates an advanced data analysis environment
that supports decision making, business modeling and operations research.
OLAP systems share four main characteristics:
1. They use multidimensional data analysis techniques.
2. They provide advanced data analysis support.
3. They provide an easy-to-use end-user interface.
4. They support client/server architecture.
Multidimensional Data Analysis Techniques: In multidimensional analysis, data are
processed and viewed as part of a multidimensional structure. This is useful in business
decision making because decision makers tend to view business data as data that are related
to other business data.
These techniques are augmented by the following functions:
1. Advanced data presentation functions, such as 3-D graphics and 3-D cubes. Such
facilities are compatible with desktop spreadsheets.
2. Advanced data aggregation, consolidation and classification functions: create
multiple data aggregation levels, slice and dice data and drill down and roll up data
across different dimensions and aggregation levels
3. Advanced computational functions: These include business oriented variables
(market share, sales margins etc..) and financial accounting ratios and statistical and
forecasting functions. These functions are provided automatically.
4. Advanced data modeling functions like linear programming and other modeling
tools
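The drill-down and roll-up operations in item 2 can be sketched as aggregation along an attribute hierarchy: detailed city-level rows are rolled up to region-level totals. The sales data are invented for illustration:

```python
# Sketch of roll-up along an attribute hierarchy (city -> region): the same
# facts are viewed at different aggregation levels.

sales = [
    {"region": "South", "city": "Chennai",    "units": 120},
    {"region": "South", "city": "Vijayawada", "units": 80},
    {"region": "North", "city": "Delhi",      "units": 200},
]

def roll_up(rows, level):
    # aggregate the units measure at the requested hierarchy level
    totals = {}
    for row in rows:
        totals[row[level]] = totals.get(row[level], 0) + row["units"]
    return totals

print(roll_up(sales, "city"))    # detailed (drilled-down) view
print(roll_up(sales, "region"))  # {'South': 200, 'North': 200}
```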
Advanced database support.
OLAP tools must have advanced data access features such as
• Access to many different kinds of DBMSs
• Access to aggregated data warehouse data as well as detail data found in operational
databases
• Advanced data navigation features such as drill down and roll up.
• Rapid and consistent query response times.
Easy-to-use end-user interface: OLAP features become more useful when access to them is
kept simple. OLAP tool vendors have included easy-to-use graphical interfaces.
Explain about OLAP architecture?
OLAP operational characteristics can be divided into 3 main modules.
• Graphical User Interface (GUI)
• Analytical Processing Logic
• Data Processing Logic
OLAP System
[Figure: The OLAP system exhibits a client/server architecture; an easy-to-use GUI
(dimensional presentation, dimensional modeling, dimensional analysis);
multidimensional data analysis (manipulation and structure); and database support
(data warehouse, operational database, relational and multidimensional databases,
aggregated data, very large databases).]
Above figure illustrates that OLAP systems are designed to use both operational and data
warehouse data.
Above figure shows the OLAP system components located on a single computer. One
problem with this installation is that each data analyst must have a powerful computer
to store the OLAP system and perform all data processing locally. Each analyst uses a
separate copy of the data; therefore, the data copies must be synchronized to ensure that
analysts are working with the same data.
OLAP Server arrangement:
Here the OLAP GUI runs on the client workstation, while the OLAP engine (or server) runs
on a shared computer that forms a middle layer. The OLAP server accepts and processes
the data processing requests generated by the many end-user analytical tools. The end-user
GUI may be a plug-in module integrated with Excel, Lotus 1-2-3, etc.
[Figure: The OLAP GUI sits on top of the analytical processing logic and the data
processing logic, which draw on both operational data and the data warehouse
(integrated, subject-oriented, time-variant, nonvolatile).]
Why use a data warehouse when OLAP provides the necessary multidimensional data
analysis of operational data?
Ans: Because the data warehouse handles the data component more efficiently than OLAP
does.
What is ROLAP?
Relational online analytical processing (ROLAP) provides OLAP functionality by using
relational databases and familiar relational query tools to store and analyze
multidimensional data.
ROLAP adds the following extensions to traditional RDBMS technology:
• Multidimensional data schema support within the RDBMS: uses the star schema.
• Data access language and query performance optimized for multidimensional data:
uses bitmapped indexes, which are more efficient at handling large amounts of data
than conventional RDBMS indexes. Bitmapped indexes are primarily used in
situations where the number of possible values for an attribute is fairly small.
• Support for very large databases (VLDBs).
What is MOLAP?
Multidimensional online analytical processing provides OLAP functionality through multidimensional database management systems (MDBMS).
An MDBMS uses special proprietary techniques to store data in matrix-like n-dimensional arrays. MDBMS end users visualize the stored data as a 3-D cube known as a data cube. The location of each data value in the data cube is a function of the x-, y- and z-axes in 3-D space, and those axes represent the dimensions of the data value. A hypercube is a data cube grown to n dimensions.
Because the data cube is predefined with a fixed number of dimensions, the addition of a new dimension requires that the entire data cube be recreated, and this recreation is time-consuming.
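The data cube described above can be sketched in a few lines of Python (an illustrative toy, not how an MDBMS physically stores its arrays): each cell is addressed by one coordinate per dimension, which is also why adding a dimension changes every cell's key and forces a rebuild.

```python
# Illustrative 3-D data cube: a dict keyed by (x, y, z) coordinates stands
# in for the matrix-like n-dimensional array of an MDBMS.
class DataCube:
    def __init__(self, dimensions):
        self.dimensions = dimensions   # e.g. ("product", "region", "month")
        self.cells = {}                # coordinate tuple -> measure

    def put(self, coords, value):
        # Each value's location is a function of one coordinate per axis.
        assert len(coords) == len(self.dimensions)
        self.cells[coords] = value

    def get(self, coords):
        return self.cells.get(coords, 0)

cube = DataCube(("product", "region", "month"))
cube.put(("widget", "East", "Jan"), 120)
cube.put(("widget", "West", "Jan"), 80)
print(cube.get(("widget", "East", "Jan")))   # 120
# Adding a dimension would change every cell's key: the cube must be rebuilt.
```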
Differences between ROLAP and MOLAP
Multidimensional data analysis requires some type of multidimensional data representation, which is normally provided by the OLAP engine. Whatever the arrangement of the OLAP components, multidimensional data must be used.
Page 86
Discuss about Star Schema Architecture?
Ans: The star schema is a data-modeling technique used to map multidimensional decision support data into a relational database.
The basic star schema has four components: facts, dimensions, attributes and attribute
hierarchies
Facts: Facts are numeric measurements that represent a specific business aspect or activity.
Ex: units, costs, prices.
Facts are normally stored in a fact table that is the center of the star schema. The fact table contains facts that are linked through their dimensions.
Facts computed or derived at run time are called metrics
Dimensions: provide descriptive qualifying characteristics
about the facts through their attributes.
For ex: sales might be compared by product from region to
region and from one time period to the next.
Dimensions are stored in dimension tables. The figure shows
star schema for sales with product, location and time
dimensions.
Attributes: Each dimension table contains attributes. Attributes are often used to search, filter or classify facts.
Possible attributes for the Location dimension are Region, State, City, Store, etc.
Possible attributes for the Product dimension are Product Type, Product ID, Brand, Package, Presentation, Color and Size.
Possible attributes for the Time dimension are Year, Quarter, Month, Week, Day, Time of Day and so on.
Attribute hierarchies: Attributes within dimensions can be ordered in a well-defined attribute hierarchy. The attribute hierarchy provides a top-down data organization that is used for two main purposes: aggregation and drill-down/roll-up data analysis.
Star Schema Representation
The fact table is related to each dimension table in a many-to-one (M:1) relationship; that is, many fact rows are related to each dimension row, and so the primary key of the fact table is a composite primary key. As per the figure, each sales record represents each product sold to a specific customer, at a specific time and in a specific location. A DBMS that is optimized for decision support first searches the smaller dimension tables before accessing the larger fact tables.
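The star schema just described can be sketched with SQLite from the Python standard library (the table and column names are invented for the example): SALES is the fact table, its composite primary key is built from the foreign keys into the dimension tables, and a join through a dimension lets facts be compared by a dimension attribute.

```python
# A tiny star schema in SQLite (illustrative names and data).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE location (location_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE time     (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales (                 -- fact table at the center of the star
    product_id  INTEGER REFERENCES product,
    location_id INTEGER REFERENCES location,
    time_id     INTEGER REFERENCES time,
    units INTEGER, amount REAL,
    PRIMARY KEY (product_id, location_id, time_id)   -- composite key of FKs
);
""")
con.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Acme"), (2, "Apex")])
con.executemany("INSERT INTO location VALUES (?, ?)", [(1, "East"), (2, "West")])
con.executemany("INSERT INTO time VALUES (?, ?)", [(1, 2023)])
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
                [(1, 1, 1, 10, 100.0), (1, 2, 1, 5, 50.0), (2, 1, 1, 3, 45.0)])

# Facts compared by a dimension attribute: total sales amount per region.
rows = con.execute("""
    SELECT l.region, SUM(s.amount)
    FROM sales s JOIN location l ON s.location_id = l.location_id
    GROUP BY l.region ORDER BY l.region
""").fetchall()
print(rows)   # [('East', 145.0), ('West', 50.0)]
```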
Performance improving techniques for the star schema
Ans: Four techniques are often used to optimize data warehouse design:
• Normalizing dimensional tables: the resulting schema, with normalized dimension tables, is called a snowflake schema.
• Maintaining multiple fact tables to represent different aggregation levels.
• Denormalizing fact tables.
• Partitioning and replicating tables.
What is data mining? Explain its various phases.
Ans: Data mining tools automatically search the data for anomalies and possible relationships, thereby identifying problems that have not yet been identified by the end user.
Data mining is very helpful in finding practical relationships among data that help define
customer buying patterns, improve product development and acceptance, reduce health care
fraud, analyze stock markets etc..
Data mining is subject to four general phases:
Data preparation phase: the data sets to be used by the data mining operation are identified and cleansed.
Data analysis and classification phase: identifies common data characteristics or patterns. The data mining tool applies specific algorithms to find:
• data groupings, classifications, clusters or sequences
• data dependencies, links or relationships
• data patterns, trends and deviations
Knowledge acquisition phase: selects the appropriate modeling or knowledge acquisition algorithms to generate a computer model that reflects the behavior of the target data set.
Prognosis phase: the data mining findings are used to predict future behavior and forecast business outcomes; for example, to project the likely outcome of a new product rollout or a new marketing promotion.
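As a toy illustration of the data analysis and classification phase, the following Python sketch counts item pairs that appear together in market baskets; the most frequent pair hints at a customer buying pattern (the data is made up, and real data mining tools use far more sophisticated algorithms).

```python
# Illustrative co-occurrence count: which item pairs are bought together?
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    # Count every unordered pair of items in the basket.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most common pair hints at a customer buying pattern.
print(pair_counts.most_common(1))   # [(('bread', 'milk'), 3)]
```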
What are inductive or intelligent databases?
Ans: These are databases that not only store data and various statistics about data usage, but also have the ability to learn about and extract knowledge from the stored data.
Explain about SQL extensions for OLAP?
Ans: The following are important SQL extensions for OLAP
The ROLLUP extension: is used with the GROUP BY clause to generate aggregates by
different dimensions.
Syntax:
SELECT column1, column2 [,…],aggregate_function(expression)
FROM table1, [table2,…]
[WHERE condition]
GROUP BY ROLLUP (column1, column2 [,…])
[HAVING condition]
[ORDER BY column1[,column2,…]]
The order of the column list within GROUP BY ROLLUP is very important. The last column in the list will generate a grand total; all other columns will generate subtotals.
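SQLite (used here because it ships with Python) does not support GROUP BY ROLLUP, so the following sketch computes the equivalent result in plain Python over illustrative sales rows: per-vendor subtotals plus a grand total, with None marking a rolled-up column.

```python
# Plain-Python simulation of GROUP BY ROLLUP(vendor, product).
from itertools import groupby

# Illustrative sales rows: (vendor, product, amount).
sales = [("V1", "P1", 100), ("V1", "P2", 50), ("V2", "P1", 70)]

def rollup(rows):
    """Mimic ROLLUP output: detail rows, a subtotal per vendor,
    then a grand total (None marks a rolled-up column)."""
    out = []
    for vendor, group in groupby(sorted(rows), key=lambda r: r[0]):
        group = list(group)
        out.extend(group)                                     # detail rows
        out.append((vendor, None, sum(r[2] for r in group)))  # vendor subtotal
    out.append((None, None, sum(r[2] for r in rows)))         # grand total
    return out

for row in rollup(sales):
    print(row)
```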
The CUBE extension:
Is used to compute all possible subtotals within groupings based on multiple dimensions.
The CUBE extension enables you to get a subtotal for each column listed in the expression and a grand total for the last column listed.
Syntax: SELECT column1, column2 [,…],aggregate_function(expression)
FROM table1, [table2,…]
[WHERE condition]
GROUP BY CUBE(column1,column2[,…])
[HAVING condition]
[ORDER BY column1[,column2,…]]
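The CUBE behavior can likewise be simulated in plain Python (SQLite has no CUBE either; data and names are illustrative): every subset of the grouping columns contributes one level of subtotals, with the columns outside the subset rolled up to None.

```python
# Plain-Python simulation of GROUP BY CUBE(vendor, product).
from itertools import combinations

sales = [("V1", "P1", 100), ("V1", "P2", 50), ("V2", "P1", 70)]
columns = ("vendor", "product")

def cube(rows):
    """Aggregate over every subset of the grouping columns."""
    totals = {}
    for r in range(len(columns) + 1):
        for subset in combinations(range(len(columns)), r):
            for row in rows:
                # Columns outside the subset are rolled up to None.
                key = tuple(row[i] if i in subset else None
                            for i in range(len(columns)))
                totals[key] = totals.get(key, 0) + row[2]
    return totals

t = cube(sales)
print(t[(None, None)])   # grand total: 220
print(t[(None, "P1")])   # subtotal for product P1 across vendors: 170
```

Note the difference from ROLLUP: CUBE also produces the (None, "P1") style subtotals for trailing columns, which ROLLUP omits.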
Materialized views:
A materialized view is a dynamic table that not only contains the SQL query command to generate its rows, but also stores the actual rows. The materialized view is created the first time the query is run, and the summary rows are stored in the table. The materialized view rows are automatically updated when the base tables are updated.
Syntax:
CREATE MATERIALIZED VIEW view_name
BUILD { IMMEDIATE | DEFERRED}
REFRESH {[FAST | COMPLETE | FORCE]} ON COMMIT
[ENABLE QUERY REWRITE]
AS select_query
The BUILD clause indicates when the materialized view rows are actually populated.
IMMEDIATE indicates that the rows are populated right after the command is entered.
DEFERRED indicates that the rows are populated at a later time; until then, the view is in an unusable state.
The REFRESH clause lets you indicate when and how to update the materialized view when new rows are added to the base tables.
FAST indicates that the update applies only to the affected rows.
COMPLETE indicates that a complete update will be made for all rows in the materialized view.
FORCE indicates that the DBMS will first try to do a FAST update; otherwise, it will do a COMPLETE update.
ON COMMIT indicates that the updates to the materialized view will take place as part of the commit of the DML transaction that updated the base tables.
The ENABLE QUERY REWRITE option allows the DBMS to use the materialized views in query optimization.
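The following Python sketch mimics a materialized view over SQLite (an illustrative toy: the class and its COMPLETE-style refresh() are invented for the example, since SQLite has no CREATE MATERIALIZED VIEW): the query's result rows are stored, and refresh() recomputes them after the base table changes.

```python
# Toy materialized view: stored result rows plus a COMPLETE-style refresh.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("East", 100.0), ("West", 50.0), ("East", 25.0)])

class MaterializedView:
    def __init__(self, con, query):
        self.con, self.query = con, query
        self.rows = []          # the stored result rows
        self.refresh()          # BUILD IMMEDIATE: populate right away

    def refresh(self):          # COMPLETE refresh: recompute every row
        self.rows = self.con.execute(self.query).fetchall()

mv = MaterializedView(
    con, "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
print(mv.rows)                              # [('East', 125.0), ('West', 50.0)]

con.execute("INSERT INTO sales VALUES ('West', 10.0)")
mv.refresh()                                # base table changed, so refresh
print(mv.rows)                              # [('East', 125.0), ('West', 60.0)]
```

An ON COMMIT refresh would call refresh() automatically inside the transaction commit instead of leaving it to the caller.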
Unit-V Chapter-II : Database Administration And Security
Q: Explain the need for and role of databases in an organization?
Ans: The DBMS helps an organization in many ways:
Interpretation and presentation of data
Distribution of data and information to the right people
Data preservation and monitoring the data usage
Control over data duplication and use
At the top management level, the database role is:
Provide the information necessary for strategic decision making.
Provide access to external and internal data
Provide a framework for defining and enforcing organization policies
At the middle management level, the database role is:
Deliver the data necessary for tactical decision and planning
Monitor and control the allocation and use of company resources.
Provide a framework for enforcing and ensuring the security and privacy of the data in the database.
At the operational management level, the database role is:
Represent and support the company operations as closely as possible.
Produce query results within specified performance levels.
Enhance the company’s short-term operational ability.
Q: Explain the evolution of the database administration function?
Ans: The cost of data and managerial duplication in the decentralized, old file systems gave rise to a centralized data administration function known as the electronic data processing (EDP) or data processing (DP) department. The DP department resolved the data conflicts created by the duplication and/or misuse of data.
The advent of the DBMS and its shared view of data produced a new level of data management and led the DP department to evolve into the information systems (IS) department. The responsibilities of the IS department are:
A service function, to provide end users with active data management support.
A production function, to provide end users with specific solutions for their information needs.
As the number of databases grew, data management became increasingly complex, thus leading to the development of the database administration function. The person responsible for control of the centralized and shared database became known as the database administrator (DBA).
The DBA function may be placed as a staff position or a line position:
Staff position: the DBA devises the data administration strategy, but has no authority to enforce it and no authority to resolve conflicts.
Line position: the DBA has the responsibility and authority to plan, define, implement and enforce the policies, standards and procedures used in the data administration activity.
activity.
The fast-paced changes in DBMS technology dictate changing organizational styles. For example:
Distributed databases can force the decentralization of the data administration function.
Internet-accessible data and the growing number of data warehousing applications are likely to add to the DBA’s data modeling and design activities.
The new microcomputer environment requires the DBA to develop a new set of technical and managerial skills.
Functions of DBA
The DBA function can be described by dividing the DBA operations according to the DBLC phases. The DBA function requires personnel to cover a number of activities.
A company may have several different and incompatible DBMSs installed to support different operations, and there may also be a variety of microcomputer DBMSs installed in different departments. In such an environment, the company might have one DBA assigned for each DBMS. The general coordinator of all DBAs is known as the system administrator.
Differentiate between the responsibilities of the data administrator (DA) and the database administrator (DBA)?
Ans: The DA is responsible for controlling the overall corporate data resource, both computerized and manual. Thus the DA’s job description covers a larger area of operations than that of the DBA, because the DA is in charge of controlling not only the computerized data, but also the data outside the scope of the DBMS.
Data Administrator (DA)                 Database Administrator (DBA)
1. Does strategic planning              1. Controls and supervises
2. Sets long-term goals                 2. Executes plans to reach goals
3. Sets policies and standards          3. Enforces policies and procedures;
                                           enforces programming standards
4. Is broad in scope                    4. Is narrow in scope
5. Focuses on the long term             5. Focuses on the short term
                                           (daily operations)
6. Has a managerial orientation         6. Has a technical orientation
7. Is DBMS-independent                  7. Is DBMS-specific
Q: What are desired DBA skills? OR Discuss the abilities and responsibilities of DBA
Ans: The DBA’s skills can be divided into two categories, managerial and technical, as summarized in the following table.
Managerial                                Technical
Broad business understanding              Broad data-processing background
Coordination skills                       Systems development life cycle knowledge
Analytical skills                         Structured methodologies: data flow
                                          diagrams, structure charts,
                                          programming languages
Conflict resolution skills                Database life cycle knowledge
Communication skills (oral and written)   Database modeling and design skills
                                          (conceptual, logical, physical)
Negotiation skills                        Operational skills: database
                                          implementation, data dictionary
                                          management, security and so on
Experience: 10 years in a large DP department
Responsibilities (roles of DBA)
The DBA’s Managerial Role:
The DBA delivers services such as
End-User Support: These include
• Gathering User Requirements
• Building end-user confidence
• Resolving conflicts and problems
• Finding solutions to information needs
• Ensuring quality and integrity of data and applications
• Managing the training and support of DBMS user
Policies, Procedures and Standards: The DBA must define, document and communicate the policies, procedures and standards before they can be enforced.
Policies are general statements of direction or action.
Example: All users must have passwords.
Passwords must be changed every six months.
Standards are rules that are used to evaluate the quality of an activity.
Example: A password must have a minimum of five characters.
A password must have a maximum of twelve characters.
Procedures are written instructions that describe a series of steps to be followed during the
performance of a given activity
Example: to create a user account,
1. The user sends a written request to the DBA.
2. The DBA approves the request and forwards it to the computer operator.
3. The operator creates the account, assigns a temporary password and sends it to the user.
4. A copy is sent to the DBA.
5. The user changes the temporary password to a permanent one.
The DBA must define, communicate and enforce procedures that cover areas such as
1. End-user database requirements gathering
2. database design and modeling
3. documentation and naming conventions
4. design, coding and testing of database application programs
5. database software selection
6. database security and integrity
7. database backup and recovery
8. database maintenance and operation
9. end-user training
Data Security, Privacy and Integrity: The DBA must use the security mechanisms provided by the DBMS and must also team up with Internet security experts to safeguard the data from possible attacks or unauthorized access.
Data Backup and Recovery: The Backup and recovery measures must include at least:
• Periodic data and application backups
• Proper backup identification
• Convenient and Safe backup storage.
• Physical protection of both hardware and software
• Personal access control to software of a database installation
• Insurance coverage for the data in the database.
Data Distribution and Use: The DBA is responsible for ensuring that the data are distributed to the right people, at the right time and in the right format.
The DBA’s technical role
The DBA’s technical role requires a broad understanding of DBMS’s functions,
configuration, programming languages, data modeling and design methodologies and so on.
The technical aspects of the DBA’s job
• Evaluating, selecting and installing the DBMS and related utilities:
To match DBMS capability to the organization’s needs, the DBA must check the following DBMS features:
• DBMS model
• DBMS storage capacity
• Application development support
• Security and integrity
• Backup and recovery
• Concurrency control
• Performance
• Database administration tools
• Interoperability and data distribution
• Portability and standards
• Hardware
• Data dictionary
• Vendor training and support
• Available third-party tools
• Cost
• Designing and implementing databases and applications: The DBA has to review the database application designs to ensure that transactions are:
Correct: the transactions mirror real-world events.
Efficient: the transactions do not overload the DBMS.
Compliant: the transactions comply with integrity rules and standards.
• Testing and evaluating databases and applications: The evaluation process should
cover all technical aspects of both the applications and the database. This process has
to enforce all data validation rules.
• Operating the DBMS utilities and applications: DBMS operations are divided into
four main areas:
System support
Performance monitoring and tuning
Back up and recovery
Security auditing and monitoring.
• Training and supporting users: Training people to use the DBMS and its tools is
included in DBA’s technical activities.
• Maintaining the DBMS utilities and applications: periodic DBMS maintenance
includes management of the physical or secondary storage devices. Maintenance
activities also include upgrading the DBMS and utility software.
The DBA’s role as an arbitrator between data and users.
The DBA also verifies that programmer and end-user access meets the required quality and
security standards.
Database users might be classified by the:
• type of decision-making support required (operational, tactical or strategic)
• degree of computer knowledge (novice, proficient or expert)
• frequency of access (casual, periodic or frequent)
The DBA must be able to interact with all of those people and understand their needs.
Q: What are the various database administration tools? Explain.
Ans:
Data dictionary: Data dictionary is defined as “a DBMS component that stores definition
of data characteristics and relationships”. The data dictionary resembles an x-ray of the
company’s entire dataset, and it is a crucial element in data administration.
Two main types of data dictionaries exist: integrated and standalone.
An integrated data dictionary is included with the DBMS; alternatively, the DBA may use a third-party standalone data dictionary.
Data dictionaries can also be classified as active or passive.
An active data dictionary is automatically updated by DBMS with every database access,
thereby keeping its access information up to date. A passive data dictionary is not updated
automatically and usually requires running a batch process.
The DBA can use the data dictionary to support data analysis and design.
For example, the DBA can create a report that lists all data elements to be used in a particular application, a list of all users who access a particular program, or a report that checks for data redundancies.
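SQLite offers a small-scale example of an integrated, active data dictionary: object definitions live in the sqlite_master catalog and column metadata is exposed through PRAGMA table_info, both of which DBA-style reports like those above can query like any table (the table and index here are invented for the example).

```python
# Querying SQLite's built-in catalog as a miniature data dictionary.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX idx_name ON customer (name)")

# Every object the catalog knows about, with its type.
objects = con.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(objects)

# Column-level metadata for one table: name, declared type, key status.
columns = [(row[1], row[2], bool(row[5]))
           for row in con.execute("PRAGMA table_info(customer)")]
print(columns)
```

Because the engine maintains this catalog itself on every DDL statement, it behaves like the active, integrated dictionaries described above.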
CASE tools: CASE is an acronym for computer-aided systems engineering. A CASE tool provides an automated framework for the SDLC and uses structured methodologies and graphical interfaces.
CASE tools are usually classified according to the extent of support they provide for the
SDLC.
For ex: Front-end CASE tools provide support for planning, analysis and design phases.
Back-End CASE tools provide support for the coding and implementation phases.
Following are the benefits of CASE tools.
1. A reduction in development time and costs.
2. Automation of the SDLC
3. Standardization of systems development methodologies.
4. Easier maintenance of application system developed with CASE tools.
A typical CASE tool provides five components:
1. Graphics designed to produce structured diagrams such as data flow diagrams, ER
diagrams, class diagrams etc.
2. Screen painters and report generators.
3. An integrated repository for storing and cross referencing the system design data.
4. An analysis segment to provide a fully automated check on system consistency, syntax and completeness.
5. A program documentation generator.
Q: Explain the usage of ORACLE for database administration?
Ans: To perform any administrative task, you must connect to the database using a username with administrative (DBA) privileges. By default, Oracle automatically creates the SYSTEM and SYS user IDs, which have administrative privileges, with every new database you create.
Creating tablespaces and data files:
In ORACLE a database is logically composed of one or more tablespaces. A tablespace is a
logical storage space.
The tablespace data are physically stored in one or more datafiles. ORACLE automatically
creates the tablespace and data files.
The following are examples.
1. The SYSTEM tablespace is used to store the data dictionary data.
2. The USERS tablespace is used to store the table data created by end users.
3. The TEMP tablespace is used to store the temporary tables and indexes created during the execution of SQL statements.
4. The UNDOTBS1 tablespace is used to store database transaction recovery information.
Managing database objects: tables, views, triggers and procedures
The ORACLE enterprise manager gives the DBA a graphical user interface to create, edit,
view and delete database objects in the database. A database object is basically any object
created by end users.
Managing users and establishing security:
One of the most common database administration activities is creating and managing
database users.
The security section of the ORACLE enterprise manager’s administration page enables the
DBA to create users, roles and profiles.
1. A user is a uniquely identifiable object that allows a given person to log in to the database.
2. A role is a named collection of database access privileges that authorize a user to
connect to the database and use the database system resources.
3. A profile is a named collection of settings that control how much of the database
resource a given user can use.
Customizing the database initialization parameters:
Fine-tuning a database is another important DBA task. This task usually requires the modification of database configuration parameters, some of which can be changed in real time using SQL commands.
Each database has an associated database initialization file that stores its run-time
configuration parameters. The initialization file is read at instance start up and is used to set
the working environment for the database.
Creating a new database:
Using the ORACLE database configuration assistant, it is simple to create a database. The
DBA uses a wizard interface to answer a series of questions to establish the parameters for
the database to be created. This process creates the database structure, including the necessary data dictionary tables, the administrator and user accounts, and the other supporting processes required by the DBMS to manage the database.
Responsibilities of DBA: The DBA is responsible for:
• Designing the logical and physical schemas, as well as widely used portions of the external schema.
• Security and authorization.
• Data availability and recovery from failures.
• Database tuning: the DBA is responsible for evolving the database, in particular the conceptual and physical schemas, to ensure adequate performance as user requirements change.
A DBA needs to understand query optimization even without running his or her own queries, because some of these responsibilities (database design and tuning) are related to it. Unless the DBA understands the performance needs of widely used queries, and how the DBMS will optimize and execute those queries, good design and tuning decisions cannot be made.
What is a data warehouse? Discuss the properties of a data warehouse.
Ans: A data warehouse is an integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making.
The following are the important properties of a data warehouse.
Integrated: The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization and from multiple sources with diverse formats. Data integration implies that all business entities, data elements, data characteristics and business metrics are described in the same way throughout the enterprise.
Subject oriented: Data warehouse data are arranged and optimized to provide answers to questions coming from diverse functional areas within a company. Data warehouse data are organized and summarized by topic. For example, instead of storing an INVOICE table, the data warehouse stores its “sales by product” and “sales by customer” components.
Time variant: Warehouse data represent the flow of data through time. Once data are periodically uploaded to the data warehouse, all time-dependent aggregations are recomputed. For example, once the data for the previous week’s sales are uploaded, the weekly, monthly, yearly and other time-dependent aggregates for products, customers, stores and other variables are also updated. Once the data enter the data warehouse, the time ID assigned to the data cannot be changed.
Nonvolatile: Once data enter the data warehouse, they are never removed. Because data are never deleted and new data are continually added, the data warehouse is always growing.
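A minimal Python sketch of the time-variant and nonvolatile properties (data and names are made up): loading a new week appends rows without touching the existing ones or their time IDs, and the time-dependent aggregates are then recomputed.

```python
# Toy warehouse load: append-only rows, recomputed aggregates.
from collections import defaultdict

warehouse = []   # (time_id, week, product, units); rows are never removed

def load_week(week, rows):
    """Append the new week's rows; existing rows and their time IDs stay fixed."""
    for product, units in rows:
        warehouse.append((len(warehouse) + 1, week, product, units))
    return recompute_aggregates()

def recompute_aggregates():
    """Recompute the units-per-product aggregate over all loaded periods."""
    totals = defaultdict(int)
    for _, _, product, units in warehouse:
        totals[product] += units
    return dict(totals)

print(load_week(1, [("widget", 10), ("gadget", 4)]))
print(load_week(2, [("widget", 7)]))   # widget total grows from 10 to 17
```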

Dbms

  • 1.
    UNIT-I Chapter-I : DATABASESYSTEMS Data: Data consists of raw facts, which the computer stores and reads. Data can consist of letters, numbers, sounds or images etc. that have some meaning in the user environment. Data are the raw material from which information is generated. Information: When data has been processed to give it more meaning, it is called as information. Database: An organized collection of logically related data usually designed to meet the information needs of multiple users in an organization. Database Management System: (DBMS) is a software tool used to define, create, maintain and provide controlled access to the database. DBMS software stores data structures, relationship between those structures and the access paths to those structures in a central location. Q) How the data is organized within a database? Ans: To help you visualize how a database stores data, think about a typical address book. Fields: Each field contains a specific type of information such as first name, last name, phone number, email etc… Records: Record is a collection of related fields. Ex: All information about one person in an address book. Tables: A complete collection of records makes a table Ex: Contacts table FirstName LastName Company Address City State Pincode Record1 Record2 Q) Why the database is important? Ans: If you keep list of all your business customers in a database, you can • You can sort the customers by pincode. • Create a simple onscreen entry form that even your technically unskilled employee can use successfully You can manipulate data in almost anyway you want. Files, File Systems & Problems With File System Data Management. Manual filing system works well when the number of items stored is quite small and they are only needed to be stored and retrieved. A manual filing system crashes when cross referencing and processing of information in the files is carried out. Limitations or disadvantages of File Processing Systems. 
Program data dependence: File descriptions are stored within each program that access a given file. In the invoicing system program access both the inventory pricing file and the customer master file. Page 1
  • 2.
    Therefore this programcontains a detailed description for both these files.In the below figure both the customer master file is contained in both the order filing system and invoicing system. Suppose it is decided to change the customer address field length in the records in customer master file from 20 to 30 characters. For this, each related program have to be modified. Duplication of data or Redundancy of data: in the below figure, order filing system contains the inventory master file, the invoicing system contains inventory pricing file. Inventory master, inventory pricing file contains product descriptions and quantity. There is duplication of data which requires additional storage space. Orders Department Accounting Department Inconsistent data: The redundancy in storing the same data multiple times leads to data inconsistency when an update is applied to some of files but not to other. Limited data sharing: In the file processing system, users have little opportunity to shared data outside their applications. Lengthy development times: Developing an application by using the file systems is very skilled activity. The programmers has to write many programs for supporting file opening, file closing and iterative logic for representing operations, this is very lengthy process. Incompatible file formats: Since the structure of files is embedded in application, the structure is dependent on application programming languages. Ex: structure of file generated by COBOL is different from ‘C’ programming language. The application programmer has to develop software to convert the files to some common format for processing. This may be time consuming and expensive. Fixed Queries: Any query or report needed by organization has to be developed by the application programmer. Lack of security: All users could see all data and no security and authorization subsystem. No recovery and back up system:Data could be lost in case of hardware or software failure. 
All the data is stored in disk files and accessed according to access methods (sequential, direct etc..) provided by file system and chosen by application programmer. Page 2 Progra m A Progra m B Progra m C Order File System Customer Master File Inventory Master File Back Order File Progra m A Progra m B Invoicing System Inventory Pricing File Customer Master File
  • 3.
    Order Programs Accounting Programs Payroll Programs DBMS Database Customer master data Inventorymaster data Employee master data Back order data Q: What is a database system and What are the advantages of database systems? Database System: Database and DBMS software together is called a database system. Program data independence: DBMS allows certain types of changes to the structure of the database without affecting the stored data and the existing application. Improved data sharing: The DBMS helps create an environment in which end users have better access to more data and better managed data. Improved data security: The DBA uses security and authorization subsystem provided by DBMS to create accounts and to specify account restrictions. The DBMS will enforce these restrictions automatically. Better Data Integration: DBMS promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency. Minimized data inconsistency: Data inconsistency is also reduced in a properly designed database as such a database doesn’t allow different versions of same data in different places. Ex: company’s sales department stores salesman name as ‘Bill Brown’ and the same person name is stored as ‘William G Brown’ in company’s HR department. Improved Data access: A query is a question or specified request issued to DBMS for data manipulation. Example of query language is SQL An Adhoc query is a spur of the moment question. The DBMS sends back an answer ( called the query result set) to the application For Ex: How many of our cutomer have balances of Rs. 3000 or more? The DBMS gives quick answers to adhoc queries. Improved Decision Making: Better managed data and improved data access makes it possible to generate better quality information on which better decisions are based. 
Increased end- user productivity: The availability of data and tools that transform data into information allows end user to make quick decisions that can make the difference between success and failure in global economy. Page 3
  • 4.
Database system environment: The database system environment is made up of five major parts: hardware, software, people, procedures and data.
1. Hardware: Hardware refers to all of the system's physical devices. For ex: computers (microcomputers, servers etc.), storage devices, printers, networking devices (hubs, switches etc.) and other devices (ATMs, ID readers etc.).
2. Software: Three types of software are needed to make the database system function fully.
• Operating system software manages all hardware components and makes it possible for all other software to run on the computers. Ex: UNIX, Microsoft Windows.
• DBMS software manages the data within the database system. Ex: SQL Server, Oracle, DB2, MySQL.
• Application programs and utility software are most commonly used to access the data found within the database to generate reports, tabulations and other information for decision making. For ex: all DBMS vendors provide GUIs to create database structures, control database access and monitor database operations.
3. People: includes all users of the database system. On the basis of their job functions, five types of users can be identified.
• System administrators look after the database system's general operations.
• Database administrators (DBAs) manage the DBMS and ensure that the database is functioning properly.
• Database designers (or database architects) design the database structure. Determining what data are to be entered into the database and how the data are to be organized is an important part of the database designer's job.
• System analysts and programmers design and implement the application programs. They design and create the data entry screens, reports and procedures through which end users access and manipulate the database's data.
• End users are the people who use the application programs to run the organisation's daily operations. For ex: clerks, managers, supervisors and directors. High-level end users use the information obtained from the database to make decisions.
4. Procedures: Procedures play an important role in a company. They enforce the standards by which business is conducted within the organization and with customers. Procedures are also used to ensure that there is an organized way to monitor and audit both the data that enter the database and the information generated through the use of that data.
5. Data: Data are the raw materials from which information is generated. Data covers the collection of facts stored in the database.
DBMS Functions: The DBMS performs several functions that guarantee the integrity and consistency of the data in the database. They are:
1. Data dictionary management: The DBMS stores definitions of the data elements and their relationships in a data dictionary. The DBMS provides data abstraction, and it removes structural and data dependency from the system.
2. Data storage management: The DBMS creates and manages the complex structures required for data storage, so you need not define and program the physical data characteristics. It also provides storage for on-screen definitions, report definitions, data validation rules etc.
3. Data transformation and presentation: The DBMS must manage the data in the proper format for each country when entering dates, names etc., and must not allow different
versions of the same data in different places. Ex: the company's sales department stores a salesman's name as 'Bill Brown' while the same person's name is stored as 'William G. Brown' in the company's HR department.
4. Security management: The DBMS creates a security system that enforces user security and data privacy. Security rules determine which users can access the database, which data items each user can access, and which data operations (read, add, delete or modify) the user can perform. This is important in multiuser database systems.
5. Multiuser access control: The DBMS uses sophisticated algorithms to ensure that multiple users can access the database at the same time.
6. Backup and recovery management: The DBMS provides special utilities that allow the DBA to perform backup and restore procedures. Recovery management deals with the recovery of the database after a failure, such as a bad sector on a disk or a power failure.
7. Data integrity management: The DBMS promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency.
8. Data access languages and application programming interfaces: The DBMS provides data access through a query language. A query language is a nonprocedural language that lets the user specify what must be done without having to specify how it is to be done. An example of a query language is SQL.
9. Database communication interfaces: The DBMS accepts end-user requests from multiple, different network environments.
Disadvantages of DBMS
1. Increased costs: A database system requires hardware, software and highly skilled people, and the cost of maintaining all of these is significant.
2. Management complexity: The database system holds important data that are accessed from multiple sources, so security issues may occur.
3. Frequent updates: One must perform frequent updates and apply the latest patches and security measures to all components. This increases personnel training costs.
4. Vendor dependence: Owing to the heavy investment in technology and personnel training, companies are reluctant to change database vendors. As a result, vendors do not offer pricing-point advantages to existing customers.
5. Frequent upgrade/replacement cycles: DBMS vendors frequently upgrade their products by adding new functionality, i.e. upgraded versions of the software. Some of these versions require hardware upgrades, and training users costs money.
UNIT-I
Chapter-II : Data Modeling and Data Models
Q: What is a data model?
Ans: A data model is a blueprint containing all the instructions to build a database that will meet all the end-user requirements. This blueprint contains both text descriptions in plain, unambiguous language and clear, useful diagrams depicting the main data elements.
Q: Explain the importance of data models?
Ans: Data models are a communication tool that enables interaction among the designer, the application programmer and the end user. Data models are used to represent real-world data, and the different degrees of data abstraction enable data modeling.
Ex: A house blueprint is an abstraction; you cannot live in a blueprint. Similarly, the data model is an abstraction; you cannot draw the required data out of the data model. Just as you cannot build a good house without a blueprint, you cannot create a good database without first creating an appropriate data model.
Q: What are business rules?
Ans: A business rule is a description of a policy or procedure within a specific organization. Properly written business rules are used to define entities, attributes, relationships and constraints.
Example 1: Consider two business rules:
• A customer may generate many invoices.
• An invoice is generated by only one customer.
These business rules establish two entities (CUSTOMER and INVOICE) and a 1:M relationship between them.
Example 2: Consider the following business rule:
• A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees.
This rule establishes a constraint (not fewer than 10 and not more than 30 employees), two entities (EMPLOYEE and TRAINING) and a relationship between these entities.
Q: How to discover business rules?
Ans: The main sources of business rules are:
• company managers,
• policy makers,
• department managers,
• written documentation such as a company's procedures, standards or operations manuals, and
• direct interviews with end users.
The process of identifying and documenting business rules is essential to database design for several reasons:
• They help standardize the company's view of data.
• They allow the designer to develop relationship participation rules and constraints and to create an accurate data model.
Q: Why can not all business rules be modeled?
Ans: For example: "No pilot can fly more than 10 hours within a 24-hour period." Such a business rule must be enforced by application software; it cannot be enforced by the database design.
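The constraint in Example 2 above is one that a database can enforce directly. The following is an illustrative sketch, not from the text, using Python's sqlite3 module; the table and column names are invented for the example.

```python
import sqlite3

# Hypothetical TRAINING table enforcing the business rule:
# "a session must have between 10 and 30 employees" (names invented).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TRAINING (
        session_id    INTEGER PRIMARY KEY,
        topic         TEXT NOT NULL,
        num_employees INTEGER NOT NULL
            CHECK (num_employees BETWEEN 10 AND 30)
    )
""")

conn.execute("INSERT INTO TRAINING VALUES (1, 'SQL basics', 15)")  # within the rule

try:
    # 5 employees violates the CHECK constraint, so the row is rejected.
    conn.execute("INSERT INTO TRAINING VALUES (2, 'ER modeling', 5)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
```

The pilot rule from the next question, by contrast, spans many rows and a time window, which is why it is left to application software.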
Q: Explain about the hierarchical model?
Ans: Its structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments. Within the hierarchy, the top layer (also called the root) is the parent of the segment directly beneath it.
Advantages:
1. It promotes data sharing.
2. The parent/child relationship promotes conceptual simplicity and data integrity.
3. Database security is provided and enforced by the DBMS.
4. It is efficient with 1:M relationships.
Disadvantages:
• Complex to implement and difficult to manage, as it requires knowledge of the physical data storage characteristics.
• Can implement only 1:M relationships, so it has implementation limitations.
• No standards.
• No DDL or DML language in the DBMS.
• Lacks structural independence: changes in structure require changes in all application programs.
• No ad hoc queries.
• Access paths are predefined.
This technology is best applied when the conceptual model also resembles a tree and most data accesses begin at the same root file.
Q: Explain about the network model?
Ans: The network model allows a record to have more than one parent.
Advantages:
• It can handle M:N and multiparent relationship types.
• Data access is more flexible.
• There are standards defined to implement this model.
• It includes DDL and DML commands in the DBMS.
Disadvantages:
• Little data independence.
• Lacks structural independence: changes in structure require changes in all application programs.
• No ad hoc queries.
• Access paths are predefined.
Q: What are CODASYL and the DBTG?
Ans: To help establish database standards, the Conference on Data Systems Languages (CODASYL) created the Database Task Group (DBTG) in the late 1960s. The final DBTG report contained specifications for three crucial database components:
• The schema: the conceptual organization of the entire database as viewed by the DBA.
• The subschema: defines the portion of the database as seen by the application programs. The application programs invoke the subschema required to access the appropriate database files.
• A data management language that defines the environment in which data can be managed.
Q: Explain about the relational model?
Ans: Here, tables are called "relations", rows are called "tuples" and column names are called "attributes". Every attribute has a domain. A domain is the set of permissible values that can be given to an attribute. A common attribute existing in any two tables creates a relationship between the tables. The model supports all relationship types (1:1, 1:M and M:N). The RDBMS manages all the physical details, while the user sees the relational database as a collection of tables (it enables you to view data logically rather than physically). The RDBMS uses SQL to translate user queries into instructions for retrieving the required data. The SQL engine executes all queries.
Advantages:
• Promotes data and structural independence.
• The tabular view improves conceptual simplicity.
• Ad hoc query capability is based on SQL.
• The RDBMS isolates the end user from physical-level details.
Disadvantages:
• An RDBMS requires substantial hardware and software overhead.
• Conceptual simplicity gives untrained people the tools to use a good system poorly.
• It may produce "islands of information" problems, as individuals and departments can easily develop their own applications.
Q: Explain about the entity relationship model?
Ans: ER models are normally represented in an entity relationship diagram (ERD). The ER model is based on the following components:
• Entity: an entity is anything about which data are to be collected and stored.
• Attribute: attributes are characteristics of entities.
• Relationship: a relationship is an association between entities.
Advantages:
• Visual modeling yields conceptual simplicity.
• Visual representation makes it an effective communication tool.
• It can be integrated with the dominant relational model.
Disadvantages:
• There is limited constraint representation.
• There is limited relationship representation.
• There is no data manipulation language.
Q: Explain the various notations used with ERDs?
Ans: The various notations used with ERDs are:
• The Chen notation favors conceptual simplicity.
• The Crow's Foot notation favors an implementation-oriented approach.
• The UML notation can be used for both conceptual and implementation modeling.
Q: Explain about the object oriented model?
Ans: In this model both the data and their relationships are contained in a single structure known as an object.
An object includes information about the relationships between facts within the object, as well as relationships with other objects. The OODM (object oriented data model) is the basis of the OODBMS. The OODM is said to be a semantic data model because "semantic" indicates meaning. The object oriented data model is based on the following components:
• An object is an abstraction of a real-world entity.
• Attributes describe the properties of an object.
• Objects that share similar characteristics are grouped in classes.
• A class is a collection of similar objects with a shared structure (attributes) and behaviour (methods), whereas entities do not have methods.
• Classes are organized in a class hierarchy (which resembles an upside-down tree in which each class has only one parent).
• Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it.
Object oriented data models are depicted using UML diagrams.
Advantages:
• Semantic content is added.
• Visual representation includes semantic content.
• Inheritance promotes data integrity.
Disadvantages:
• No widely accepted standard.
• It is a complex navigational system.
• There is a steep learning curve.
• High system overhead slows transactions.
Q) Distinguish between logical and physical data independence.
Logical data independence: Logical data independence is the ability to modify the conceptual schema without having to alter external schemas or application programs. Alterations to the conceptual schema may include the addition or deletion of new entities, attributes or relationships, and should be possible without altering existing external schemas or rewriting application programs.
Physical data independence: Physical data independence is the ability to modify the internal schema without having to alter the conceptual schema or application programs. Alterations to the internal schema might include:
* Using new storage devices.
* Using different data structures.
* Switching from one access method to another.
* Using different file organizations or storage structures.
* Modifying indexes.
Explain about the conceptual, internal, external and physical models (or) Explain the different levels of data abstraction (or) Explain the three schema architecture.
Ans: The Conceptual Model
1. The conceptual model represents a global view of the organization's data as viewed by all end users.
2. It describes all entities and their attributes, the relationships among these entities and the constraints on these relationships.
3. The conceptual model forms the basis for the conceptual schema, a description of the database structure.
4. The conceptual model is independent of both software (DBMS and OS) and hardware.
5. The E-R model is the most widely used model for representing the conceptual model.
The Internal Model
1. The internal model adapts the conceptual model to a specific DBMS (e.g., hierarchical, network or relational).
2. The internal model is software-dependent but hardware-independent.
3. Development of the internal model is especially important for the hierarchical and network database models.
The External Model
1. The external model is the end user's / application programmer's view (local view) of the database.
2. It is concerned with a specific business operation.
3. It is implemented through the CREATE VIEW command in SQL.
Benefits of the external model:
• Application program development is simplified because the programmer does not have to be concerned with data not relevant to his/her application.
• Communication with the end user is simplified.
• Identification of the data required to support each business operation is simplified.
• Access control and security can be easily implemented.
The Physical Model
• The physical model operates at the lowest level of abstraction, describing the way data are stored on storage media such as disks or tapes.
• It requires the definition of the physical storage devices and the access methods required to reach the data.
• The physical model is both software- and hardware-dependent.
UNIT-I
Chapter-III : The Relational Database Model
Explain the characteristics of a relational table?
1. A table is perceived as a two-dimensional structure composed of rows and columns.
2. Each table row (tuple) represents a single entity occurrence within the entity set.
3. Each table column represents an attribute, and each column has a distinct name.
4. Each row/column intersection represents a single data value.
5. All values in a column must conform to the same data format.
6. Each column has a specific range of values known as the domain of that attribute.
Example: The domain of the gender attribute consists of only two possibilities: M or F. The domain of a company's date-of-hire attribute consists of all dates from the start-up date to the current date.
Attributes may share a domain. For ex: a student address and a professor address share the same domain of all possible addresses.
7. The order of rows and columns is immaterial to the DBMS.
8. Each table must have an attribute or a combination of attributes that uniquely identifies each row. Ex: Roll_No in the STUDENT table.
What data types are supported by most DBMSs?
Ans: The different data types are:
1. Numeric: Numeric data are data on which you can perform arithmetic operations.
2. Character: Character data (also called text or string data) can contain any character, symbol or digit not intended for mathematical manipulation.
3. Date: Date attributes contain calendar dates stored in a special format known as the Julian date format.
4. Logical: Logical data can have only a true or false (yes or no) value.
What is a data dictionary?
Ans: The data dictionary provides detailed descriptions of all tables, and so contains all attribute names, their characteristics and the structure of each table in the system.
What is the system catalog?
Ans: It is a detailed system data dictionary that describes all objects within the database, including data about table names, each table's creator etc. The system catalog is a system-created database whose tables store the user-created database's characteristics and contents. These tables can be queried just like user-defined tables.
Explain about indexes in a relational database?
Ans: An index is composed of an index key and a set of pointers. An index can be used to retrieve data more efficiently. When you define a table's primary key, the DBMS automatically creates a unique index on the primary key columns.
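Both behaviours described here, the automatic unique index on a primary key and an explicitly created index for faster retrieval, can be observed in a small SQLite session. This is a sketch with invented table and index names; SQLite's catalog table `sqlite_master` plays the role of the system catalog.

```python
import sqlite3

# Declaring a (non-integer) PRIMARY KEY makes SQLite create a unique
# index automatically; a secondary index is added explicitly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (roll_no TEXT PRIMARY KEY, s_name TEXT, city TEXT)")

# An explicit (secondary) index on a non-key column that is queried often.
conn.execute("CREATE INDEX idx_student_city ON STUDENT (city)")

# The system catalog can be queried like any other table.
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```

Querying `sqlite_master` shows both the automatic index (named `sqlite_autoindex_...`) and the explicit one, mirroring the text's point that the catalog is itself queryable.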
What is meant by functional dependence?
Ans: The attribute B is functionally dependent on A if each value in column A determines one and only one value in column B. Ex: in a STUDENT table, Roll_No functionally determines the student's name, because each roll number corresponds to exactly one name.
What is a composite key?
Ans: A key may be composed of more than one attribute. Such a multi-attribute key is known as a composite key.
What is meant by full functional dependency?
Ans: If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, then attribute B is fully functionally dependent on A.
Explain the various keys used in the relational database model?
• Super key: an attribute (or combination of attributes) that uniquely identifies each row in a table. In the STUDENT table, a super key could be STU_NUM alone, or STU_NUM together with STU_LNAME.
• Candidate key: a minimal (irreducible) super key; that is, a super key that does not contain a subset of attributes that is itself a super key. (STU_NUM, STU_LNAME) is a super key, but it is not a candidate key because STU_NUM by itself uniquely identifies each row in the STUDENT table.
• Primary key: a candidate key selected to be the primary key. It cannot contain NULL values. If an employee's PAN number is included as one of the attributes in the EMPLOYEE table, then EMP_NUM and EMP_PAN are both candidate keys because each uniquely identifies an employee; selecting EMP_NUM as the primary key is the designer's choice.
• Secondary key: an attribute or combination of attributes used strictly for data retrieval purposes. For example, if city-wise customer lists are frequently needed from the CUSTOMER table, a secondary key on the CUS_CITY column speeds up retrieval.
• Foreign key: an attribute in one table whose values must either match the primary key in another table or be null.
Q: What is a constraint? Write short notes on integrity constraints/rules with examples?
Ans: A constraint is a restriction placed upon the data values that can be stored in a column or columns of a table.
Integrity constraints are of two types:
1. Entity integrity constraint
2. Referential integrity constraint
Entity integrity: all primary key entries are unique, and no part of a primary key may be null.
Referential integrity: a foreign key may have either a null entry (as long as it is not part of its table's primary key) or an entry that matches a primary key value in the table to which it is related. (Every non-null foreign key value must reference an existing primary key value.)
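The two integrity rules can be demonstrated with a short SQLite session. This is an illustrative sketch (table and column names invented in the spirit of the AGENT/CUSTOMER example); note that SQLite only enforces foreign keys after an explicit PRAGMA.

```python
import sqlite3

# Entity integrity comes from PRIMARY KEY; referential integrity from
# REFERENCES (foreign-key enforcement must be switched on in SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE AGENT (agent_code TEXT PRIMARY KEY, a_name TEXT)")
conn.execute("""
    CREATE TABLE CUSTOMER (
        cus_code   TEXT PRIMARY KEY,
        c_name     TEXT,
        agent_code TEXT REFERENCES AGENT (agent_code)
    )
""")
conn.execute("INSERT INTO AGENT VALUES ('A01', 'ANU')")

conn.execute("INSERT INTO CUSTOMER VALUES ('C01', 'SWATHI', NULL)")  # NULL FK: allowed
conn.execute("INSERT INTO CUSTOMER VALUES ('C02', 'DOLLY', 'A01')")  # matches AGENT: allowed

try:
    # 'A99' matches no AGENT primary key, so referential integrity rejects it.
    conn.execute("INSERT INTO CUSTOMER VALUES ('C03', 'RAMA', 'A99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The first two inserts show the two legal foreign-key states (null, or a matching primary key value); the third shows the one illegal state.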
Example:
Table name: AGENT (Primary key: AGENT_CODE; Foreign key: none)
AGENT_CODE  AGENT_FNAME  AGENT_PHONE
A01         ANU          2475258
A02         RAM          2465258
Table name: CUSTOMER (Primary key: CUS_CODE; Foreign key: AGENT_CODE)
CUS_CODE  CUS_FNAME  AGENT_CODE
C01       SWATHI     NULL
C02       DOLLY      A01
C03       RAMA       A01
Here the customer SWATHI has not yet been assigned an agent, so her agent code is NULL. No entry in the AGENT_CODE column of the CUSTOMER table is invalid, as the non-null entries reference the valid value A01, which is ANU's agent code. Also, the primary keys of both tables contain no null values and hold only unique values.
Relational set operators (relational algebra)
Relational algebra is a set of basic operations used to manipulate the data in the relational model. These operations can be classified into two categories:
• Binary set operations: UNION, INTERSECTION, SET DIFFERENCE, CARTESIAN PRODUCT
• Relational operations: SELECT, PROJECT, JOIN, DIVISION
Two tables are said to be union-compatible when
• they share the same number of columns,
• the columns have the same names, and
• the columns share the same (or compatible) domains.
UNION: combines all rows from two tables, excluding duplicate rows. The two tables must be union-compatible.
Example: R3 = R1 U R2
R1 (Fname): A1, A2, A3, A4
R2 (Fname): A1, A7, A2, A4
yields R3 (Fname): A1, A2, A3, A4, A7
INTERSECT: yields only the rows that appear in both tables. The tables must be union-compatible to yield valid results.
Example: R3 = R1 ∩ R2
R1 (Fname): A1, A2, A3, A4
R2 (Fname): A1, A7, A2, A4
yields R3 (Fname): A1, A2, A4
DIFFERENCE: yields all rows in one table that are not found in the other table. The tables must be union-compatible.
Example: R3 = R1 - R2
R1 (Fname): A1, A2, A3, A4
R2 (Fname): A1, A7, A2, A4
yields R3 (Fname): A3
CARTESIAN PRODUCT: yields all possible pairs of rows from two tables.
Example: R3 = R1 X R2
R1 (Course): C1, C2
R2 (Fname): A1, A2, A3
yields R3 (Course, Fname): (C1, A1), (C1, A2), (C1, A3), (C2, A1), (C2, A2), (C2, A3)
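The four set operations can be sketched directly in Python by modelling each relation as a set of tuples; the values follow the R1/R2 example used here. This is a teaching sketch, not a database implementation.

```python
# Union-compatible relations modelled as sets of single-column tuples.
R1 = {("A1",), ("A2",), ("A3",), ("A4",)}
R2 = {("A1",), ("A7",), ("A2",), ("A4",)}

union        = R1 | R2   # all rows from both tables, duplicates removed
intersection = R1 & R2   # only the rows appearing in both tables
difference   = R1 - R2   # rows in R1 that are not found in R2

# CARTESIAN PRODUCT pairs every row of one relation with every row of another.
course  = {("C1",), ("C2",)}
fname   = {("A1",), ("A2",), ("A3",)}
product = {c + f for c in course for f in fname}  # 2 x 3 = 6 row pairs
```

Using sets makes the union-compatibility requirement concrete: the operators only make sense when both operands have the same tuple shape.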
SELECT: also known as RESTRICT. Yields all rows of a table that satisfy a given condition.
Example:
PRODUCT
Pcode  Pdesc        Price
1      Flash Light  5
2      Lamp         25
3      Battery      7
4      100W Bulb    15
SELECT only Price < $10 yields
Pcode  Pdesc        Price
1      Flash Light  5
3      Battery      7
PROJECT: yields all values for the selected attributes; that is, PROJECT yields a vertical subset of a table.
Example: PROJECT Price over PRODUCT yields
Price
5
25
7
15
JOIN: a join is used to combine rows from multiple tables. A natural join links tables by selecting only the rows with common values in their common columns. A natural join is the result of a three-stage process:
1. A PRODUCT of the tables is created.
2. A SELECT is performed on the output to yield only the rows for which Acode = Agent_code. These common columns (Acode and Agent_code) are called the join columns.
3. A PROJECT is performed on the result to include only one copy of the join column.
Table name: CUSTOMER
Cus_code  Name  Agent_code
C01       ANU   A01
C02       RANI  A02
Table name: AGENT
Acode  Name
A01    RAJ
A02    TAJ
STEP 1: the product of the above two tables yields
Cus_code  Name  Agent_code  Acode  Name
C01       ANU   A01         A01    RAJ
C01       ANU   A01         A02    TAJ
C02       RANI  A02         A01    RAJ
C02       RANI  A02         A02    TAJ
STEP 2: SELECT the rows for which Acode = Agent_code
Cus_code  Name  Agent_code  Acode  Name
C01       ANU   A01         A01    RAJ
C02       RANI  A02         A02    TAJ
STEP 3: PROJECT to remove the Acode column from the result.
Cus_code  Name  Agent_code  Name
C01       ANU   A01         RAJ
C02       RANI  A02         TAJ
The column on which the join was performed appears only once in the new table.
Equijoin:
1. Links tables on the basis of an equality condition.
2. Does not eliminate duplicate columns.
Theta join: if any comparison operator other than equality is used, the join is called a theta join.
Left outer join: yields all of the rows in the CUSTOMER table, including those that do not have a matching value in the AGENT table.
Right outer join: yields all of the rows in the AGENT table, including those that do not have a matching value in the CUSTOMER table.
DIVIDE: this operation uses a single-column table as the divisor and a two-column table as the dividend. The tables must have a common column. The result contains the values of the dividend's other column that are associated with every value in the divisor.
Example:
Dividend
CODE  LOC
A     5
A     9
B     5
B     3
C     6
Divisor
LOC
5
DIVIDE yields
CODE
A
B
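The three-stage natural join described here can be sketched in Python on the same CUSTOMER/AGENT rows, modelling each relation as a list of dicts. This is an illustrative sketch of the product-select-project process, not how an RDBMS actually executes joins.

```python
# Relations as lists of dicts, using the CUSTOMER/AGENT rows from the text.
customer = [
    {"cus_code": "C01", "name": "ANU",  "agent_code": "A01"},
    {"cus_code": "C02", "name": "RANI", "agent_code": "A02"},
]
agent = [
    {"acode": "A01", "a_name": "RAJ"},
    {"acode": "A02", "a_name": "TAJ"},
]

# Step 1: PRODUCT - every customer row paired with every agent row (2 x 2 = 4).
product = [{**c, **a} for c in customer for a in agent]

# Step 2: SELECT - keep only the rows where the join columns match.
selected = [row for row in product if row["agent_code"] == row["acode"]]

# Step 3: PROJECT - drop the redundant copy of the join column (acode).
joined = [{k: v for k, v in row.items() if k != "acode"} for row in selected]
```

Each customer ends up paired with exactly the agent whose code matches, and the join column appears only once in the result, as the text states.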
UNIT-II
Chapter-I : Entity Relationship Modeling
Q: What are the E-R model components?
Ans: Three components: entities, attributes and relationships.
Entity: An entity is anything about which data are to be collected and stored. An entity may be concrete (a person or a book, for example) or abstract (like a holiday or a concept). An entity is represented by a rectangle containing the entity's name. The entity name, a noun, is usually written in all capital letters.
Attribute: Attributes are characteristics of entities. For ex: the STUDENT entity has the attributes RollNo, STU_LNAME, STU_FNAME, STU_PHONE etc. Attributes are represented by ovals connected to the entity rectangle by a line; each oval contains the name of the attribute it represents. Attributes may share a domain. Primary key attributes are underlined (here, RollNo is the primary key).
Relationship: A relationship is an association between entities. Relationships are described by verbs and are represented by diamond-shaped symbols.
Q: What are the different types of attributes?
Ans:
1. Required and optional attributes: A required attribute is an attribute that must have a value; it cannot be left empty. Ex: STU_FNAME, STU_LNAME. An optional attribute is an attribute that does not require a value; it can be left empty. Ex: STU_PHONE, since a student may or may not have a phone at home.
2. Composite and simple attributes: A simple attribute cannot be subdivided. Examples: age, sex and marital status. A composite attribute can be further subdivided to yield additional attributes. Examples: ADDRESS into street, city, state and zip; PHONE NUMBER into area code and exchange number.
3. Single-valued and multivalued attributes: A single-valued attribute can have only a single value. Example: a manufactured part can have only one serial number. A multivalued attribute can have many values. Multivalued attributes are shown by a double line connecting the attribute to the entity. Examples: (i) a person may have several college degrees; (ii) a household may have several phones with different numbers.
4. Derived and stored attributes: A derived attribute is not physically stored within the database; its value is computed from other attributes. It is indicated using a dotted line connecting the attribute to the entity. Example: AGE can be derived from DOB and the current date.
What is cardinality?
Ans: Cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity. In the ERD, cardinality is indicated by placing the appropriate numbers beside the entities, using the format (x,y). The first value represents the minimum number of associated entities, while the second value represents the maximum number of associated entities. Cardinalities are implemented by the application software or by triggers.
Q: When can you say an entity is existence dependent/independent?
Ans: An entity is said to be existence dependent if it can exist in the database only when it is associated with another related entity occurrence.
Existence independence: if an entity can exist apart from any related entity, it is said to be existence independent.
Q: What is relationship strength? Explain strong and weak relationships.
Ans: Relationship strength is based on how the primary key of a related entity is defined. Relationships are of two types:
Weak (non-identifying) relationship: a weak relationship, also known as a non-identifying relationship, exists if the entity has a primary key that is not partially or totally derived from the parent entity in the relationship.
Strong (identifying) relationship: a strong relationship, also known as an identifying relationship, exists if the entity has a primary key that is partially or totally derived from the parent entity in the relationship.
What is a weak entity?
Ans: A weak entity is one that meets two conditions:
1. The entity is existence dependent.
2. The entity has a primary key that is partially or totally derived from the parent entity in the relationship, i.e. a strong relationship.
A weak entity is identified by a double-walled entity rectangle. Ex: DEPENDENT is the weak entity in the relationship "EMPLOYEE has DEPENDENT".
What is meant by relationship participation?
Ans: Participation in an entity relationship is either optional or mandatory.
Optional participation means that one entity occurrence does not require a corresponding entity occurrence in a particular relationship. For ex: in the "COURSE generates CLASS" relationship, some courses do not generate a class; therefore, the CLASS entity is optional to the COURSE entity.
Mandatory participation means that one entity occurrence requires a corresponding entity occurrence in a particular relationship. If every COURSE must generate a CLASS, then the CLASS entity is mandatory to the COURSE entity.
Types of relationships
A relationship's degree indicates the number of entities that participate in the relationship. The different relationship degrees are:
1. Unary relationship: if a relationship is maintained within a single entity, it is called a unary relationship. Example: an employee within the EMPLOYEE entity is the manager of one or more employees within that entity.
When an entity has a relationship with itself, such a relationship is called a recursive relationship.
2. Binary relationship: a binary relationship exists when two entities are associated in a relationship. Ex: the relationship "a PROFESSOR teaches one or more CLASSes".
What is a recursive relationship?
Ans: When an entity has a relationship with itself, the relationship is called a recursive relationship.
What is an associative (composite or bridge) entity?
Ans: When there is an M:N relationship between two entities, we create a new entity, called a bridge or composite entity, that contains the primary keys of both entities participating in the M:N relationship. Ex: an ENROLL entity between STUDENT and CLASS, containing the primary keys of both.
Explain database design challenges?
Ans:
1. Design standards: Standards guide one in developing logical structures that reduce data redundancies. Without design standards, it is not possible to produce a proper design or to evaluate an existing one.
2. Processing speed: High processing speed is a top priority in database design, as it is necessary for many organizations. For example, a "perfect" design might use a 1:1 relationship to avoid nulls, while a higher-transaction-speed design might combine the two tables to avoid the use of an additional relationship, using dummy entries to avoid nulls. If the focus is on data-retrieval speed, one may need to include derived attributes in the design.
3. Information requirements: A design that meets all logical requirements is an important goal. The designer should consider end-user requirements such as performance, security and shared access. He must also verify that all update, retrieval and deletion options are available, and that all query and reporting requirements are met.
UNIT-III
Chapter-I : Introduction to SQL
Q: What is SQL and what does SQL do?
SQL stands for Structured Query Language. SQL is a nonprocedural language; therefore you specify what is to be done rather than how it is done. The American National Standards Institute (ANSI) prescribes a standard SQL. SQL functions fit into two broad categories:
• It is a data definition language (DDL): SQL can create database objects such as tables, indexes and views. SQL can also define access rights to these database objects.
• It is a data manipulation language (DML): SQL can be used to insert, update, delete and retrieve data from the database.
SQL is easy to learn. SQL can retrieve data from the database and execute queries. SQL queries are used to answer questions and also to perform actions such as adding and deleting table rows.
Q: Explain the various data types available in SQL?
Ans: Some common SQL data types are:
Numeric:
• NUMBER(L,D): Ex: NUMBER(7,2) indicates that the number will be stored with two decimal places and may be up to 7 digits long, including the sign and the decimal places.
• INTEGER (or INT): cannot be used if you want to store numbers that require decimal places.
• SMALLINT: limited to integer values of up to six digits.
• DECIMAL(L,D): greater lengths are acceptable, but smaller ones are not.
Character:
• CHAR(L): fixed-length character data of up to 255 characters. If you store strings that are not as long as the CHAR parameter value, the remaining spaces are left unused.
• VARCHAR(L) or VARCHAR2(L): variable-length character data; does not leave unused spaces.
Date:
• DATE: stores dates in the Julian date format.
Q: Explain how to create a table using SQL?
Ans: The CREATE TABLE command is used to create a new table in the user's database schema.
Syntax:
CREATE TABLE tablename (
    column1 datatype(column width) [constraints],
    column2 datatype(column width) [constraints],
    ...
);
Example:
CREATE TABLE VENDOR(
  vno number(3) PRIMARY KEY,
  vname varchar2(35) NOT NULL,
  vcity varchar2(15));
If the above command is executed successfully, the message "table created" is displayed.
The following are the rules for naming a table:
1. Table names should start with an alphabet.
2. Underscores, numbers and letters are allowed, but not blank spaces.
3. The maximum length of a table name is 30 characters.
4. Reserved words of ORACLE cannot be used as table names.
5. Two different tables should not have the same name.
6. Unique column names should be specified.
7. Proper data types and sizes should be specified.
Q: What are SQL constraints? Explain?
Ans: Entity integrity is enforced automatically when the primary key is specified in the CREATE TABLE command. For example:
CREATE TABLE PRODUCT(
  pno char(3),
  pdesc varchar2(35) NOT NULL UNIQUE,
  p_indate date,
  qoh number(5),
  price number(5),
  vno number(3),
  PRIMARY KEY(pno),
  FOREIGN KEY(vno) REFERENCES VENDOR ON UPDATE CASCADE);
The primary key attribute carries both a NOT NULL and a UNIQUE specification. The foreign key constraint definition ensures that:
• You cannot delete a vendor from the VENDOR table if at least one PRODUCT row references that vendor.
• ON UPDATE CASCADE (not supported by ORACLE) ensures that when a change is made in the VENDOR table, that change is reflected automatically in the PRODUCT table.
Besides the primary key and foreign key constraints, the ANSI SQL standard defines the following constraints:
• NOT NULL ensures that a column will not have null values.
• UNIQUE ensures that a column will not have duplicate values.
• DEFAULT defines a default value for a column (used when no value is given).
• CHECK validates data in an attribute and sees that a specified condition holds.
  Ex1: the minimum order must be at least 10.
  Ex2: the date must be after April 15, 2011.
The CREATE TABLE command lets you define constraints in two different places:
• When you create the column definition (known as a column constraint).
• When you use the CONSTRAINT keyword (known as a table constraint).
A column constraint applies to just one column; a table constraint may apply to many columns.
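The constraint behaviour described above (entity integrity, CHECK, and foreign keys) can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch, not the Oracle syntax from the text: SQLite's column types are simplified, and foreign-key enforcement must be switched on explicitly with a PRAGMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.execute("""
    CREATE TABLE vendor (
        vno   INTEGER PRIMARY KEY,
        vname TEXT NOT NULL,
        vcity TEXT)""")
conn.execute("""
    CREATE TABLE product (
        pno   TEXT PRIMARY KEY,                -- entity integrity: NOT NULL + UNIQUE
        pdesc TEXT NOT NULL UNIQUE,
        qoh   INTEGER CHECK (qoh >= 0),        -- CHECK validates the stored value
        vno   INTEGER REFERENCES vendor(vno))  -- foreign key as a column constraint
""")
conn.execute("INSERT INTO vendor VALUES (100, 'RADHA', 'VJA')")
conn.execute("INSERT INTO product VALUES ('P01', 'PEN', 2, 100)")

# A row pointing at a vendor that does not exist violates the FOREIGN KEY:
try:
    conn.execute("INSERT INTO product VALUES ('P02', 'CD', 20, 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# A negative quantity violates the CHECK constraint:
try:
    conn.execute("INSERT INTO product VALUES ('P03', 'PENCIL', -5, 100)")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True
```

Both offending INSERTs are rejected and only the valid P01 row remains in PRODUCT.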
Q: Explain the important data manipulation (DML) commands of SQL?
Ans:
INSERT: used to enter data into a table.
Syntax: INSERT INTO tablename VALUES (value1, value2, ..., valuen);
Example: INSERT INTO VENDOR VALUES (100,'RADHA','VJA');
Observe that:
• Character and date values must be entered between apostrophes (').
• Numeric entries are not enclosed in apostrophes.
• Attribute entries are separated by commas.
Inserting rows with NULL attributes:
INSERT INTO product VALUES ('P02','PENCIL','02-AUG-2011', 25, 3, NULL);
The NULL entry is accepted only because the vno attribute is optional in the PRODUCT table: the NOT NULL declaration was not used for this attribute in the CREATE TABLE statement.
Inserting rows with optional attributes: if data is not available for all columns, then a column list must be included following the table name.
INSERT INTO product(pno,pdesc) VALUES('P03','MOUSE');
Copying parts of a table: to create a new table based on selected columns and rows of an existing table. The new table copies the attribute names, data characteristics and rows of the original table.
CREATE TABLE part AS SELECT pno,pdesc,vno FROM product;
Note that no entity integrity (primary key) or referential integrity (foreign key) rules are automatically applied to the new table.
Saving the table changes, or COMMIT: the COMMIT command permanently saves all changes (rows added, attributes modified, rows deleted) made to any table in the database.
Syntax: COMMIT;
Any changes made to table contents are not saved on disk until you close the database, close the program you are using, or use the COMMIT command.
UPDATE command: the UPDATE command modifies attribute values in one or more table rows, allowing you to change data entries in an existing row's columns.
Syntax:
UPDATE tablename
SET columnname = expression [, columnname = expression]
WHERE conditionlist;
Ex: to change the p_indate of the product with pno P01 to 02-AUG-2011:
UPDATE PRODUCT SET p_indate = '02-AUG-2011'
WHERE pno='P01';
Restoring table contents, or ROLLBACK: ROLLBACK undoes any changes since the last COMMIT command and brings the data back to the values that existed before the changes were made.
Syntax: ROLLBACK;
Ex:
1. Create a table called SALES.
2. Insert 10 rows into the SALES table.
3. Execute the ROLLBACK command.
ROLLBACK undoes only the results of data manipulation commands such as INSERT, UPDATE and DELETE. All data definition commands (such as CREATE TABLE) are automatically committed to the data dictionary and cannot be rolled back.
DELETE command: DELETE removes one or more rows from a table. If you do not specify a WHERE condition, all rows of the table are deleted.
Removal of specified row(s):
Syntax: DELETE FROM tablename [WHERE conditionlist];
Removal of all rows:
Syntax: DELETE FROM tablename;
Viewing data in tables, or SELECT: SELECT lists the contents of a table.
Syntax: SELECT columnlist FROM tablename [WHERE conditionlist];
The columnlist represents one or more attributes separated by commas. You can use the * wildcard character to list all attributes.
Ex1: SELECT * FROM PRODUCT;
Ex2: SELECT pdesc,p_indate FROM product WHERE pno='P01';
Ex3: SELECT * FROM product WHERE p_indate>'01-AUG-2011';
The SELECT statement retrieves all rows that match the specified condition. The WHERE clause adds conditional restrictions to the SELECT statement; the condition list is one or more conditional expressions separated by logical operators. Comparison operators can be used to restrict output.
Comparison operators:
  =          Equal to                   SELECT * FROM product WHERE pno='P01';
  <          Less than                  SELECT * FROM product WHERE price<10;
  <=         Less than or equal to
  >          Greater than               SELECT * FROM product WHERE price>10;
  >=         Greater than or equal to
  <> or !=   Not equal to               SELECT * FROM product WHERE vno <> 100;
Using Computed Columns
Oracle uses the actual formula text as the label for the computed column.
Ex: SELECT pno, qoh*price FROM PRODUCT;
Using column aliases: an alias is an alternative name given to a column or table in any SQL statement.
Ex2: SELECT pno, qoh*price AS total FROM PRODUCT;
(With the alias, the computed column is labelled TOTAL instead of QOH*PRICE.)
Using date arithmetic: SYSDATE is a special function that returns today's date.
Ex1: SELECT pno, p_indate, p_indate+90 AS ExpiryDate FROM product;
Ex2: SELECT pno, p_indate, SYSDATE-90 AS CutDate FROM product WHERE p_indate <= SYSDATE-90;
The output changes based on today's date.
Arithmetic operators:
  +   Addition
  -   Subtraction
  *   Multiplication                 SELECT qoh, price*qoh FROM product;
  /   Division
  ^   Raised to a power (some applications use ** instead of ^)
Rules of precedence: perform operations within parentheses first, then ^, then * and /, then + and -.
Logical operators: SQL allows multiple conditions in a query through the use of logical operators.
  AND   Both conditions must match    SELECT * FROM product WHERE price > 10 AND price < 100;
  OR    Either condition may match    SELECT * FROM product WHERE vno = 100 OR vno = 101;
  NOT   Negates a condition           SELECT * FROM product WHERE NOT(vno = 100);
AND displays a row only when all of the conditions are satisfied; OR displays a row when either of the conditions is satisfied; NOT finds rows that do not match a certain condition by negating the result of the conditional expression.
Ex: SELECT * FROM product WHERE (price < 50 AND p_indate > '01-AUG-2011') OR vno = 100;
The rows with vno=100 are included regardless of the p_indate and price of those rows.
Special operators
BETWEEN operator: used to check whether an attribute value is within a range.
Ex: to see the list of products whose price is between $10 and $100, use the command:
SELECT * FROM product WHERE price BETWEEN 10 AND 100;
IS NULL operator: used to check whether an attribute value is null.
Ex: to list all the products that do not have a vendor assigned, use the command:
SELECT * FROM product WHERE vno IS NULL;
LIKE operator: used only with CHAR and VARCHAR2. Matches a string pattern; used in conjunction with wildcards to find patterns within string attributes.
Ex1: to find all vendors whose name starts with R:
SELECT * FROM vendor WHERE vname LIKE 'R%';
Ex2: to find all vendors whose name has 'a' as the second letter:
SELECT * FROM vendor WHERE vname LIKE '_a%';
SQL allows you to use the percent sign (%) and underscore (_) wildcard characters to make matches when the entire string is not known:
  %   matches any number of characters
  _   matches exactly one character
Without wildcards, a match is made only when the query entry is written exactly like the table entry.
IN operator: matches any value within a value list. It uses an equality comparison, i.e. it selects only those rows that match (are equal to) at least one of the values in the list.
Ex: SELECT * FROM product WHERE vno IN (100, 101);
All of the values in the list must be of the same data type; each value in the list is compared to the attribute. The IN operator is especially valuable when used in subqueries.
SELECT * FROM vendor WHERE vno IN (SELECT vno FROM product);
The subquery (SELECT vno FROM product) lists all vendors who supply products; the IN operator compares the values generated by the subquery to the vno values in the VENDOR table.
EXISTS operator: checks whether a subquery returns any rows. If the subquery returns at least one row, the main query runs; otherwise it does not.
Ex: SELECT * FROM vendor WHERE EXISTS (SELECT * FROM product WHERE qoh<=10);
Modifying the structure of a table, or the ALTER command: all changes to a table's structure are made using the ALTER command.
Syntax:
ALTER TABLE tablename
{ADD|MODIFY} (columnname datatype [{ADD|MODIFY} columnname datatype]);
To change a column's datatype
To change the vname datatype from varchar2 to char:
ALTER TABLE vendor MODIFY (vname char(35));
To change a column's data characteristics: to increase the width of the vname column to 55 characters:
ALTER TABLE vendor MODIFY (vname char(55));
To add a column:
ALTER TABLE product ADD (pmin number(5));
If the table already has some data, we cannot add a new column with NOT NULL, because the existing rows will default to NULL for the new column.
To add table constraints:
Syntax: ALTER TABLE tablename ADD constraint [ADD constraint];
To add a primary key: ALTER TABLE part ADD PRIMARY KEY(part_no);
To add a foreign key: ALTER TABLE part ADD FOREIGN KEY(vno) REFERENCES vendor;
(OR)
ALTER TABLE part ADD PRIMARY KEY(part_no) ADD FOREIGN KEY(vno) REFERENCES vendor;
To add primary and foreign keys using the keyword CONSTRAINT:
ALTER TABLE part ADD CONSTRAINT pk_partno PRIMARY KEY(part_no) ADD CONSTRAINT fk_vno FOREIGN KEY(vno) REFERENCES vendor;
To remove a column or table constraint:
Syntax: ALTER TABLE tablename DROP {PRIMARY KEY | COLUMN columnname | CONSTRAINT constraintname};
Dropping (deleting) a column:
ALTER TABLE product DROP COLUMN pmin;
Deleting a table from the database: a table can be deleted from the database using the DROP TABLE command.
Ex: DROP TABLE part;
Advanced SELECT queries
ORDER BY clause: orders the selected rows based on one or more attributes.
• Used in the last portion of the SELECT statement.
• By using this clause, rows can be sorted.
• By default it sorts in ascending order; DESC is used for sorting in descending order.
• Sorting by a column which is not in the select list is possible.
• Sorting by column aliases is possible.
Example: to produce a list of products sorted in descending order of their prices:
SELECT pno,pdesc,p_indate,price FROM product
ORDER BY price DESC;
A multilevel ordered sequence is known as a cascading order sequence; it can be created by listing several attributes, separated by commas, after the ORDER BY clause.
SELECT * FROM employee ORDER BY e_lname, e_fname, e_initial;
DISTINCT clause: used to eliminate duplicate rows.
Ex: how many different vendors are currently represented in the PRODUCT table?
SELECT DISTINCT vno FROM product;
Q: Explain aggregate functions?
Ans: Some of the aggregate functions are COUNT, MIN, MAX, SUM and AVG.
COUNT: takes one parameter within parentheses.
COUNT(columnname) counts the number of non-null values of an attribute. COUNT(*) counts the number of rows returned by a query, including rows that contain nulls.
Ex1: how many rows in the PRODUCT table have a price value less than or equal to $500?
SELECT COUNT(*) FROM product WHERE price<=500;
Ex2: how many vendors referenced in the PRODUCT table have supplied products priced at $10 or less?
SELECT COUNT(DISTINCT vno) FROM product WHERE price<=10;
MAX and MIN:
Ex1: which product has the highest price?
SELECT * FROM product WHERE price = (SELECT MAX(price) FROM product);
(We cannot write SELECT * FROM product WHERE price = MAX(price); because aggregate functions can be used only in the column list of a SELECT statement.)
Ex2: the highest price in the PRODUCT table:
SELECT MAX(price) FROM product;
Ex3: the lowest price in the PRODUCT table:
SELECT MIN(price) FROM product;
Ex4: to find the product that has the oldest date:
SELECT * FROM product WHERE p_indate = (SELECT MIN(p_indate) FROM product);
Ex5: to find the most recent product:
SELECT * FROM product WHERE p_indate = (SELECT MAX(p_indate) FROM product);
SUM: computes the total sum of any specified attribute.
Ex: to find the total value of all items:
SELECT SUM(qoh*price) AS TOTALVALUE FROM product;
AVG:
Ex1: to find the products whose prices exceed the average product price:
SELECT * FROM product WHERE price > (SELECT AVG(price) FROM product) ORDER BY price DESC;
Q: Explain the GROUP BY clause?
Ans:
• Used to group rows on the basis of a common attribute value, such as the employees of a department or the products of a vendor.
• A WHERE clause can be used, if needed.
• The only attributes that can be put in the select list are the aggregate functions and the attributes that have been used for grouping the information.
Ex1: how many products are supplied by each vendor?
SELECT vno, COUNT(pno) FROM product GROUP BY vno;
HAVING clause: an extension of the GROUP BY feature. The HAVING clause is applied to the output of the GROUP BY operation.
Ex: how many products are supplied by each vendor? List only the vendors whose average product price is below $10.
SELECT vno, COUNT(pno), AVG(price) FROM product GROUP BY vno HAVING AVG(price) < 10;
Q: Explain indexes in SQL?
Ans: Indexes are used to access data quickly.
Syntax: CREATE INDEX <index name> ON <tablename>(column name);
An index can be created on one or more columns. Based on the number of columns included, an index can be of two types: 1. simple index, 2. composite index.
To create a simple index: an index created on a single column is called a simple index.
Ex: CREATE INDEX p_in ON product(p_indate);
To create a composite index: an index created on more than one column is called a composite index.
Dropping (deleting) an index: use the DROP INDEX command.
Ex: DROP INDEX p_in;
Q: What is a database schema?
Ans: A schema is a group of database objects, such as tables and indexes, that are related to each other.
Syntax: CREATE SCHEMA AUTHORIZATION {creator};
When a user is created, the DBMS automatically assigns a schema to that user. Schemas are useful to group tables by owner and to enforce a first level of security by allowing each user to see only the tables that belong to that user.
Labwork:
VENDOR
  vno   vname   vcity
  100   RADHA   VJA
  101   ALIYA   NULL
  102   SIRI    VJA
  103   LAK     GNT
CREATE TABLE VENDOR(
vno number(3) PRIMARY KEY,
vname varchar2(35) NOT NULL,
vcity varchar2(15));
PRODUCT
  pno   pdesc    qoh    price   vno
  P01   PEN      2      10      100
  P02   CD       20     12      101
  P03   PENCIL   200    3       NULL
  P04   DVD      2000   350     101
CREATE TABLE PRODUCT(
pno char(3),
pdesc varchar2(35) NOT NULL UNIQUE,
qoh number(5),
price number(5),
vno number(3),
PRIMARY KEY(pno),
FOREIGN KEY(vno) REFERENCES VENDOR);
CUSTOMER
  cno   cname   city   baldue
  201   ANU     VJA    100
  202   ASHA    GNT    500
  203   RAJ     VJA
CREATE TABLE CUSTOMER(
cno number(3) PRIMARY KEY,
cname varchar2(35),
city varchar2(15),
baldue number(5));
INVOICE
  invno   cno   invdate
  301     201   20-AUG-2011
  302     202   20-AUG-2011
  303     203   21-AUG-2011
  304     201   21-AUG-2011
CREATE TABLE INVOICE(
invno number(3),
cno number(3),
invdate date,
PRIMARY KEY(invno),
FOREIGN KEY(cno) REFERENCES CUSTOMER);
All products sold are stored in the LINE table.
LINE
  invno   lineno   pno   line_units   line_price
  301     L01      P01   10           10
  301     L02      P02   10           12
  301     L03      P03   20           3
  302     L01      P01   30           10
  302     L02      P02   20           12
  303     L01      P01   35           10
  303     L02      P02   15           12
CREATE TABLE LINE(
invno number(3),
lineno char(3),
pno char(3),
line_units number(5),
line_price number(5),
PRIMARY KEY(invno,lineno),
FOREIGN KEY(pno) REFERENCES PRODUCT,
FOREIGN KEY(invno) REFERENCES INVOICE);
EMP
  e_lname   e_fname   e_initial   dob           sal
  REDDY     SAM       A           14-NOV-1994   15000.25
  NAIDU     ANU       S           14-OCT-1992   16234.50
  JAIN      NEHA      K           28-NOV-1993   15623.48
  REDDY     RAM       T           14-SEP-1994   1623.89
CREATE TABLE EMP(
e_lname varchar2(20),
e_fname varchar2(20),
e_initial varchar2(2),
dob date,
sal number(8,2));
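The labwork schema can be exercised end to end with Python's built-in sqlite3 module. This is a simplified sketch, not the Oracle DDL above: SQLite has no NUMBER or VARCHAR2 types, and only the VENDOR and PRODUCT rows are loaded here to try out BETWEEN, LIKE, an IN subquery, and GROUP BY with COUNT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor  (vno INTEGER PRIMARY KEY, vname TEXT NOT NULL, vcity TEXT);
    CREATE TABLE product (pno TEXT PRIMARY KEY, pdesc TEXT NOT NULL UNIQUE,
                          qoh INTEGER, price INTEGER,
                          vno INTEGER REFERENCES vendor(vno));
    INSERT INTO vendor VALUES (100,'RADHA','VJA'), (101,'ALIYA',NULL),
                              (102,'SIRI','VJA'), (103,'LAK','GNT');
    INSERT INTO product VALUES ('P01','PEN',2,10,100), ('P02','CD',20,12,101),
                               ('P03','PENCIL',200,3,NULL);
""")
conn.commit()  # like SQL COMMIT: the inserted rows are now permanent

# BETWEEN: products priced from 10 to 100 (inclusive)
mid_price = [r[0] for r in conn.execute(
    "SELECT pno FROM product WHERE price BETWEEN 10 AND 100 ORDER BY pno")]

# LIKE: vendors whose name starts with R
r_vendors = [r[0] for r in conn.execute(
    "SELECT vname FROM vendor WHERE vname LIKE 'R%'")]

# IN with a subquery: vendors that supply at least one product
suppliers = [r[0] for r in conn.execute(
    "SELECT vno FROM vendor WHERE vno IN (SELECT vno FROM product) ORDER BY vno")]

# GROUP BY with an aggregate: number of products per vendor
# (note SQLite groups the NULL vno into its own group, sorted first)
per_vendor = conn.execute(
    "SELECT vno, COUNT(pno) FROM product GROUP BY vno ORDER BY vno").fetchall()
```

With the three product rows loaded, P01 and P02 fall in the BETWEEN range, RADHA matches 'R%', vendors 100 and 101 appear as suppliers, and each vno group counts one product.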
UNIT-III
Chapter-II: ADVANCED SQL
SQL data manipulation commands operate over entire tables (for example, the SELECT command lists all rows from the table you specified in the FROM clause) and are therefore said to be set-oriented commands. The following examples use the CUSTOMER table from the labwork together with a second table, CUSTOMER3:
CUSTOMER3
  cno   cname   city   baldue
  401   JAY     GNT    200
  402   RAJ     VJA    300
UNION statement: combines rows from two or more queries without including duplicate rows.
Syntax: query UNION query
Query:
SELECT cname,city FROM customer
UNION
SELECT cname,city FROM customer3;
This combines the output of two or more SELECT queries without including duplicate rows. (The SELECT statements must be union-compatible: they must return the same attribute names and similar data types.)
Result: ANU VJA, ASHA GNT, RAJ VJA, JAY GNT.
UNION ALL: combines the output of two or more union-compatible SELECT queries and retains duplicate rows.
SELECT cname,city FROM customer
UNION ALL
SELECT cname,city FROM customer3;
Result: ANU VJA, ASHA GNT, RAJ VJA, JAY GNT, RAJ VJA.
INTERSECT statement: combines rows from two queries, returning only the rows that appear in both sets.
SELECT cname,city FROM customer
INTERSECT
SELECT cname,city FROM customer3;
Result: RAJ VJA.
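The set operations above can be checked with Python's sqlite3 module, which supports UNION, UNION ALL and INTERSECT with the same semantics described in the text. The rows mirror the CUSTOMER and CUSTOMER3 tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (cno INTEGER, cname TEXT, city TEXT);
    CREATE TABLE customer3 (cno INTEGER, cname TEXT, city TEXT);
    INSERT INTO customer  VALUES (201,'ANU','VJA'), (202,'ASHA','GNT'), (203,'RAJ','VJA');
    INSERT INTO customer3 VALUES (401,'JAY','GNT'), (402,'RAJ','VJA');
""")

# UNION removes the duplicate (RAJ, VJA) row that appears in both tables
union_rows = conn.execute(
    "SELECT cname, city FROM customer UNION SELECT cname, city FROM customer3"
).fetchall()

# UNION ALL keeps it, so one more row comes back
union_all_rows = conn.execute(
    "SELECT cname, city FROM customer UNION ALL SELECT cname, city FROM customer3"
).fetchall()

# INTERSECT returns only the rows present in both result sets
common = conn.execute(
    "SELECT cname, city FROM customer INTERSECT SELECT cname, city FROM customer3"
).fetchall()
```

UNION returns 4 distinct rows, UNION ALL returns all 5, and INTERSECT returns only (RAJ, VJA).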
MINUS statement: combines rows from two queries and returns only the rows that appear in the first set but not in the second.
SELECT cname,city FROM customer
MINUS
SELECT cname,city FROM customer3;
Result: ANU VJA, ASHA GNT.
SQL JOIN OPERATORS
A join is used to combine rows from multiple tables. Join operations can be classified as inner joins and outer joins. The inner join is the traditional join, in which only the rows that meet a given criterion are selected. The join criterion can be an equality condition (also called a natural join or an equijoin) or an inequality condition (also called a theta join). Generally a join condition is an equality comparison of the primary key of one table and the foreign key of the related table. An outer join returns not only the matching rows but also the rows with unmatched attribute values from one table or from both tables being joined.
Join types:
• CROSS JOIN. Syntax: SELECT * FROM T1, T2; (old style) or SELECT * FROM T1 CROSS JOIN T2;. Returns the Cartesian product of T1 and T2.
• INNER, old-style JOIN. Syntax: SELECT * FROM T1, T2 WHERE T1.C1=T2.C1;. Returns only the rows that meet the join condition in the WHERE clause.
• INNER, NATURAL JOIN. Syntax: SELECT * FROM T1 NATURAL JOIN T2;. Returns only the rows with matching values in the matching columns; the matching columns must have the same names and similar data types.
• INNER, JOIN USING. Syntax: SELECT * FROM T1 JOIN T2 USING(C1);. Returns only the rows with matching values in the columns indicated in the USING clause.
• INNER, JOIN ON. Syntax: SELECT * FROM T1 JOIN T2 ON T1.C1=T2.C1;. Returns only the rows that meet the join condition in the ON clause.
• OUTER, LEFT JOIN. Syntax: SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.C1=T2.C1;. Returns the rows with matching values plus all rows from the left table (T1) with unmatched values.
• OUTER, RIGHT JOIN. Syntax: SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.C1=T2.C1;. Returns the rows with matching values plus all rows from the right table (T2) with unmatched values.
• OUTER, FULL JOIN. Syntax: SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.C1=T2.C1;. Returns the rows with matching values plus all rows from both tables (T1 and T2) with unmatched values.
RECURSIVE (SELF) JOIN: an alias is an alternative name given to a column or table in any SQL statement. An alias is especially useful when a table must be joined to itself in a recursive query.
Ex: SELECT E.Eno, E.Ename, M.Ename FROM EMP E, EMP M WHERE E.Mgr = M.Eno;
CROSS JOIN (also known as the Cartesian product). Examples:
SELECT * FROM invoice CROSS JOIN line;
The above query generates 4*7 = 28 rows (4 rows in the INVOICE table times 7 rows in the LINE table).
NATURAL JOIN:
SELECT cno,cname,invno,invdate FROM customer NATURAL JOIN invoice;
You are not limited to two tables when performing a natural join, and it does not require a table qualifier for the common attribute.
SELECT invno,pno,pdesc,line_units,line_price FROM invoice NATURAL JOIN line NATURAL JOIN product;
JOIN USING clause: also does not require a table qualifier for the common attribute.
SELECT invno,pno,pdesc,line_units,line_price FROM invoice JOIN line USING(invno) JOIN product USING(pno);
JOIN ON clause: does not require common attribute names in the joined tables, but does require a table qualifier for the common attribute. It lets you perform a join even when the tables do not share a common attribute name.
SELECT invoice.invno,pno,pdesc,line_units,line_price FROM invoice JOIN line ON invoice.invno=line.invno JOIN product ON line.pno=product.pno;
OUTER JOINS:
SELECT pno,vendor.vno,vname FROM vendor LEFT JOIN product ON vendor.vno=product.vno;
SELECT pno,vendor.vno,vname FROM vendor RIGHT JOIN product ON vendor.vno=product.vno;
SELECT pno,vendor.vno,vname FROM vendor FULL JOIN product ON vendor.vno=product.vno;
SUBQUERIES: used when it is required to process data based on other processed data.
Characteristics of subqueries: a subquery (also called a nested query or inner query) is a query inside another query.
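The inner and left outer joins described above can be sketched with Python's sqlite3 module over the labwork VENDOR/PRODUCT pair (RIGHT and FULL joins are omitted because older SQLite versions lack them; the data is a trimmed version of the labwork rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor  (vno INTEGER PRIMARY KEY, vname TEXT);
    CREATE TABLE product (pno TEXT PRIMARY KEY, pdesc TEXT, vno INTEGER);
    INSERT INTO vendor  VALUES (100,'RADHA'), (101,'ALIYA'), (103,'LAK');
    INSERT INTO product VALUES ('P01','PEN',100), ('P02','CD',101), ('P03','PENCIL',NULL);
""")

# Inner JOIN ON: only vendors with at least one matching product row
inner = conn.execute("""
    SELECT pno, vendor.vno, vname
    FROM vendor JOIN product ON vendor.vno = product.vno""").fetchall()

# LEFT OUTER JOIN: vendor 103 (LAK) is kept even though it supplies nothing;
# its product columns come back as NULL
left = conn.execute("""
    SELECT vendor.vno, vname, pno
    FROM vendor LEFT JOIN product ON vendor.vno = product.vno
    ORDER BY vendor.vno""").fetchall()
```

The inner join returns two rows (P01/RADHA and P02/ALIYA); the left join returns three, with (103, 'LAK', NULL) as the unmatched row. Note the join condition pairs vendor.vno with product.vno, the primary-key/foreign-key pair.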
A subquery is normally expressed inside parentheses. The output of the inner query is used as input for the outer (higher-level) query, so the inner query is executed first and then the outer query. A subquery is based on the use of a SELECT statement that returns one or more values to another query. If the table into which you are inserting rows has one date attribute and one number attribute, the SELECT subquery should return rows in which the first column has date values and the second column has number values.
Inserting table rows with a SELECT subquery (copying parts of tables): adds multiple rows to a table, using another table as the source of the data.
CREATE TABLE PART(
part_no char(3) PRIMARY KEY,
part_desc varchar2(35),
vno number(3));
Syntax: INSERT INTO target_tablename SELECT source_columnlist FROM source_tablename;
Example: INSERT INTO part SELECT * FROM product;
Both tables (PART and PRODUCT) must have compatible attributes; the subquery returns all rows from the PRODUCT table.
Further SELECT subquery examples:
UPDATE product SET price=(SELECT AVG(price) FROM product) WHERE vno=100; ........Ex(2)
Updates the product price to the average product price for the products provided by vendor 100.
DELETE FROM product WHERE vno IN (SELECT vno FROM vendor WHERE vcity='VJA'); ........Ex(3)
Deletes the PRODUCT table rows whose products are provided by vendors with vcity='VJA'.
A subquery can return:
i. One value, as in Ex(2) (the subquery returns AVG(price), a single value).
ii. A list of values, as in Ex(3) (the subquery returns a list of vendors from 'VJA').
iii. A virtual table.
iv. No value at all, i.e. NULL; the outer query may then raise an error or return an empty set.
WHERE subqueries:
Ex: to find all products with a price greater than or equal to the average product price, write the following query:
SELECT pno,price FROM product WHERE price >= (SELECT AVG(price) FROM product);
Note that this type of query, when used in a >, <, =, >= or <= conditional expression, requires a subquery that returns only one single value. If the subquery returns more than a single value, the DBMS will generate an error.
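The single-value WHERE subquery can be checked with Python's sqlite3 module; the inner SELECT computes AVG(price) once, and the outer query compares every row's price against that one value (illustrative prices, not the labwork data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (pno TEXT, price INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("P01", 10), ("P02", 12), ("P03", 3), ("P04", 101)])

# The subquery returns a single value: AVG(price) = (10+12+3+101)/4 = 31.5
above_avg = conn.execute("""
    SELECT pno, price FROM product
    WHERE price >= (SELECT AVG(price) FROM product)
    ORDER BY pno""").fetchall()
```

Only P04 (priced 101) is at or above the 31.5 average, so a single row comes back.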
IN subqueries: see Ex(3) above, where the subquery supplies the list of values tested by IN.
HAVING subqueries:
Example: to list all products with a total quantity sold greater than the average quantity sold:
SELECT pno,SUM(line_units) FROM line GROUP BY pno HAVING SUM(line_units)>(SELECT AVG(line_units) FROM line);
MULTIROW subquery operators: ANY and ALL.
1. ALL: used to do an inequality comparison (> or <) of one value against a list of values.
Example: which products have a total cost (qoh*price) greater than all of the individual product costs of the products provided by vendor 101?
SELECT pno, qoh*price FROM product WHERE qoh*price > ALL (SELECT qoh*price FROM product WHERE vno = 101);
In the above query the ALL operator allows you to compare a single value (qoh*price) with the list of values returned by the subquery.
2. ANY: the ANY operator also compares a single value with a list of values, but selects the rows whose qoh*price is greater than any one value in the list.
FROM subqueries: the FROM clause specifies the tables from which data will be drawn, and a subquery can appear there.
Example: to find all customers who purchased both the products 'PEN' and 'PENCIL':
SELECT DISTINCT cno, cname
FROM customer,
(SELECT invoice.cno FROM invoice NATURAL JOIN line NATURAL JOIN product WHERE pdesc='PEN') cp1,
(SELECT invoice.cno FROM invoice NATURAL JOIN line NATURAL JOIN product WHERE pdesc='PENCIL') cp2
WHERE customer.cno = cp1.cno AND cp1.cno = cp2.cno;
(OR)
CREATE VIEW cp1 AS SELECT invoice.cno FROM invoice NATURAL JOIN line NATURAL JOIN product WHERE pdesc='PEN';
CREATE VIEW cp2 AS SELECT invoice.cno FROM invoice NATURAL JOIN line NATURAL JOIN product WHERE pdesc='PENCIL';
SELECT DISTINCT cno, cname FROM customer NATURAL JOIN cp1 NATURAL JOIN cp2;
Attribute-list subqueries (inline subqueries): the attribute list of a SELECT can itself include a subquery expression, known as an inline subquery. An inline subquery must return one single value, otherwise an error is raised.
SELECT pno, price,
(SELECT AVG(price) FROM product) AS AVGPRICE,
price - (SELECT AVG(price) FROM product) AS DIFF
FROM product;
The query uses the full expression instead of the column alias when computing DIFF: a column alias cannot be used in computations in the attribute list when the alias is defined in that same attribute list.
Attribute-list subqueries can also be used to include data from other tables that are not directly related to the main table or tables of the query:
SELECT pno, SUM(line_units*line_price) AS sales,
(SELECT COUNT(*) FROM employee) AS ecount,
SUM(line_units*line_price) / (SELECT COUNT(*) FROM employee) AS contrib
FROM line GROUP BY pno;
CORRELATED SUBQUERIES: to process a correlated subquery the DBMS:
i. Initiates the outer query.
ii. For each row of the outer query's result set, executes the inner query, passing the outer row to it (the inner query references a column of the outer query).
Example: to find all product sales whose units sold exceed the average units sold for that product:
SELECT invno, pno, line_units FROM line LS
WHERE LS.line_units > (SELECT AVG(line_units) FROM line LA WHERE LA.pno = LS.pno);
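The correlated pattern can be checked with Python's sqlite3 module using the labwork LINE rows: for each outer row, the inner query recomputes AVG(line_units) for that row's pno.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line (invno INTEGER, pno TEXT, line_units INTEGER)")
conn.executemany("INSERT INTO line VALUES (?, ?, ?)", [
    (301, "P01", 10), (302, "P01", 30), (303, "P01", 35),
    (301, "P02", 10), (302, "P02", 20), (303, "P02", 15),
])

# Per-product averages: P01 -> (10+30+35)/3 = 25, P02 -> (10+20+15)/3 = 15.
# The inner query references LS.pno from the outer row, so it is correlated.
above = conn.execute("""
    SELECT invno, pno, line_units FROM line LS
    WHERE LS.line_units > (SELECT AVG(line_units) FROM line LA
                           WHERE LA.pno = LS.pno)
    ORDER BY pno, invno""").fetchall()
```

Rows with 30 and 35 units beat the P01 average of 25, and the row with 20 units beats the P02 average of 15, so three rows survive.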
The inner query runs once for each pno found in the outer LINE table, returning the average sale for that product.
Correlated subqueries can also be used with the EXISTS special operator.
Example: to list the vendor code and name for the products having qoh < 10:
SELECT vno, vname FROM vendor
WHERE EXISTS (SELECT * FROM product WHERE qoh<10 AND vendor.vno=product.vno);
SQL functions: used to generate information from data.
DUAL: an Oracle pseudo-table used in cases when a real table is not needed.
DATE/TIME FUNCTIONS
LAST_DAY: returns the last day of the month based on a date value.
Syntax: LAST_DAY(date_value)
SELECT LAST_DAY(TO_DATE('2003/03/15','yyyy/mm/dd')) FROM DUAL; would return Mar 31, 2003.
SELECT LAST_DAY(TO_DATE('2003/02/03','yyyy/mm/dd')) FROM DUAL; would return Feb 28, 2003.
To list employees born in the last seven days of a month:
SELECT * FROM emp WHERE dob >= LAST_DAY(dob)-7;
TO_CHAR: converts a number or date to a string.
Syntax: TO_CHAR(date_value, fmt), where the format fmt can use:
MONTH - name of the month; MON - three-letter month name; MM - two-digit month number; D - day of week (1-7); DAY - name of the day; DD - day of month (1-31); YYYY - four-digit year; YY - two-digit year.
SELECT TO_CHAR(sysdate,'yyyy/mm/dd') FROM DUAL; would return, for example, '2003/07/09'.
To list all employees born in 1994: SELECT * FROM emp WHERE TO_CHAR(dob,'YYYY')='1994';
To list all employees born in the month of November: SELECT * FROM emp WHERE TO_CHAR(dob,'MM')='11';
To list all employees born on the 14th of a month: SELECT * FROM emp WHERE TO_CHAR(dob,'DD')='14';
TO_DATE: converts a string to a date; also used to translate a date between formats.
Syntax: TO_DATE(char_value, fmt), with fmt as above.
SELECT TO_DATE('2003/07/09','yyyy/mm/dd') FROM DUAL; would return a date value of July 9, 2003.
To find the age of employees as of 12-31-2012:
SELECT e_lname, (TO_DATE('12/31/2012','MM/DD/YYYY') - dob)/365 AS YEARS FROM emp;
NOTE: '12/31/2012' is a text string, not a date; TO_DATE translates the text string into a valid Oracle date that can be used in date arithmetic.
How many days are there between 6/25/2011 and 10/27/2011?
SELECT TO_DATE('2011/06/25','YYYY/MM/DD') - TO_DATE('OCTOBER 27,2011','MONTH DD,YYYY') FROM DUAL;
SYSDATE: returns today's date.
SELECT TO_DATE('25-DEC-2011','DD-MON-YYYY') - SYSDATE FROM dual;
ADD_MONTHS: adds months to a date.
Syntax: ADD_MONTHS(date_value, n), where date_value is the starting date (before the n months have been added) and n is the number of months to add.
SELECT ADD_MONTHS('01-Aug-03', 3) FROM DUAL; would return '01-Nov-03'.
SELECT pno, p_indate, ADD_MONTHS(p_indate,24) FROM product;
NUMERIC FUNCTIONS
Aggregate functions operate over a set of values (multiple rows), while numeric functions operate over a single row.
ABS: returns the absolute value of a number. Syntax: ABS(numeric_value).
SELECT ABS(1.95), ABS(-1.93) FROM DUAL; would return 1.95 and 1.93.
ROUND: returns a number rounded to a certain number of decimal places. Syntax: ROUND(numeric_value, p), where p is the precision.
SELECT ROUND(125.315) FROM DUAL; would return 125.
SELECT ROUND(sal,1) AS sal1, ROUND(sal,2) AS sal2 FROM emp;
CEIL: returns the smallest integer value that is greater than or equal to a number. Syntax: CEIL(number).
SELECT CEIL(-32.65) FROM DUAL; would return -32.
SELECT CEIL(32.65) FROM DUAL; would return 33.
SELECT CEIL(sal), FLOOR(sal) FROM emp;
FLOOR: returns the largest integer value that is less than or equal to a number. Syntax: FLOOR(number).
SELECT FLOOR(5.9) FROM DUAL; would return 5.
SELECT FLOOR(-5.9) FROM DUAL; would return -6.
SQRT: returns the square root of n, where n is a positive number. Syntax: SQRT(n). SQRT(9) would return 3.
MOD: returns the remainder of m divided by n. MOD(15, 4) would return 3.
POWER: returns m raised to the nth power. Syntax: POWER(m, n), where m is the base and n is the exponent; if m is negative, then n must be an integer. POWER(3, 2) would return 9.
    exp function returnse raised to the nth power, where e = 2.71828183. exp(3) would return 20.0855369231877 trunc function returns a number truncated to a certain number of decimal places. trunc(125.815, 0) would return 125 trunc(125.815, 1) would return 125.8 ln function returns the natural logarithm of a number. ln(20) would return 2.99573227355399 log function returns the logarithm of n base m. Syntax: log( m, n ) m must be a positive number, except 0 or 1. n must be a positive number. log(100, 1) would return 0 String Functions: are useful to concatenate strings of characters, printing names in upper case or knowing the length of a given attribute. Function Example UPPER function converts all letters in the specified string to uppercase. Syntax: UPPER(string) upper('Tech on'); would return 'TECH ON List all employee names in upper case. SELECT UPPER (e_initial) || ‘.’|| UPPER (e_fname) || UPPER(e_lname) FROM EMP; LOWER function converts all letters in the specified string to lowercase. Syntax: LOWER(string) List all employee names in lower case. SELECT LOWER (e_initial) || ‘.’|| LOWER (e_fname) || LOWER(e_lname) FROM EMP; SUBSTR function allows you to extract a substring from a string. Syntax:substr( string, p, l ) string is the source string. p is the position for extraction. l is optional. It is the number of characters to extract. substr('This is a test', 6, 2) would return 'is' substr('This is a test', 6) would return 'is a test' substr('This is a test', -3, 3) would return 'Net' List first 3 characters of all employee last names.. Ex:SELECT SUBSTR(e_lname,1,3) AS prefix FROM EMP; LENGTH function returns the number of characters in the specified string. Syntax: length( string) length(NULL) would return NULL. length('') would return NULL. length('Tech on the Net') would return 15. List all employees last names and length of their last names. 
SELECT e_lname, LENGTH(e_lname) FROM EMP;

Concatenation
The || operator allows you to concatenate data from two different character columns and returns a single column.
Syntax: string1 || string2
'a' || 'b' || 'c' || 'd' would return 'abcd'
List all employee names (concatenated).
SELECT e_initial || '.' || e_fname || e_lname AS NAME FROM EMP;

CONVERSION FUNCTIONS allow you to take a value of a given data type and convert it to the equivalent value in another data type.

TO_CHAR returns a character string from a numeric value.
Syntax: TO_CHAR(numeric_value, fmt)
SELECT eno, TO_CHAR(sal, '9,999.99') AS PRICE FROM EMP;
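The string functions and the || operator above can be tried out against SQLite via Python's sqlite3 module; this is only an illustration (SQLite happens to support the same function names, and it needs no DUAL table).

```python
import sqlite3

# Run the string-function examples from the text against SQLite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
up, low, sub, ln, cat = cur.execute(
    "SELECT UPPER('Tech on'), LOWER('Tech on'), "
    "SUBSTR('This is a test', 6, 2), LENGTH('Tech on the Net'), "
    "'a' || 'b' || 'c' || 'd'"
).fetchone()
conn.close()
```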
TO_NUMBER returns a formatted number from a character string.
Syntax: TO_NUMBER(char_value, fmt)
fmt = the format used, which can be:
9 - displays a digit
0 - displays a leading zero
, - displays the comma
. - displays the decimal point
$ - displays the dollar sign
B - leading blank
S - leading sign
MI - trailing minus sign
SELECT TO_NUMBER('-123.99', 'S9999.99'), TO_NUMBER('99.78-', 'B999.99MI') FROM DUAL;

DECODE compares an attribute or expression with a series of values and returns an associated value, or a default value if no match is found.
Syntax: DECODE(e, x, y, d)
e - attribute or expression
x - value with which to compare e
y - value to return if e = x
d - default value to return if e is not equal to x
The following example returns the sales tax for specified cities.
It compares vcity to 'VJA'; if the value matches, it returns .08.
It compares vcity to 'GNT'; if the value matches, it returns .05.
If there is no match, it returns 0.00 (the default value).
SELECT vno, vcity, DECODE(vcity, 'VJA', .08, 'GNT', .05, 0.00) AS TAX FROM VENDOR;
Oracle sequences
An Oracle sequence generates a numeric value that can be assigned to any column in any table. Use of sequences is optional; you can enter the values manually. Oracle sequences have a name and can be used anywhere a value is expected. Sequences can be created and deleted at any time. The table attribute to which you assigned a value based on a sequence can be edited and modified.
Oracle sequences are
• Independent objects in the database.
• Not a data type.
• Not tied to a table or column.
Syntax:
CREATE SEQUENCE name [START WITH n] [INCREMENT BY n] [CACHE | NOCACHE]
where
name is the name of the sequence.
n is an integer that can be positive or negative.
START WITH specifies the initial sequence value (the default value is 1).
INCREMENT BY determines the value by which the sequence is incremented.
CACHE or NOCACHE indicates whether Oracle will preallocate sequence numbers in memory (Oracle preallocates 20 values by default).
Example:
CREATE SEQUENCE CSEQ1 START WITH 204 INCREMENT BY 1 NOCACHE;
To check all the sequences you have created:
SELECT * FROM USER_SEQUENCES;
To use sequences during data entry you must use two special pseudo columns, NEXTVAL and CURRVAL.
NEXTVAL retrieves the next available value from a sequence. Each time you use NEXTVAL, the sequence is incremented.
CURRVAL retrieves the current value of the sequence.
Example:
INSERT INTO CUSTOMER VALUES (CSEQ1.NEXTVAL, 'RAVI', 'NELLORE', 500);
INSERT INTO INVOICE VALUES ('I05', CSEQ1.CURRVAL, '22-AUG-2011');
You cannot use CURRVAL unless a NEXTVAL was issued previously in the same session.
NEXTVAL retrieves the next available sequence number (here 204) and assigns it to cno in the CUSTOMER table. CSEQ1.CURRVAL refers to the last used CSEQ1.NEXTVAL sequence number (204). In this way the relationship between INVOICE and CUSTOMER is established.
A COMMIT; statement must be issued to make the changes permanent.
You can also issue a ROLLBACK statement, in which case the rows you inserted into INVOICE and CUSTOMER will be rolled back (but the sequence number will not). That is, even though the row using 204 was rolled back, the next NEXTVAL will return 205, not 204.
Dropping a sequence does not delete the values you already assigned to table attributes.
Syntax: DROP SEQUENCE CSEQ1;
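The NEXTVAL/CURRVAL semantics described above can be sketched in Python. The Sequence class below is a hypothetical illustration of the behaviour, not Oracle code; note in particular that rolling back a transaction does not rewind the sequence, so the next value after a rolled-back 204 is still 205.

```python
class Sequence:
    """Minimal sketch of Oracle sequence semantics (illustrative only)."""
    def __init__(self, start=1, increment=1):
        self._next = start
        self._increment = increment
        self._current = None   # CURRVAL is undefined before the first NEXTVAL

    def nextval(self):
        # Hand out the next value and advance the sequence.
        self._current = self._next
        self._next += self._increment
        return self._current

    def currval(self):
        # Mirrors the rule that CURRVAL needs a prior NEXTVAL in the session.
        if self._current is None:
            raise RuntimeError("NEXTVAL has not been issued in this session")
        return self._current

cseq1 = Sequence(start=204, increment=1)
first = cseq1.nextval()           # 204, as in the CUSTOMER insert above
same = cseq1.currval()            # still 204, reused for the INVOICE row
after_rollback = cseq1.nextval()  # 205: the sequence is never rolled back
```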
VIEWS
A view is a virtual table based on a SELECT query. The tables on which a view is based are called base tables.
Syntax: CREATE VIEW viewname AS SELECT query
Characteristics: A relational view has several special characteristics.
• We can use the view instead of a table in a SQL statement.
• Views are dynamically updated when the base table is updated.
• Views provide a level of security in the database. A view can restrict users to only specified columns and specified rows in a table.
• A view may also be used as the basis for reports.
Example:
CREATE VIEW PROD_STATS AS
SELECT vno, SUM(qoh * price) AS TotalCost
FROM PRODUCT
GROUP BY vno;
To drop a view
Syntax: DROP VIEW <view name>
Example: DROP VIEW PROD_STATS;

UPDATABLE VIEWS
Updatable views can be used in batch update routines to update a master table attribute with transaction data. To demonstrate a batch update routine, consider the two tables below.

ProdMaster
pno  pdesc   qoh
P01  SCREWS  60
P02  NUTS    37
P03  BOLTS   50

ProdSales
pno  qty
P01  7
P02  3

NOTE: There is a 1:1 relationship between the two tables.
To update the qoh attribute (qoh - qty, as that much quantity has been sold):
1. We have to join the two tables.
2. Update qoh for each row of the ProdMaster table with matching pno values in the ProdSales table.
We use an updatable view to do that. An updatable view is a view that can be used to update attributes in the base tables that are used in the view.
Not all views are updatable. The most common updatable view restrictions are as follows:
1. GROUP BY and aggregate functions cannot be used.
2. SET operators cannot be used.
3. The P.K. columns of the base table you want to update must have unique values in the view. That is, the two tables must have a 1:1 relationship; only then can the view be used to update a base table.
Example:
CREATE VIEW QUP AS (
SELECT ProdMaster.pno, qoh, qty
FROM ProdMaster, ProdSales
WHERE ProdMaster.pno = ProdSales.pno);
UPDATE QUP SET qoh = qoh - qty;
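A view like PROD_STATS above can be demonstrated with SQLite through Python's sqlite3 module. The table contents are made up for the example, and note that SQLite views (unlike the Oracle updatable views just described) are read-only, so this only illustrates the plain-view case.

```python
import sqlite3

# Build a small PRODUCT-like table and a grouped view over it.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product (pno TEXT, vno INTEGER, qoh INTEGER, price REAL)")
cur.executemany("INSERT INTO product VALUES (?, ?, ?, ?)",
                [("P01", 1, 10, 2.0), ("P02", 1, 5, 4.0), ("P03", 2, 3, 1.0)])
cur.execute("CREATE VIEW prod_stats AS "
            "SELECT vno, SUM(qoh * price) AS TotalCost "
            "FROM product GROUP BY vno")
# The view can now be queried exactly like a table.
stats = cur.execute("SELECT vno, TotalCost FROM prod_stats ORDER BY vno").fetchall()
conn.close()
```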
Q: What is a PSM (Persistent Stored Module)?
Ans: A Persistent Stored Module is a block of code containing standard SQL statements and procedural extensions that is stored and executed at the DBMS server. A PSM represents business logic that can be encapsulated, stored and shared among multiple database users. A PSM lets an administrator assign specific access rights to a stored module to ensure that only authorized users can use it. Oracle implements PSMs through its procedural SQL language, PL/SQL.

Q: What is PL/SQL? Explain.
Ans: PL/SQL is a language that makes it possible to use and store procedural code and SQL statements within the database. It is also used to merge SQL and traditional programming constructs, such as
• variables,
• conditional processing (IF-THEN-ELSE),
• basic loops (FOR and WHILE loops) and
• error trapping.
The procedural code is executed as a unit by the DBMS when it is invoked by the end user. End users can use PL/SQL to create
• Anonymous PL/SQL blocks
• Triggers
• Stored procedures
• PL/SQL functions
You can write a PL/SQL code block by enclosing the commands inside BEGIN and END clauses.
Ex:
BEGIN
INSERT INTO vendor VALUES (105, 'SITA', 'TNL');
END;
/
This is an example of an anonymous PL/SQL block because it has not been given a specific name. The above PL/SQL block executes as soon as you press the ENTER key after typing /. You will see the message "PL/SQL procedure successfully completed".
If you want a more specific message, such as "new vendor added", you must type as follows:
SQL> SET SERVEROUTPUT ON
This SQL*Plus command enables the client console (SQL*Plus) to receive messages from the server side (the Oracle DBMS). To send messages from the PL/SQL block to the SQL*Plus console, use the DBMS_OUTPUT.PUT_LINE function. Like standard SQL, PL/SQL code is executed at the server side, not at the client side. To stop receiving messages from the server, enter SET SERVEROUTPUT OFF.
In Oracle, you can use the SQL*Plus command SHOW ERRORS to help you diagnose errors found in PL/SQL blocks.

Q: Write an anonymous PL/SQL program to insert a row into the VENDOR table and display the message "New vendor added".
Ans:
BEGIN
INSERT INTO vendor VALUES (106, 'GITA', 'VJA');
DBMS_OUTPUT.PUT_LINE('New vendor added');
END;
/
PL/SQL basic data types
CHAR      character values of a fixed length
VARCHAR2  variable-length character values
NUMBER    numeric values
DATE      date values
%TYPE     inherits the data type from a variable that you declared previously or from an attribute of a database table.
Ex: price1 PRODUCT.price%TYPE; assigns price1 the same data type as the price column in the PRODUCT table.

Q: Write an anonymous PL/SQL program to display the number of products in the price ranges 0 and 10, 11 and 60, 61 and 110, etc.
Ans:
DECLARE
P1 NUMBER(4) := 0;
P2 NUMBER(4) := 10;
NUM NUMBER(2) := 0;
BEGIN
WHILE P2 < 5000 LOOP
SELECT COUNT(pno) INTO NUM FROM product WHERE price BETWEEN P1 AND P2;
DBMS_OUTPUT.PUT_LINE('There are ' || NUM || ' products with price between ' || P1 || ' and ' || P2);
P1 := P2 + 1;
P2 := P2 + 50;
END LOOP;
END;
/
The PL/SQL block shown above has the following characteristics:
1. Each statement inside the PL/SQL code must end with a semicolon.
2. The PL/SQL block starts with the DECLARE section, in which you declare the variable names, the data types and an initial value (optional).
3. A WHILE loop is used.
4. The string concatenation symbol || is used.
5. The SELECT statement uses the INTO keyword to assign the output of the query to a PL/SQL variable.
The most useful feature of PL/SQL blocks is that they let you create code that can be named, stored and executed either implicitly or explicitly by the DBMS.

Q: What is a trigger? Explain.
Ans: A trigger is procedural SQL code that is fired when a DML statement such as INSERT, DELETE or UPDATE is executed on a database table.
The syntax to create a trigger in Oracle is:
CREATE OR REPLACE TRIGGER trigger_name
[BEFORE / AFTER] [DELETE / INSERT / UPDATE OF column_name] ON table_name
[FOR EACH ROW]
[DECLARE]
[variable_name data-type [:= initial_value]]
BEGIN
PL/SQL instructions;
……
END;
A trigger definition contains the following parts:
1. The triggering timing: BEFORE or AFTER. This timing indicates at what time the trigger should be fired (before or after the triggering statement is completed).
2. The triggering statement/event: the statement that causes the trigger to execute (INSERT, UPDATE or DELETE).
3. The triggering level: there are two types of triggers, statement-level triggers and row-level triggers.
• Statement-level triggers: this type of trigger is executed once, before or after the triggering statement is completed.
• Row-level triggers: require the use of the FOR EACH ROW keywords. This type of trigger is executed once for each row affected. (If you update 10 rows, the trigger executes 10 times.)
4. The triggering action: the PL/SQL code enclosed between the BEGIN and END keywords.
You can use a trigger to update an attribute in a table other than the one being modified.
CREATE OR REPLACE TRIGGER TLP
AFTER INSERT ON line
FOR EACH ROW
BEGIN
UPDATE product
SET qoh = qoh - :NEW.line_units
WHERE product.pno = :NEW.pno;
END;
/
TLP is a row-level trigger that executes after inserting a new LINE row and reduces the quantity on hand (in the PRODUCT table) of the recently sold product by the number of units sold.
CREATE OR REPLACE TRIGGER trigger_name => creates a trigger with the given name or overwrites an existing trigger with the same name.
OF column_name => this clause is used with update triggers, when you want to trigger an event only when a specific column is updated.
ON table_name => the name of the table or view with which the trigger is associated.
Example of a statement-level trigger that is executed after an update of the qoh or pmin attribute of an existing row, or after an insert of a new row, in the product table:
CREATE OR REPLACE TRIGGER TPR
AFTER INSERT OR UPDATE OF QOH, PMIN ON PRODUCT
BEGIN
UPDATE PRODUCT SET REORDER = 1 WHERE QOH <= PMIN;
END;
/
Q: When does a trigger fire?
Ans: A trigger is fired automatically when an associated DML statement is executed.
• A trigger is invoked before or after a data row is inserted, updated or deleted.
• A trigger is associated with a database table.
• Each database table may have one or more triggers.
• A trigger is executed as part of the transaction that triggered it.
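The TLP trigger above can be reproduced in SQLite (via Python's sqlite3), which also supports AFTER INSERT ... FOR EACH ROW row-level triggers; the schema and values below are made up for illustration, and SQLite writes NEW.column without the leading colon that Oracle uses.

```python
import sqlite3

# A row-level trigger that reduces product.qoh after each new LINE row,
# mirroring the TLP example from the text.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product (pno TEXT PRIMARY KEY, qoh INTEGER)")
cur.execute("CREATE TABLE line (invno TEXT, pno TEXT, line_units INTEGER)")
cur.execute("INSERT INTO product VALUES ('P01', 60)")
cur.execute("""
    CREATE TRIGGER tlp AFTER INSERT ON line
    FOR EACH ROW
    BEGIN
        UPDATE product SET qoh = qoh - NEW.line_units
        WHERE product.pno = NEW.pno;
    END
""")
cur.execute("INSERT INTO line VALUES ('I01', 'P01', 7)")  # trigger fires here
qoh = cur.execute("SELECT qoh FROM product WHERE pno = 'P01'").fetchone()[0]
conn.close()
```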
Q: How to delete a trigger?
Ans: When you delete a table, all its trigger objects are deleted with it. If you want to delete a trigger without deleting the table, give the following command:
DROP TRIGGER triggername;

Q: Write a program to update the customer balance in the CUSTOMER table after inserting every new LINE row.
CREATE OR REPLACE TRIGGER TLC
AFTER INSERT ON line
FOR EACH ROW
DECLARE
cus CHAR(5);
tot NUMBER := 0; -- to compute total cost
BEGIN
SELECT cno INTO cus FROM invoice -- 1) get the customer code
WHERE invoice.invno = :NEW.invno;
tot := :NEW.line_price * :NEW.line_units; -- 2) compute the total of the current line
UPDATE customer
SET baldue = baldue + tot
WHERE cno = cus;
DBMS_OUTPUT.PUT_LINE(' *** Balance updated for customer : ' || cus);
END;
/
The trigger is a row-level trigger that executes for each new LINE row inserted. The SELECT statement returns only one attribute (cno) from the INVOICE table, and that attribute returns only one value. You use the INTO clause to assign a value from a SELECT statement to a variable (cus) used within a trigger. Double dashes "--" are used to indicate comments within the PL/SQL block.

Trigger action based on conditional DML predicates
You can create a trigger that executes after an insert, an update or a delete on the PRODUCT table. To know which one of the three statements caused the trigger to execute, use the following syntax:
IF INSERTING THEN …… END IF;
IF UPDATING THEN …… END IF;
IF DELETING THEN …… END IF;

Triggers can be used
• To enforce constraints that cannot be enforced at the DBMS design and implementation levels.
• To facilitate the enforcement of referential integrity.
• To update table values, insert records in tables and call other stored procedures.
• To add functionality by automating critical actions and providing appropriate warnings and suggestions.
• To add processing power to the RDBMS and to the database system as a whole.
Oracle recommends triggers for
• Auditing purposes (creating audit logs).
• Automating the generation of derived column values.
• Enforcement of business or security constraints.
• Creation of replica tables for backup purposes.

Q: What are the various types of triggers?
Statement-level triggers: this type of trigger is executed once, before or after the triggering statement is completed.
Example of a statement-level trigger that is executed after an update of the qoh or pmin attribute of an existing row, or after an insert of a new row, in the product table:
CREATE OR REPLACE TRIGGER TPR
AFTER INSERT OR UPDATE OF QOH, PMIN ON PRODUCT
BEGIN
UPDATE PRODUCT SET REORDER = 1 WHERE QOH <= PMIN;
END;
/
Row-level triggers: require the use of the FOR EACH ROW keywords. This type of trigger is executed once for each row affected. (If you update 10 rows, the trigger executes 10 times.)
Example:
CREATE OR REPLACE TRIGGER TPR
BEFORE INSERT OR UPDATE OF QOH, PMIN ON PRODUCT
FOR EACH ROW
BEGIN
IF :NEW.QOH <= :NEW.PMIN THEN
:NEW.REORDER := 1;
ELSE
:NEW.REORDER := 0;
END IF;
END;
/

Q: What are stored procedures? Explain.
A stored procedure is a named group of SQL statements that have been previously created and stored in the server database.
Advantages:
• Stored procedures accept input parameters, so that a single procedure can be used over the network by several clients using different input data.
• Stored procedures reduce network traffic and improve performance.
• Stored procedures can be used to help ensure the integrity of the database.
• Stored procedures help reduce code duplication by means of code isolation and code sharing, thereby minimizing the chance of errors and the cost of application development and maintenance.
• Stored procedures are useful to encapsulate shared code to represent business transactions; callers need not know the name of a newly added attribute, only a new parameter added to the procedure call.
Syntax to create a procedure:
CREATE OR REPLACE PROCEDURE procedure_name [(argument [IN/OUT] data-type, ….)]
[IS / AS]
[variable_name data-type [:= initial_value]]
BEGIN
PL/SQL or SQL statements;
…
END;
Syntax to execute a stored procedure:
EXEC procedure_name[(parameter_list)];
Ex: Write a stored procedure to assign an additional 5% discount for all products when QOH >= 2 * PMIN.
CREATE OR REPLACE PROCEDURE prod_discount AS
BEGIN
UPDATE product
SET discount = discount + .05
WHERE qoh >= pmin * 2;
DBMS_OUTPUT.PUT_LINE('*** Update Finished ***');
END;
/
1. argument specifies the parameters that are passed to the stored procedure. A stored procedure can have zero or more arguments.
2. IN/OUT indicates whether the parameter is for input, output or both.
3. Variables can be declared between the keywords IS and BEGIN.
To make the percentage increase an input variable in the above procedure:
CREATE OR REPLACE PROCEDURE prod_discount (pd IN NUMBER) AS
BEGIN
IF ((pd <= 0) OR (pd >= 1)) THEN
DBMS_OUTPUT.PUT_LINE('Error: value must be greater than 0 and less than 1');
ELSE
UPDATE product
SET discount = discount + pd
WHERE qoh >= pmin * 2;
DBMS_OUTPUT.PUT_LINE('*** Update Finished ***');
END IF;
END;
/
To execute the above procedure:
EXEC prod_discount(.05);

Q: Write a stored procedure to add a new customer.
CREATE OR REPLACE PROCEDURE cadd (w_cname IN VARCHAR2, w_city IN VARCHAR2) AS
BEGIN
INSERT INTO customer (cno, cname, city)
VALUES (CSEQ1.NEXTVAL, w_cname, w_city);
DBMS_OUTPUT.PUT_LINE('Customer added');
END;
/
The procedure uses
• several parameters, one for each required attribute in the CUSTOMER table.
• the CSEQ1 sequence to generate a new customer code.
A parameter can be null only when the table specification permits null for that attribute.
To execute: EXEC cadd('KALA', 'VJA');
Q: Write procedures to add a new invoice and line row.
Ans:
CREATE OR REPLACE PROCEDURE invadd (w_cno IN NUMBER, w_date IN DATE) AS
BEGIN
INSERT INTO invoice VALUES (ISEQ.NEXTVAL, w_cno, w_date);
DBMS_OUTPUT.PUT_LINE('Invoice Added');
END;
/
CREATE OR REPLACE PROCEDURE lineadd (ln IN CHAR, pn IN CHAR, lu IN NUMBER) AS
lp NUMBER := 0;
BEGIN
SELECT price INTO lp FROM product WHERE pno = pn;
INSERT INTO line VALUES (ISEQ.CURRVAL, ln, pn, lu, lp);
DBMS_OUTPUT.PUT_LINE('Invoice Line Added');
END;
/

Q: What is a cursor? How many types of cursors are there? How are cursors handled?
Ans: A cursor is a reserved area in memory in which the output of a query is stored, like an array holding rows and columns. There are two types of cursors: implicit and explicit.
An implicit cursor is automatically created in PL/SQL when the SQL statement returns only one value.
An explicit cursor is created to hold the output of an SQL statement that may return two or more rows (but could return zero or only one row).
To create an explicit cursor, use the following syntax inside the PL/SQL DECLARE section:
CURSOR cursor_name IS select-query;
The cursor declaration section only reserves a named memory area for the cursor. Once you have declared a cursor, you can use the cursor processing commands anywhere between the BEGIN and END keywords of the PL/SQL block.
Cursor processing commands
OPEN: executes the SQL command and populates the cursor with data. Before you can use a cursor, you need to open it. Ex: OPEN cursor_name;
FETCH: retrieves data from the cursor and copies it to PL/SQL variables. The syntax is: FETCH cursor_name INTO variable1 [, variable2, …..];
CLOSE: closes the cursor for processing.
Cursor-style processing involves retrieving data from the cursor one row at a time. The set of rows the cursor holds is called the active set. The data set contains a current row pointer; therefore, after opening a cursor, the current row is the first row of the cursor.
When you fetch a row from the cursor, the data from the current row in the cursor is copied to the PL/SQL variables. After the fetch, the current row pointer moves to the next row in the set, and this continues until it reaches the end of the cursor.
Cursor attributes determine when you have reached the end of the cursor data set, the number of rows in the cursor, etc.
%ROWCOUNT  Returns the number of rows fetched so far. If the cursor is not OPEN, it returns an ERROR. If no fetch has been done but the cursor is OPEN, it returns 0.
%FOUND     Returns TRUE if the last FETCH returned a row and FALSE if not. If the cursor is not OPEN, it returns an ERROR. If no fetch has been done, it contains NULL.
%NOTFOUND  Returns TRUE if the last FETCH did not return any row and FALSE if it did. If the cursor is not OPEN, it returns an ERROR. If no fetch has been done, it contains NULL.
%ISOPEN    Returns TRUE if the cursor is OPEN and FALSE if the cursor is CLOSED.
CREATE OR REPLACE PROCEDURE pce IS
p product.pno%TYPE;
pd product.pdesc%TYPE;
tot NUMBER(3);
CURSOR pc IS
SELECT pno, pdesc FROM product
WHERE qoh > (SELECT AVG(qoh) FROM product);
BEGIN
DBMS_OUTPUT.PUT_LINE('PRODUCTS WITH QOH > AVG(QOH)');
OPEN pc;
LOOP
FETCH pc INTO p, pd;
EXIT WHEN pc%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(p || ' => ' || pd);
END LOOP;
DBMS_OUTPUT.PUT_LINE('TOTAL PRODUCTS PROCESSED ' || pc%ROWCOUNT);
CLOSE pc;
END;
/
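The OPEN/FETCH/CLOSE pattern in the pce procedure above has a close analogue in Python's sqlite3 cursor, which also hands back one row per fetch and signals the end of the active set (returning None where PL/SQL tests %NOTFOUND). The table contents below are made up for illustration.

```python
import sqlite3

# Cursor-style, row-at-a-time processing of products above the average qoh.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product (pno TEXT, pdesc TEXT, qoh INTEGER)")
cur.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [("P01", "SCREWS", 60), ("P02", "NUTS", 10), ("P03", "BOLTS", 50)])
cur.execute("SELECT pno, pdesc FROM product "
            "WHERE qoh > (SELECT AVG(qoh) FROM product) ORDER BY pno")  # OPEN
processed = 0
rows = []
while True:
    row = cur.fetchone()     # FETCH: None at end of the active set (%NOTFOUND)
    if row is None:
        break
    rows.append(row)
    processed += 1           # plays the role of %ROWCOUNT
conn.close()                 # CLOSE
```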
Unit-III
Chapter-III: Database Design
Q: What is an information system?
Ans: A complete information system is composed of people, hardware, software, databases, application programs and procedures. The process of creating an information system is known as system development.

Q: The System Development Life Cycle (SDLC)
The SDLC is an iterative rather than a sequential process.
1. Planning: The SDLC planning phase yields a general overview of the company and its objectives. An initial assessment of the information flow-and-extent requirements must be made to answer questions like:
Should the existing system be continued?
Should the existing system be modified?
Should the existing system be replaced?
If it is decided that a new system is necessary, then it is checked whether the new system is feasible or not. The feasibility study includes:
1. Technical feasibility: Can the development of the new system be done with current equipment, existing software technology and available personnel? Does it require new technology?
2. Economic feasibility: Can we afford it? Is it a million-dollar solution for a thousand-dollar problem?
3. Operational feasibility: Does the company possess the human, technical and financial resources to keep the system operational? Will there be resistance from users?
2. Analysis: A thorough audit of user requirements and an understanding of the system's functional areas, actual and potential problems, and opportunities. The logical design must specify the appropriate conceptual data model, inputs, processes and expected output requirements using tools such as DFDs, ER diagrams, etc. All data transformations (processes) are described and documented using such system analysis tools.
3. Detailed system design: The design includes all the necessary technical specifications for the screens, menus, reports and other devices that might be used to help make the system a more efficient information generator.
4.
Implementation: The hardware, DBMS software and application programs are installed and the database design is implemented. During the initial stages of the implementation phase, the system enters a cycle of coding, testing and debugging until it is ready to be delivered. The system will be in full operation by the end of this phase but will be continuously evaluated and fine-tuned.
5. Maintenance: Maintenance includes all the activity after the installation of the software that is performed to keep the system operational. The major forms of maintenance activity are:
• Corrective maintenance: fixing errors.
• Adaptive maintenance: responding to changes in the business environment.
• Perfective maintenance: enhancing the system.
UNIT-IV
Chapter-I: Transaction Management and Concurrency Control
Q: What is a transaction?
A transaction is any action that reads from and/or writes to a database. A transaction is a single, indivisible, logical unit of work. All transactions are controlled and executed by the DBMS to guarantee database integrity.

Q: Transaction properties, or the ACID test
Each individual transaction must display Atomicity, Consistency, Isolation and Durability.
Atomicity requires that all operations (SQL requests) of a transaction be completed; if not, the transaction is aborted. If a transaction T1 has four SQL requests, all four requests must be successfully completed, otherwise the entire transaction is aborted.
Consistency: a transaction takes a database from one consistent state to another.
Isolation means that the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed. If transaction T1 is being executed and is using data item X, that data item cannot be accessed by any other transaction until T1 ends.
Durability ensures that once transaction changes are done (committed), they cannot be undone or lost, even in the event of a system failure. COMMITTED TRANSACTIONS ARE NOT ROLLED BACK.
When executing multiple transactions, the DBMS must schedule the concurrent execution of the transactions' operations. The schedule of such operations must exhibit the property of serializability. Serializability ensures that the schedule for the concurrent execution of the transactions yields consistent results.

Q: Transaction management with SQL
Transaction support is provided by two SQL statements: COMMIT and ROLLBACK. A transaction sequence must continue through all succeeding SQL statements until one of the following four events occurs:
• A COMMIT statement is reached, in which case all changes are permanently recorded within the database. The COMMIT statement automatically ends the SQL transaction.
Ex:
UPDATE product SET qoh = qoh - 2 WHERE pno = 'P01';
UPDATE customer SET baldue = baldue + 20 WHERE cno = 201;
COMMIT;
• A ROLLBACK is reached, in which case all changes are aborted and the database is rolled back to its previous consistent state.
• The end of a program is successfully reached, in which case all changes are permanently recorded within the database. This action is equivalent to COMMIT.
• A program is abnormally terminated, in which case the changes made in the database are aborted and the database is rolled back to its previous consistent state. This action is equivalent to ROLLBACK.
A transaction begins implicitly when the first SQL statement is encountered. SQL Server uses a transaction management statement such as BEGIN TRANSACTION; to indicate the beginning of a new transaction.
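The COMMIT and ROLLBACK behaviour described above can be observed with SQLite through Python's sqlite3 module; the customer row and values are invented for the demonstration.

```python
import sqlite3

# Set up one committed row, then contrast ROLLBACK with COMMIT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cno INTEGER, baldue REAL)")
conn.execute("INSERT INTO customer VALUES (201, 100.0)")
conn.commit()

conn.execute("UPDATE customer SET baldue = baldue + 20 WHERE cno = 201")
conn.rollback()   # abort: the database returns to its last consistent state
after_rollback = conn.execute(
    "SELECT baldue FROM customer WHERE cno = 201").fetchone()[0]

conn.execute("UPDATE customer SET baldue = baldue + 20 WHERE cno = 201")
conn.commit()     # the change is now permanently recorded
after_commit = conn.execute(
    "SELECT baldue FROM customer WHERE cno = 201").fetchone()[0]
conn.close()
```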
The Oracle RDBMS uses the SET TRANSACTION statement to declare a new transaction start and its properties.

Q: The transaction log
A DBMS uses a transaction log to keep track of all transactions that update the database. The information stored in this log is used by the DBMS for its recovery requirements.
The transaction log stores:
• A record for the beginning of the transaction.
• For each transaction component (SQL statement):
  - The type of operation being performed (update, insert, delete).
  - The names of the objects affected by the transaction (the name of the table).
  - The before and after values for the fields being updated.
  - Pointers to the previous and next transaction log entries for the same transaction.
• The ending (COMMIT) of the transaction.

A transaction log
TRL_ID  TRX_NUM  Prev PTR  Next PTR  Operation  Table     RowID  Attribute  Before Value  After Value
341     101      Null      352       START      **Start Transaction
352     101      341       363       UPDATE     PRODUCT   P01    qoh        20            18
363     101      352       365       UPDATE     CUSTOMER  201    baldue     100           120
365     101      363       Null      COMMIT     **End of Transaction

Concurrency control
The coordination of the simultaneous execution of transactions in a multiuser database system is known as concurrency control. The problems that concurrent transactions can cause in the absence of concurrency control are:
1. Lost updates
2. Uncommitted data
3. Inconsistent retrievals
A lost update occurs when two concurrent transactions, T1 and T2, are updating the same data element and one of the updates is lost (overwritten by the other transaction).
Ex: consider two concurrent transactions to update qoh:
T1: Purchase of 100 units   qoh = qoh + 100
T2: Sell 30 units           qoh = qoh - 30
Serial execution of the above transactions:
Time  Transaction  Step            Value
1     T1           Read qoh        35
2     T1           qoh = 35 + 100
3     T1           Write qoh       135
4     T2           Read qoh        135
5     T2           qoh = 135 - 30
6     T2           Write qoh       105
The table below shows how a lost update problem can arise:
Time  Transaction  Step                     Value
1     T1           Read qoh                 35
2     T2           Read qoh                 35
3     T1           qoh = 35 + 100
4     T2           qoh = 35 - 30
5     T1           Write qoh (lost update)  135
6     T2           Write qoh                5

Uncommitted data (a dirty read) occurs when two transactions T1 and T2 are executed concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data, thus violating the isolation property of transactions.
Correct execution, in which T2 reads only committed data:
Time  Transaction  Step            Value
1     T1           Read qoh        35
2     T1           qoh = 35 + 100
3     T1           Write qoh       135
4     T1           ***ROLLBACK***  35
5     T2           Read qoh        35
6     T2           qoh = 35 - 30
7     T2           Write qoh       5

With a dirty read, T2 reads the uncommitted value 135:
Time  Transaction  Step            Value
1     T1           Read qoh        35
2     T1           qoh = 35 + 100
3     T1           Write qoh       135
4     T2           Read qoh        135
5     T2           qoh = 135 - 30
6     T1           ***ROLLBACK***  35
7     T2           Write qoh       105

Inconsistent retrievals occur when a transaction accesses data before and after another transaction finishes working with that data. For example, T1 calculates a summary (aggregate) function over a set of data while another transaction T2 is updating the same data.
Transaction T1:
SELECT SUM(qoh) FROM product WHERE pno < 'P04';
Transaction T2:
UPDATE product SET qoh = qoh + 10 WHERE pno = 'P01';
Time  Transaction  Action                     Value  Total
1     T1           Read qoh for pno = 'P01'   10     10
2     T2           Read qoh for pno = 'P01'   10
3     T2           qoh = 10 + 10
4     T2           Write qoh for pno = 'P01'  20
5     T1           Read qoh for pno = 'P02'   3      13
6     T2           ***COMMIT***
7     T1           Read qoh for pno = 'P03'   5000   5013
The computed answer of 5013 is wrong; the correct answer is 5023.
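The lost-update interleaving in the table above can be replayed deterministically in Python. This is only a step-by-step simulation (the variable names and the initial value 35 come from the example, not from any real DBMS): both transactions read qoh before either writes, so T1's +100 is overwritten.

```python
# Deterministic replay of the lost-update schedule shown above.
db = {"qoh": 35}

t1_local = db["qoh"]       # T1: Read qoh  -> 35
t2_local = db["qoh"]       # T2: Read qoh  -> 35
t1_local = t1_local + 100  # T1: qoh = 35 + 100
t2_local = t2_local - 30   # T2: qoh = 35 - 30
db["qoh"] = t1_local       # T1: Write qoh -> 135
db["qoh"] = t2_local       # T2: Write qoh -> 5 (T1's update is lost)

lost_update_result = db["qoh"]  # 5, instead of the correct 105
```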
    Unless the DBMSexercises concurrency control, a multi user database environment can create havoc within the information system. The Scheduler If two transactions access unrelated data then There will be no conflict And order of execution is irrelevant. If the transactions operate on related data then Conflict possible and The selection of one execution order over another might have undesirable consequences. The correct order is determined by built - in scheduler. Q: What is a scheduler? The scheduler is a special DBMS process that establishes the order in which the operations within concurrent transactions are executed. The scheduler bases its actions on concurrency control algorithms, such as locking or time stamping methods. NOT ALL TRANSACTIONS ARE SERIALIZALE If the transactions are not serializable then the transaction are executed on first-come, first - served basis by the DBMS. If transactions are executed in serial order one after another then CPU time will be wasted and Yields unacceptable response times within the multi user DBMS environment. Q: What is Serializable Schedule? Ans: A serializable schedule is a schedule of transaction’s operations in which the interleaved execution of the transactions (T1, T2, T3 etc.) yields the same result as if the transaction were executed in serial order (one after another). Q: What are conflicting database operations? Ans: Scheduler facilitates data isolation to ensure two transactions do not update the same data element at the same time. Two operations are in conflict when they access the same data and atleast one of them is a WRITE operation. The figure shows Conflicting database operations Concurrency control with locking methods A lock guarantees exclusive use of a data item to a current transaction. A transaction acquires lock prior to data access; the lock is released when the transaction is completed so that another transaction can lock the data item for its exclusive use. 
The database might be in a temporarily inconsistent state while several updates are executed. Therefore, locks are required to prevent another transaction from reading inconsistent data. Most multiuser DBMSs automatically initiate and enforce locking procedures. All lock information is managed by a lock manager.

Conflicting database operations:

T1     T2     Result
Read   Read   No conflict
Read   Write  Conflict
Write  Read   Conflict
Write  Write  Conflict

Q: What is lock granularity? Explain?
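The conflict rule behind the matrix above can be stated in a few lines of Python (a minimal sketch; the tuple layout is our own convention): two operations conflict when they belong to different transactions, touch the same data item, and at least one is a WRITE.

```python
# Sketch of the conflict rule in the matrix above.
# Each operation is a (transaction_id, action, data_item) tuple,
# where action is 'R' (read) or 'W' (write).

def in_conflict(op1, op2):
    t1, a1, d1 = op1
    t2, a2, d2 = op2
    return t1 != t2 and d1 == d2 and 'W' in (a1, a2)

print(in_conflict(('T1', 'R', 'qoh'), ('T2', 'R', 'qoh')))    # False: read/read
print(in_conflict(('T1', 'R', 'qoh'), ('T2', 'W', 'qoh')))    # True: read/write
print(in_conflict(('T1', 'W', 'qoh'), ('T2', 'W', 'price')))  # False: different items
```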
Ans: Lock granularity indicates the level of lock use. Locking can take place at the following levels: database, table, page, row or even field.

Database level: In a database-level lock, the entire database is locked. This level of locking is good for batch processes, but it is unsuitable for multiuser DBMSs.

Table level: In a table-level lock, the entire table is locked. It is also unsuitable for multiuser DBMSs.

Page level: In a page-level lock, the DBMS locks an entire disk page. A page is equivalent to a disk block, the smallest unit of data transfer between the hard disk and the processor.

Row level: The DBMS allows concurrent transactions to access different rows of the same table. Row-level locking improves the availability of data, but its management requires high overhead because a lock exists for each row. Modern DBMSs automatically escalate a lock from row level to page level when the application session requests multiple locks on the same page.

Field level: The DBMS allows concurrent transactions to access different fields within a row. Field-level locking yields the most flexible multiuser data access, but it is rarely implemented in a DBMS.

LOCK TYPES: A DBMS may use different lock types: binary locks and shared/exclusive locks.

Binary locks: A binary lock has only two states: locked (1) or unlocked (0). If an object is locked by a transaction, no other transaction can use that object.

Shared/Exclusive locks: A shared lock is issued when a transaction wants to read data from the database. An exclusive lock is issued when a transaction wants to update (write) a data item. Using the shared/exclusive lock concept, a lock can have three states: unlocked, shared (read) and exclusive (write).

Lock-compatibility matrix
• Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on that item.
• If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks held by other transactions have been released. The lock is then granted.
Although the use of shared locks renders data access more efficient, a shared/exclusive lock schema increases the lock manager's overhead for several reasons:
• The type of lock must be known before a lock can be granted.
• Three lock operations exist:
READ_LOCK (to check the type of lock), WRITE_LOCK (to issue the lock) and UNLOCK (to release the lock).
• The schema has been enhanced to allow a lock upgrade (from shared to exclusive) and a lock downgrade (from exclusive to shared).
Locks prevent serious data inconsistencies, but they lead to two major problems:
• Serializability
• Deadlock
Serializability is guaranteed through a locking protocol known as two-phase locking.

TWO-PHASE LOCKING (2PL): defines how transactions acquire and release locks.
• Ensures serializability
• Does not prevent deadlocks
The two phases are:
Growing phase:
• Acquires all locks
• Does not unlock any data
Once all locks have been acquired, the transaction is in its locked state.
Shrinking phase:
• Releases all locks
• Cannot obtain any new lock
The two-phase locking protocol is governed by the following rules:
1. Two transactions cannot have conflicting locks.
2. No unlock operation can precede a lock operation in the same transaction.
3. No data are affected until all locks are obtained, that is, until the transaction reaches its locked point.
The transaction acquires all of the locks it needs until it reaches its locked point. When the locked point is reached, the data are modified. Finally, the transaction is completed as it releases all of the locks it acquired in the first phase.

Deadlocks
A deadlock occurs when two transactions wait indefinitely for each other to unlock data. For example: T1 has locked data item X and is waiting for data item Y, which is held by T2, while T2 has locked data item Y and is waiting for data item X, which is held by T1. T1 and T2 wait for each other to unlock the required data item. Such a deadlock is also known as a deadly embrace. The figure demonstrates how a deadlock is created.
Time  Transaction   Reply  Data X    Data Y
0                          Unlocked  Unlocked
1     T1: LOCK(X)   OK     Locked    Unlocked
2     T2: LOCK(Y)   OK     Locked    Locked
3     T1: LOCK(Y)   WAIT   Locked    Locked
4     T2: LOCK(X)   WAIT   Locked    Locked    (Deadlock)
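The deadlock in the table above is usually detected with a wait-for graph: an edge T1 → T2 means T1 is waiting for a lock held by T2, and a cycle in the graph means deadlock. A minimal sketch (our own helper, not from the text):

```python
# Deadlock detection via a wait-for graph: a cycle means deadlock.
# waits_for maps each transaction to the set of transactions it waits on.

def has_deadlock(waits_for):
    def reachable(start, target, seen):
        # Depth-first search: can we reach `target` from `start`?
        for nxt in waits_for.get(start, set()):
            if nxt == target or (nxt not in seen and
                                 reachable(nxt, target, seen | {nxt})):
                return True
        return False
    # A transaction that can reach itself sits on a cycle.
    return any(reachable(t, t, {t}) for t in waits_for)

# The scenario above: T1 waits for Y (held by T2), T2 waits for X (held by T1).
print(has_deadlock({'T1': {'T2'}, 'T2': {'T1'}}))  # True
print(has_deadlock({'T1': {'T2'}}))                # False: T2 waits on nobody
```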
The three basic techniques to control deadlocks are:
Deadlock prevention: A transaction requesting a new lock is aborted when there is the possibility that a deadlock can occur. If a transaction is aborted, all changes made by that transaction are rolled back and all locks obtained are released. The transaction is then rescheduled for execution. Deadlock prevention works because it avoids the conditions that lead to deadlocking.
Deadlock detection: The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transactions (the "victim") is aborted (rolled back and restarted) and the other transaction continues.
Deadlock avoidance: The transaction must obtain all of the locks it needs before it can be executed. This technique avoids the rollback of conflicting transactions.
The choice of deadlock control method depends on the database environment. For example, if the probability of deadlock is low, deadlock detection is recommended; if it is high, deadlock prevention is recommended. If response time is not high on the system's priority list, deadlock avoidance is employed. DBMSs use a blend of prevention and avoidance for other types of data, such as XML data or data warehouses. All current DBMSs support deadlock detection in transactional databases.

Concurrency control with time stamping methods
The time stamping approach assigns a global, unique time stamp to each transaction. Time stamps must have two properties: uniqueness and monotonicity. Uniqueness ensures that no equal time stamp values can exist; monotonicity ensures that time stamp values always increase. All database operations (READ and WRITE) within the same transaction must have the same time stamp. The DBMS executes conflicting operations in time stamp order. If two transactions conflict, one is stopped, rolled back, rescheduled and assigned a new time stamp value.
For each data item Q, two time stamp values have to be maintained:
• W-timestamp(Q) is the largest time stamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q) is the largest time stamp of any transaction that executed read(Q) successfully.
Thus, time stamping increases memory needs and database processing overhead and uses a lot of system resources.

WAIT/DIE AND WOUND/WAIT SCHEMES
Assume that you have two conflicting transactions, T1 and T2, each with a unique time stamp.
Suppose T1 has a time stamp of 115 and T2 has a time stamp of 195; that is, T1 is the older transaction and T2 is the newer one.

Transaction        Transaction   Wait/Die scheme                    Wound/Wait scheme
requesting lock    owning lock
T1 (115)           T2 (195)      T1 waits until T2 is completed     T1 preempts (rolls back) T2;
                                 and T2 releases its locks.         T2 is rescheduled using the
                                                                    same time stamp.
T2 (195)           T1 (115)      T2 dies (rolls back);              T2 waits until T1 is completed
                                 T2 is rescheduled using the        and T1 releases its locks.
                                 same time stamp.

What about a transaction that requests multiple locks? How long does it have to wait for each lock request? To prevent that type of deadlock, each lock request has an associated time-out value. If the lock is not granted before the time-out expires, the transaction is rolled back.
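The two schemes in the table reduce to a single timestamp comparison, which can be sketched as follows (a minimal illustration; the function names and return strings are ours):

```python
# Sketch of the wait/die and wound/wait decisions above. Older transactions
# have smaller time stamps; `requester_ts` asks for a lock that the
# transaction with `owner_ts` currently holds.

def wait_die(requester_ts, owner_ts):
    # Older requester waits; younger requester dies (rolls back, reschedules).
    return 'wait' if requester_ts < owner_ts else 'die'

def wound_wait(requester_ts, owner_ts):
    # Older requester wounds (preempts) the owner; younger requester waits.
    return 'wound owner' if requester_ts < owner_ts else 'wait'

# T1(115) is older than T2(195):
print(wait_die(115, 195))    # 'wait'        : T1 requesting -> T1 waits
print(wait_die(195, 115))    # 'die'         : T2 requesting -> T2 rolls back
print(wound_wait(115, 195))  # 'wound owner' : T1 requesting -> T2 is preempted
print(wound_wait(195, 115))  # 'wait'        : T2 requesting -> T2 waits
```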
Concurrency control with optimistic methods
The optimistic approach is based on the assumption that the majority of database operations do not conflict. It has three phases:
Read phase: The transaction T reads the data items from the database into its private workspace. All updates of the transaction can only change the local copies of the data in the private workspace.
Validation phase: A check is performed to confirm whether the read values changed while the transaction was updating its local values. This is done by comparing the current database values to the values that were read into the private workspace. If the values have changed, the local copies are thrown away and the transaction aborts.
Write phase: The changes are permanently applied to the database.

Database recovery management
Transaction recovery reverses all the changes that a transaction made to the database before the transaction was aborted, and restores the system after some type of critical error has occurred. Examples of critical events are:
1. Hardware/software failures: failures of this type include hard disk media failure, a bad capacitor on a motherboard, a failing memory bank, or application program or O.S. errors that cause data to be overwritten, deleted or lost.
2. Human-caused incidents, which are of two types: unintentional and intentional.
• Unintentional failures are caused by carelessness of end users. Such errors include deleting the wrong rows from a table or shutting down the database server by accident.
• Intentional events are security threats caused by hackers and virus attacks.
3. Natural disasters, which include fires, earthquakes, floods and power failures.
A critical error can leave the database in an inconsistent state.

Transaction recovery
Database transaction recovery uses the data in the transaction log to bring a database to a consistent state after a failure. Four important concepts affect the recovery process:
• The write-ahead-log protocol ensures that transaction logs are always written before any database data are actually updated. In case of failure, recovery is performed using the data in the transaction log.
• Redundant transaction logs ensure that a physical disk failure will not affect recovery.
• When a transaction updates data, it actually updates the copy of the data in a buffer (a temporary storage area in primary memory). This is much faster than accessing the physical disk every time; later, all buffers are written to the physical disk in a single operation.
• Database checkpoints are operations in which the DBMS writes all of its updated buffers to disk. While this is happening, the DBMS does not execute any other requests. Checkpoints are automatically scheduled by the DBMS several times per hour, and each checkpoint operation is also registered in the transaction log.
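The write-ahead-log rule in the first bullet can be sketched in a few lines (an illustrative toy, assuming an in-memory dict stands in for the buffered database and a list for the log on stable storage; none of these names come from a real API):

```python
# Sketch of the write-ahead-log rule: the log record, with before and
# after values, is appended BEFORE the buffered database value changes.

log = []                              # stands in for the transaction log
database = {'P01': {'qoh': 20}}       # stands in for buffered table data

def logged_update(trx, row, attribute, new_value):
    before = database[row][attribute]
    # 1. Write-ahead: the log entry is recorded first.
    log.append((trx, row, attribute, before, new_value))
    # 2. Only then is the (buffered) database value changed.
    database[row][attribute] = new_value

logged_update(101, 'P01', 'qoh', 18)
print(log)       # [(101, 'P01', 'qoh', 20, 18)]
print(database)  # {'P01': {'qoh': 18}}
```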
Q: Write about different types of recovery techniques.
Ans: Transaction recovery procedures generally make use of the deferred-write (also called deferred update) and write-through (also called immediate update) techniques.
In the deferred-write technique, transaction operations do not immediately update the physical database; instead, only the transaction log is updated. The database is physically updated only after the transaction is committed.
The recovery process using deferred write follows these steps:
1. Identify the last checkpoint in the transaction log. (This is the last time transaction data were physically saved to disk.)
2. For a transaction that started and committed before the last checkpoint, nothing needs to be done.
3. For a transaction that performed a COMMIT operation after the last checkpoint, the DBMS redoes the transaction using the "after" values in the transaction log. The changes are made in ascending order, from oldest to newest.
4. For a transaction that had a ROLLBACK operation after the last checkpoint, or that was left active (with neither a COMMIT nor a ROLLBACK) before the failure, nothing is done because no changes were written to disk.
In the write-through technique, the database is immediately updated by transaction operations during the transaction's execution, even before the transaction reaches its commit point. If a transaction aborts, an undo operation needs to be performed.
The recovery process using write-through follows these steps:
1. The first three steps are the same as above.
2. For a transaction that had a ROLLBACK operation after the last checkpoint, or that was left active (with neither a COMMIT nor a ROLLBACK) before the failure, the DBMS undoes the transaction using the "before" values in the transaction log.
TRL   TRX   Prev  Next  Operation    Table     Row ID   Attribute  Before  After
ID    NUM   PTR   PTR
341   101   Null  352   START        **Start transaction
352   101   341   363   UPDATE       PRODUCT   P01      QOH        20      18
363   101   352   365   UPDATE       CUSTOMER  201      baldue     100     120
365   101   363   Null  COMMIT       **End of transaction
397   106   Null  405   START        **Start transaction
405   106   397   415   INSERT       INVOICE   305                         305
415   106   405   419   INSERT       LINE      305,L01                     305,L01,P05
419   106   415   427   UPDATE       PRODUCT   P05      QOH        120     119
423               CHECKPOINT
427   106   419   431   UPDATE       CUSTOMER  202      baldue     500     1050
431   106   427   Null  COMMIT       **End of transaction
521   155   Null  525   START        **Start transaction
525   155   521   528   UPDATE       PRODUCT   P08      QOH        100     80
528   155   525   Null  COMMIT       **End of transaction
***C*R*A*S*H***
Transaction 101:
UPDATE product SET qoh = qoh - 2 WHERE pno = 'P01';
UPDATE customer SET baldue = baldue + 20 WHERE cno = 201;
COMMIT;

Transaction 106:
INSERT INTO invoice VALUES (305, 202, SYSDATE());
INSERT INTO line VALUES (305, 'L01', 'P05', 1, 550);
UPDATE product SET qoh = qoh - 1 WHERE pno = 'P05';
UPDATE customer SET baldue = baldue + 550 WHERE cno = 202;
COMMIT;

Transaction 155:
UPDATE product SET qoh = qoh - 20 WHERE pno = 'P08';
COMMIT;

The database recovery process for a DBMS using the deferred update method is as follows:
1. Identify the last checkpoint; in this case it is TRL ID 423. This was the last time database buffers were physically written to disk.
2. Transaction 101 committed before the last checkpoint. All of its changes were already written to disk, so no action needs to be taken.
3. For each transaction committed after the last checkpoint, the DBMS redoes the work. For transaction 106:
1. Find the COMMIT (TRL ID 431).
2. Use the previous-pointer values to locate the start of the transaction (TRL ID 397).
3. Use the next-pointer values to locate each DML statement and apply the changes to disk, using the "after" values (start with TRL ID 405, then 415, 419, 427 and 431).
4. Repeat the process for transaction 155.
4. For transactions that ended with a ROLLBACK, or that were still active at the time of the crash, nothing is done because no changes were written to disk.
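The redo pass described above can be sketched as a small Python routine. This is a deliberately simplified illustration under our own assumptions: the log is a list of dicts in ascending TRL order, and we redo every UPDATE belonging to a committed transaction (a full implementation would also start from the last checkpoint and follow the prev/next pointers):

```python
# Simplified deferred-update redo pass: replay the "after" values of
# every transaction that reached COMMIT; transactions still active at
# the crash are skipped, because nothing of theirs was written to disk.

def redo(log):
    committed = {e['trx'] for e in log if e['op'] == 'COMMIT'}
    changes = {}
    for e in log:  # log is in ascending TRL order: oldest change first
        if e['op'] == 'UPDATE' and e['trx'] in committed:
            changes[(e['row'], e['attr'])] = e['after']
    return changes

log = [
    {'trx': 155, 'op': 'START'},
    {'trx': 155, 'op': 'UPDATE', 'row': 'P08', 'attr': 'qoh', 'after': 80},
    {'trx': 155, 'op': 'COMMIT'},
    {'trx': 160, 'op': 'START'},
    {'trx': 160, 'op': 'UPDATE', 'row': 'P09', 'attr': 'qoh', 'after': 7},
    # no COMMIT for 160: it was active at the crash, so nothing is redone
]
print(redo(log))  # {('P08', 'qoh'): 80}
```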
Unit-IV
Chapter-II: Distributed Database Management Systems
Q: What are the disadvantages of a centralized database management system?
• Performance degradation due to a growing number of remote locations over greater distances.
• High costs associated with maintaining and operating large central (mainframe) database systems.
• Reliability problems created by dependence on a central site (single-point-of-failure syndrome) and the need for data replication.
• Scalability problems associated with the physical limits imposed by a single location (temperature conditioning and power consumption).
• Organizational rigidity: the database might not support the flexibility and agility required by modern global organizations.

Q: What is a distributed database?
Ans: A distributed database stores logically related data over interconnected computer systems, in which both data and processing functions are distributed among several sites.

Q: What is a DDBMS?
Ans: A distributed database management system (DDBMS) is the software system that governs the storage and processing of the distributed database and makes the distribution transparent to users.

Q: What are the advantages and disadvantages of a DDBMS?
Ans: Advantages:
1. Data are located near the site of greatest demand.
2. Faster data access, as end users work with only a locally stored subset of the company data.
3. Faster data processing, as the workload is distributed over several sites.
4. Growth facilitation: new sites can be added without affecting other sites.
5. Improved communication, as local sites foster better communication between customers and staff.
6. Reduced operating costs, as development work is done more cheaply and more quickly on low-cost PCs than on mainframes.
7. User-friendly interface: the GUI simplifies training and use for end users.
8. Less danger of single-point failure: when one computer fails, the workload is picked up by other workstations, because data are distributed across multiple sites.
9. Processor independence: the end user is able to access any available copy of the data, and an end user's request is processed by any processor at the data location.

Disadvantages:
1. Complexity of management and control: applications must recognize data location, and they must be able to stitch together data from various sites.
2. Technological difficulty: data integrity, transaction management, concurrency control, security, backup, recovery, query optimization, access path selection, etc. must all be addressed and resolved.
3. Security: the probability of security lapses increases when data are located at multiple sites, and the responsibility for data management is shared by different people at several sites.
4. Lack of standards: there are no standard communication protocols at the database level. For example, different database vendors employ different, often incompatible, techniques to manage the distribution of data and processing in a DDBMS environment.
5. Increased storage and infrastructure requirements: multiple copies of data are required at different sites, thus requiring additional disk storage space.
6. Increased training cost.
7. Costs: distributed databases require duplicated infrastructure to operate (physical location, environment, personnel, software, licensing, etc.).

Distributed processing with a centralized database
Although the database resides at only one site, each site can access the data and update the database; that is, the database processing chores are shared among several sites, which are connected through a communication network. (Refer book for diagram.)
A distributed database requires distributed processing. Distributed processing may be based on a centralized database or a distributed database. Both distributed processing and distributed databases require a network to connect all components.

Distributed database system
In a distributed database system, a database is composed of several parts known as database fragments. The database fragments are located at different sites and can be replicated among various sites. A distributed database requires distributed processing. (Refer book for diagram.)
Each database fragment is managed by a local database process (distributed processing). For the management of distributed data to occur, copies or parts of the database processing functions must be distributed to all data storage sites.

Characteristics of a DDBMS
The DBMS must have the following functions to be classified as distributed:
• Application interface to interact with the end user, application programs and other DBMSs within the distributed database.
• Validation to analyze data requests for syntax correctness.
• Transformation to decompose complex requests into atomic data request components.
• Query optimization to find the best access strategy.
• Mapping to determine the data location of local and remote fragments.
• I/O interface to read or write data from or to permanent local storage.
• Formatting to prepare the data for presentation to the end user or to an application program.
• Security to provide data privacy at both local and remote databases.
• Backup and recovery to ensure the availability and recoverability of the database in case of a failure.
• DB administration features for the database administrator.
• Concurrency control to manage simultaneous data access and to ensure data consistency across data fragments in the DDBMS.
• Transaction management to ensure that the data move from one consistent state to another. This activity includes the synchronization of local and remote transactions as well as transactions across multiple distributed segments.

Centralized DBMS functions
• Receive an application's (end user's) data request.
• Validate, analyze and decompose the request. The request might include mathematical and/or logical operations.
• Ex: SELECT all customers with a balance > $1000. The request might require data from only a single table, or it might require access to several tables.
• Map the request's logical-to-physical data components.
• Decompose the request into several disk I/O operations.
• Search for, locate, read and validate the data.
• Ensure database consistency, security and integrity.
• Validate the data for the conditions, if any, specified by the request.
• Present the selected data in the required format.

DDBMS components
1. Computer workstations (sites or nodes). The DDBMS must be independent of the computer system hardware.
2. Network hardware and software components that reside in each workstation to allow all sites to interact and exchange data. Because the components (computers, O.S., network hardware, etc.) are likely to be supplied by different vendors, it is best to ensure that the distributed database functions can be run on multiple platforms.
3. Communication media that carry the data from one workstation to another. The DDBMS must be communication-media independent.
4. The transaction processor (TP), also known as the application processor (AP) or the transaction manager (TM), which is the software component found in each computer that requests data. The transaction processor receives and processes the application's data requests (remote and local).
5. The data processor (DP), also known as the data manager (DM), which is the software component residing on each computer that stores and retrieves data located at the site. A DP may even be a centralized DBMS.
Communication between TPs and DPs is made possible through protocols used by the DDBMS. The protocol determines how the distributed database system will:
1. Interface with the network to transfer data and commands between DPs and TPs.
2. Synchronize all data received from DPs (TP side).
3. Route retrieved data to the appropriate TPs (DP side).
4. Ensure database functions such as security, concurrency control, backup and recovery in the distributed database.
DPs and TPs can be added to the system without affecting the operation of the other components. DPs and TPs can also reside on the same computer.
Levels of data and process distribution
Current database systems can be classified on the basis of how process distribution and data distribution are supported. For example, a DBMS may store data at a single site or at multiple sites, and may support data processing at a single site or at multiple sites.

Single-Site Processing, Single-Site Data (SPSD) (centralized)
• All processing is done on a single host computer.
• All data are stored on the host computer's local disk system.
Examples: mainframe systems, single-processor servers and multiprocessor server systems. The functions of the TP and the DP are embedded within the DBMS located on a single computer. The DBMS usually runs under a time-sharing, multitasking O.S., which allows several processes to run concurrently on the host computer.

Multiple-Site Processing, Single-Site Data (MPSD)
Multiple processes run on different computers that share a single data repository. This scenario requires a network file server running applications that are accessed through the network.
1. The TP on each workstation routes all network data requests to the file server.
2. Only the data storage input/output (I/O) is handled by the file server, so it offers only a limited capability for distributed processing.
3. The end user must make direct reference to the file server to access remote data.
4. All record- and file-locking activities are done at the end-user location.
5. All data selection, search and update functions take place at the workstation, thus requiring that the entire file travel through the network for processing at the workstation. This requirement increases network traffic, slows response time and increases communication costs.
For example, suppose the file server computer stores a CUSTOMER table containing 10,000 data rows, 50 of which have balances > $1000.
Suppose site A issues the query:
SELECT * FROM customer WHERE cus_balance > 1000;
All 10,000 CUSTOMER rows must travel through the network to be evaluated at site A.

Client/server architecture vs. MPSD (file server):

Client/server architecture              MPSD (file server)
All database processing is done at      All database processing is done at
the server site, thus reducing          the client site, thus increasing
network traffic.                        network traffic.
Capable of supporting data at           Requires the database to be located
multiple sites.                         at a single site.
Processing is distributed across        Processing is not distributed.
multiple sites.
Multiple-Site Processing, Multiple-Site Data (MPMD) describes a fully distributed database, with support for multiple data processors and transaction processors at multiple sites.
Types of DDBMS
Depending on the level of support for various types of centralized DBMSs, DDBMSs are classified as homogeneous or heterogeneous.

Homogeneous DDBMS: integrates only one type of centralized DBMS over a network. The same DBMS runs on different server platforms (single-processor server, multiprocessor server).

Heterogeneous DDBMS: integrates different types of centralized DBMSs over a network. A fully heterogeneous DDBMS supports different DBMSs that may even support different data models (relational, hierarchical or network) running on different computer systems, such as mainframes and PCs.

Some DDBMS implementations support several platforms, operating systems and networks and allow access to data in another DBMS, but subject to certain restrictions:
• Remote access is provided on a read-only basis and does not support write privileges.
• Restrictions are placed on the number of remote tables that may be accessed in a single transaction.
• Restrictions are placed on the number of distinct databases that may be accessed.
• Restrictions are placed on the database model that may be accessed: access may be provided to relational databases but not to network or hierarchical databases.

Transparency features or functional characteristics
Transparency allows the end user to feel like the database's only user: the user believes he or she is working with a centralized DBMS, and all complexities are hidden (transparent) to the user.
1. Distribution transparency: makes the dispersed database look like a single database to the end user.
2. Transaction transparency: allows a transaction to update data at more than one network site, and ensures that the transaction will be either entirely completed or aborted, thus maintaining data integrity.
3. Failure transparency: ensures that the system will continue to operate in the event of a node failure. Functions that were lost because of the failure are picked up by another network node.
4. Performance transparency: the system will not suffer any performance degradation due to its use on a network or due to the network's platform differences, and it will find the most cost-effective path to access remote data.
5. Heterogeneity transparency: allows the integration of several different local DBMSs (relational, hierarchical and network) under a common, or global, schema. The DDBMS is responsible for translating data requests from the global schema to the local DBMS schemas.

Distribution transparency
The three levels of distribution transparency are:
1. Fragmentation transparency: the highest level of transparency. Neither fragment names nor fragment locations are specified prior to data access.
2. Location transparency: the end user must specify the database fragment names but not their locations.
3. Location mapping transparency: the end user must specify both the fragment names and their locations.
Ex: The CUSTOMER table contains the attributes cno, cname and city. The CUSTOMER data are distributed over three different locations: New York, Atlanta and Miami. The table is divided by location: New York customers' data are stored in fragment C1, Atlanta customers' data in fragment C2 and Miami customers' data in fragment C3, and each row is unique to its fragment. No portion of the table is replicated at any other site.

Case 1: The database supports fragmentation transparency.
SELECT * FROM customer;
(No fragment names or locations are specified.)

Case 2: The database supports location transparency.
SELECT * FROM C1
UNION
SELECT * FROM C2
UNION
SELECT * FROM C3;
(Fragment names are specified, but locations are not.)

Case 3: The database supports location mapping transparency.
SELECT * FROM C1 NODE NY
UNION
SELECT * FROM C2 NODE ATL
UNION
SELECT * FROM C3 NODE MIA;
(Both fragment names and locations are specified.)

Distribution transparency is supported by a distributed data dictionary (DDD), or distributed data catalog (DDC). The DDC contains the distributed global schema, i.e., the description of the entire database. The DDC is itself distributed and replicated; therefore, consistency must be maintained by updating all sites.

Transaction transparency
Remote request: lets a single SQL statement (or request) reference data at only one remote site or DP.
A remote transaction contains one or more remote requests, all of which reference a single remote site or DP. Consider a transaction at site A:
BEGIN WORK;
UPDATE product SET qoh = qoh - 1 WHERE pno = 'P01';
INSERT INTO invoice (invno, cno, invdate) VALUES (305, 202, SYSDATE);
COMMIT WORK;
Note the following remote transaction features:
• The transaction updates the PRODUCT and INVOICE tables (located at site C).
• The remote transaction is sent to and executed at the remote site C.
• The entire transaction can reference and be executed at only one remote DP.
A distributed transaction contains one or more requests, and each request can access only one remote site at a time; that is, it allows a transaction to reference several different local or remote DP sites. Consider a transaction at site A:
BEGIN WORK;
UPDATE product SET qoh = qoh - 1 WHERE pno = 'P01';
INSERT INTO invoice (invno, cno, invdate) VALUES (305, 202, SYSDATE);
UPDATE customer SET baldue = baldue + 10 WHERE cno = 202;
COMMIT WORK;
Note the following features:
1. The transaction references two remote sites (B and C).
2. The first two requests (UPDATE product and INSERT INTO invoice) are processed by the DP at remote site C, and the last request is processed by the DP at remote site B.
3. Each request can access only one remote site at a time.
The third characteristic may create problems. Suppose the table PRODUCT is divided into fragments PROD1 and PROD2, located at sites B and C respectively. Then a distributed transaction cannot execute the request
SELECT * FROM product;
because this request cannot access data from more than one remote site. For that, the DBMS must support a distributed request.
A distributed request lets a single SQL statement reference data located at several different local or remote DP sites. The ability to execute a distributed request provides fully distributed database processing capability, because of the ability to:
• Partition a database table into several fragments.
• Reference one or more of those fragments with only one request.

Distributed concurrency control
Multisite, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions than single-site systems are.

Q: Explain the Two-Phase Commit Protocol.
Ans: Distributed databases make it possible for a transaction to access data at several sites. A final COMMIT must not be issued until all sites have committed their parts of the transaction. Each DP maintains its own transaction log.
The two-phase commit protocol requires the DO-UNDO-REDO protocol and the write-ahead protocol.
DO-UNDO-REDO protocol:
• DO performs the operation and records the "before" and "after" values in the transaction log.
• UNDO reverses an operation, using the log entries written by the DO portion.
• REDO redoes an operation, using the log entries written by the DO portion.
To ensure that the DO, UNDO and REDO operations can survive a system crash while they are being executed, a write-ahead protocol is used. The write-ahead protocol forces the log entries to be written to permanent storage before the actual operation takes place.
The protocol defines two types of nodes: the coordinator and the subordinates (or cohorts). The coordinator role is assigned to the node that initiates the transaction.
The protocol is implemented in two phases.
Phase 1 - Preparation:
1. The coordinator sends a PREPARE TO COMMIT message to all subordinates.
2. The subordinate nodes receive the message, write the transaction log using the write-ahead protocol, and send an acknowledgement (YES/PREPARED TO COMMIT or NO/NOT PREPARED) to the coordinator.
3. If all nodes are PREPARED TO COMMIT, the transaction goes to phase 2. If one or more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT message to all subordinates.
Phase 2 - The Final Commit:
1. The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies.
2. Each subordinate receives the COMMIT message and updates the database using the DO protocol.
3. The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator.
If one or more subordinates did not commit, the coordinator sends an ABORT message, thereby forcing all subordinates to UNDO all changes.
Performance Transparency and Query Optimization
The DDBMS uses query optimization to decide which copy of the data to access. The objective of the query optimization routine is to minimize the total cost associated with the execution of a request, which includes:
1. Access time (I/O) cost involved in accessing the physical data stored on disk.
2. Communication cost associated with the transmission of data among the nodes of the distributed database system.
3. CPU time cost associated with the processing overhead of managing distributed transactions.
To evaluate query optimization, keep in mind that the TP must receive data from the DPs, synchronize it, assemble the answer and present it to the end user or application.
Most of the algorithms proposed for query optimization are based on two principles:
1. The selection of the optimum execution order.
2. The selection of the sites to be accessed, to minimize communication costs.
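The two-phase commit exchange described above (Phase 1 preparation, Phase 2 final commit) can be sketched as a toy simulation. This is a hedged illustration with invented names; it omits timeouts, crashes and real log persistence.

```python
# Toy two-phase commit: a coordinator polls subordinates for votes,
# then broadcasts COMMIT or ABORT. All names are illustrative.

def two_phase_commit(coordinator_log, subordinates):
    # Phase 1 - Preparation: collect YES/NO votes.
    votes = []
    for sub in subordinates:
        sub["log"].append("PREPARED")     # write-ahead: log before replying
        votes.append(sub["can_commit"])   # YES/PREPARED or NO/NOT PREPARED
    if not all(votes):
        coordinator_log.append("ABORT")   # any NO vote aborts everyone
        for sub in subordinates:
            sub["log"].append("UNDO")     # subordinates roll back their work
        return "ABORTED"
    # Phase 2 - Final commit: broadcast COMMIT, subordinates apply DO.
    coordinator_log.append("COMMIT")
    for sub in subordinates:
        sub["log"].append("COMMITTED")
    return "COMMITTED"

site_b = {"can_commit": True, "log": []}
site_c = {"can_commit": True, "log": []}
print(two_phase_commit([], [site_b, site_c]))   # COMMITTED

site_c = {"can_commit": False, "log": []}       # site C cannot prepare
print(two_phase_commit([], [site_b, site_c]))   # ABORTED
```

A single NO vote is enough to abort the whole distributed transaction, which is exactly why the final COMMIT must wait for every site.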
Query optimization algorithms can be evaluated on the basis of their operation mode or the timing of their optimization.
Operation modes can be classified as manual or automatic:
• Manual query optimization: the cost-effective access path is found and scheduled by the end user or programmer.
• Automatic query optimization: the cost-effective access path is found and scheduled by the DDBMS.
Query optimization classification:
• On the basis of operation mode: manual or automatic query optimization.
• On the basis of when the optimization is done: static or dynamic query optimization.
• On the basis of the type of information used: statistically based or rule-based query optimization.
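The cost trade-off that an automatic optimizer weighs can be shown with a toy calculation. All cost figures here are invented purely for illustration.

```python
# Toy illustration of picking the cheapest execution plan for a
# distributed query: total cost = access (I/O) + communication + CPU.

plans = [
    {"site": "B", "io_cost": 40, "comm_cost": 120, "cpu_cost": 10},
    {"site": "C", "io_cost": 55, "comm_cost": 30,  "cpu_cost": 12},
]

def total_cost(plan):
    return plan["io_cost"] + plan["comm_cost"] + plan["cpu_cost"]

best = min(plans, key=total_cost)
print(best["site"], total_cost(best))   # C 97 -- cheaper despite higher I/O
```

Note that site C wins even though its I/O cost is higher: in a distributed system the communication cost often dominates, which is why site selection is one of the two core optimization principles above.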
Classification according to when the optimization is done:
1. Static query optimization takes place at compilation time. When the program is submitted to the DBMS for compilation, the DBMS creates the plan to access the database.
2. Dynamic query optimization takes place at execution time. Its cost is measured by run-time processing overhead.
Classification according to the type of information used to optimize the query:
1. A statistically based query optimization algorithm uses statistical information about the database, such as its size, number of records and average access time. The statistical information is managed by the DDBMS and is generated in dynamic mode or in manual mode. In dynamic statistical generation mode, the DDBMS automatically evaluates and updates the statistics after each access.
2. A rule-based query optimization algorithm is based on a set of user-defined rules to determine the best access strategy. The rules are entered by the end user or database administrator.
Distributed Database Design
The design of a distributed database introduces three new issues:
1. How to partition the database into fragments.
2. Which fragments to replicate.
3. Where to locate those fragments and replicas.
Data fragmentation: allows you to break a single object (database, table etc.) into two or more segments or fragments. Each fragment can be stored at any site. (Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests. A fragmented table can be recreated from its fragments by using joins and unions.)
Horizontal fragmentation: refers to the division of a table into subsets (fragments) of rows. Each fragment is stored at a different node, and each fragment has unique rows. (Each fragment represents the equivalent of a SELECT statement, with a WHERE clause on a single attribute.)
Ex: CUSTOMER table

cno | cname | cstate | climit | baldue
1   | ANU   | TN     | 3500   | 2700
2   | RAMA  | AP     | 6000   | 1200
3   | RADHA | TN     | 4000   | 3500
4   | GOPI  | AP     | 1200   | 550

Suppose the XYZ company requires information about its customers at two sites (AP, TN), and each state requires data regarding local customers only. So, distribute the data by state, i.e., define a horizontal fragmentation by state:

Fragment Name | Location | Condition     | Node Name | Customer Numbers | No. of Rows
C1            | TN       | cstate = 'TN' | CHE       | 1, 3             | 2
C2            | AP       | cstate = 'AP' | VJA       | 2, 4             | 2
Table fragments at the two sites:

Table name: C1   Location: Tamil Nadu   Node: CHE
cno | cname | cstate | climit | baldue
1   | ANU   | TN     | 3500   | 2700
3   | RADHA | TN     | 4000   | 3500

Table name: C2   Location: Andhra Pradesh   Node: VJA
cno | cname | cstate | climit | baldue
2   | RAMA  | AP     | 6000   | 1200
4   | GOPI  | AP     | 1200   | 550

Vertical fragmentation: refers to the division of a table into attribute (column) subsets. Each subset is stored at a different node, and each fragment has unique columns, with the exception of the key column, which is common to all fragments. (For example, suppose a company is divided into two departments, service and collection. Each department is in a separate building and is interested in only some of the CUSTOMER attributes.)

Vertical fragmentation of the CUSTOMER table:
Fragment Name | Location            | Node Name | Attribute Names
V1            | Service building    | SVC       | cno, cname, cstate
V2            | Collection building | ARC       | cno, climit, baldue

Vertically fragmented table contents:

Table name: V1   Location: Service building   Node: SVC
cno | cname | cstate
1   | ANU   | TN
2   | RAMA  | AP
3   | RADHA | TN
4   | GOPI  | AP

Table name: V2   Location: Collection building   Node: ARC
cno | climit | baldue
1   | 3500   | 2700
2   | 6000   | 1200
3   | 4000   | 3500
4   | 1200   | 550

Mixed fragmentation: refers to a combination of the horizontal and vertical strategies. It requires a two-step process:
1. Horizontal fragmentation is introduced.
2. Vertical fragmentation is applied within each horizontal fragment.
(The XYZ company's structure requires the CUSTOMER data to be fragmented horizontally across the two company locations (TN, AP), and within each location the data must be fragmented vertically across the two departments (service and collection).)
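The horizontal and vertical strategies can be demonstrated with the CUSTOMER rows from the example, using plain Python dictionaries rather than a real DDBMS:

```python
# Sketch of horizontal and vertical fragmentation of the CUSTOMER table.

customers = [
    {"cno": 1, "cname": "ANU",   "cstate": "TN", "climit": 3500, "baldue": 2700},
    {"cno": 2, "cname": "RAMA",  "cstate": "AP", "climit": 6000, "baldue": 1200},
    {"cno": 3, "cname": "RADHA", "cstate": "TN", "climit": 4000, "baldue": 3500},
    {"cno": 4, "cname": "GOPI",  "cstate": "AP", "climit": 1200, "baldue": 550},
]

# Horizontal fragmentation: subsets of rows, selected by cstate.
c1 = [row for row in customers if row["cstate"] == "TN"]   # node CHE
c2 = [row for row in customers if row["cstate"] == "AP"]   # node VJA

# Vertical fragmentation: subsets of columns; the key (cno) stays in both.
v1 = [{k: row[k] for k in ("cno", "cname", "cstate")} for row in customers]
v2 = [{k: row[k] for k in ("cno", "climit", "baldue")} for row in customers]

# The original table can be rebuilt: a union of the horizontal fragments,
# or a join of the vertical fragments on the common key cno.
rebuilt = c1 + c2
joined = [{**a, **b} for a in v1 for b in v2 if a["cno"] == b["cno"]]
print(len(rebuilt), len(joined))   # 4 4
```

This mirrors the note above that fragmented tables can be recreated from their fragments using joins (vertical) and unions (horizontal).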
Mixed fragmentation of the CUSTOMER table:

Fragment Name | Location | Horizontal Criterion | Node Name | Rows at Site | Attributes in Fragment
M1            | TN       | cstate = 'TN'        | CHES      | 1, 3         | cno, cname, cstate
M2            | TN       | cstate = 'TN'        | CHEC      | 1, 3         | cno, climit, baldue
M3            | AP       | cstate = 'AP'        | VJAS      | 2, 4         | cno, cname, cstate
M4            | AP       | cstate = 'AP'        | VJAC      | 2, 4         | cno, climit, baldue

Table contents after the mixed fragmentation process:

Table name: M1   Location: Tamil Nadu   Node: CHES
cno | cname | cstate
1   | ANU   | TN
3   | RADHA | TN

Table name: M2   Location: Tamil Nadu   Node: CHEC
cno | climit | baldue
1   | 3500   | 2700
3   | 4000   | 3500

Table name: M3   Location: Andhra Pradesh   Node: VJAS
cno | cname | cstate
2   | RAMA  | AP
4   | GOPI  | AP

Table name: M4   Location: Andhra Pradesh   Node: VJAC
cno | climit | baldue
2   | 6000   | 1200
4   | 1200   | 550

Data replication: refers to the storage of data copies at multiple sites. Replicated data are subject to the mutual consistency rule, which requires that:
1. All copies of a data fragment be identical.
2. The DDBMS ensure that a database update is performed at all sites where replicas exist.
Benefits of replication (fragment copies stored at several sites to serve specific information requirements):
• Increased data availability and faster response time.
• Reduced communication and query costs.
• Better load distribution.
• Improved data failure tolerance.
Disadvantages of replication:
• It imposes additional DDBMS processing overhead, because each copy must be maintained by the system, and the system must also decide which replicated copy to use.
• Increased transaction time, as data must be updated at several sites.
• Increased storage cost.
Replication conditions:
• A fully replicated database stores multiple copies of all database fragments at multiple sites.
• A partially replicated database stores multiple copies of some database fragments at multiple sites.
• An unreplicated database stores each database fragment at a single site.
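The mutual consistency rule can be illustrated with a toy two-site replica store. Site names are reused from the example above; there is no real DDBMS machinery here, just the rule that an update must reach every copy.

```python
# Sketch of the mutual consistency rule: an update is applied to
# every site that holds a replica of the fragment.

replicas = {
    "CHE": {1: {"baldue": 2700}, 3: {"baldue": 3500}},
    "VJA": {1: {"baldue": 2700}, 3: {"baldue": 3500}},   # full copy
}

def replicated_update(cno, column, value):
    # The DDBMS pushes the change to all sites holding a copy,
    # so all replicas stay identical.
    for site in replicas.values():
        site[cno][column] = value

replicated_update(1, "baldue", 2800)
assert all(site[1]["baldue"] == 2800 for site in replicas.values())
```

The loop over sites is exactly the source of the replication overhead listed above: every extra copy is one more site the update must visit before the transaction can finish.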
Factors in the Data Replication Decision
1. Database size: replicating a large amount of data has an impact on storage requirements, data transmission costs and network bandwidth.
2. Usage frequency: how frequently the data are used and how often they need to be updated. Frequently used data need to be updated more often.
Data allocation: deciding where to locate data. Data allocation strategies are as follows:
• With centralized data allocation, the entire database is stored at one site.
• With partitioned data allocation, the database is divided into two or more disjoint parts and stored at two or more sites.
• With replicated data allocation, copies of one or more database fragments are stored at several sites.
Data allocation algorithms take into consideration a variety of factors:
1. Performance and data availability goals.
2. Size and number of rows, and the number of relationships that an entity maintains with other entities.
3. Types of transactions to be applied to the database and the attributes accessed by each of those transactions.
Q: Explain the client/server architecture.
Ans: Client/server architecture refers to the way in which computers interact to form a system. It features a user of resources, the client, and a provider of resources, the server. The architecture can be used to implement a DBMS in which the client is the transaction processor (TP) and the server is the data processor (DP).
Client/server advantages:
1. Client/server solutions are less expensive and allow the end user to use the microcomputer's graphical user interface (GUI), thereby improving functionality and simplicity.
2. There are more people with PC skills than with mainframe skills.
3. Numerous data analysis and query tools exist that allow interaction with many of the DBMSs.
4. It is cheaper to develop an application for PCs than for mainframes.
Client/server disadvantages:
1. The client/server architecture creates a more complex environment with different platforms.
2. An increase in the number of users and processing sites often paves the way for security problems.
3. The burden of training increases the cost of maintaining the environment.
Unit-V
Chapter-I: Business Intelligence and Data Warehouses
Q: Why is there a need for data analysis? Or: What are decision support systems and what role do they play in the business environment?
Ans: A decision support system (DSS) is an arrangement of computerized tools used to assist managerial decision making within a business. Organizations tend to grow and prosper as they gain a better understanding of their environment. Data analysis can provide information for short-term tactical evaluations, answering questions such as: Are our sales promotions working? What market percentage are we controlling? Are we attracting new customers?
Tactical and strategic decisions are also shaped by constant pressure from external and internal forces, including globalization, the cultural and legal environment, and technology. The business climate is dynamic and mandates prompt reaction to change in order to remain competitive.
Different managerial levels require different decision support needs. For example, transaction processing systems based on operational databases are tailored to serve the information needs of people who deal with short-term inventory, accounts payable and purchasing. Middle-level managers, general managers, vice presidents and presidents focus on strategic and tactical decision making that requires a DSS.
Differences between operational and decision support data characteristics:

Characteristic      | Operational Data                          | Decision Support Data
Data currency       | Current operations; real-time data        | Historic data; snapshot of company data; time component (week/month/year)
Granularity         | Atomic, detailed data                     | Summarized data
Summarization level | Low; some aggregate yields                | High; many aggregation levels
Data model          | Highly normalized; mostly relational DBMS | Non-normalized; complex structures
Transaction type    | Mostly updates                            | Mostly queries
Transaction volumes | High update volumes                       | Periodic loads and summary calculations
Transaction speed   | Updates are critical                      | Retrievals are critical
Query activity      | Low to medium                             | High
Query scope         | Narrow range                              | Broad range
Query complexity    | Simple to medium                          | Very complex
Data volumes        | Hundreds of megabytes up to gigabytes     | Hundreds of gigabytes up to terabytes

The many differences between operational data and decision support data are good indicators of the requirements of the decision support database.
Decision Support Database Requirements
There are four main requirements for a decision support database.
1. The database schema: must support complex (non-normalized) data representations.
2. Data extraction and filtering: the data extraction capabilities should support different data sources and multiple vendors. Using data from multiple external sources usually also means having to solve data-formatting conflicts. Finally, the operational data must be filtered and integrated into the decision support database.
3. End-user analytical interface: the decision support DBMS must generate the necessary queries to retrieve the appropriate data from the decision support database.
4. Database size: to support very large databases (VLDBs), the DBMS might be required to use advanced hardware, such as multiple disk arrays, and multiple-processor technologies such as symmetric multiprocessing (SMP) or massively parallel processing (MPP).
Q: What is a data warehouse? Discuss the properties of a data warehouse.
Ans: A data warehouse is an integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making. The following are important properties of a data warehouse.
Integrated: the data warehouse is a centralized, consolidated database that integrates data derived from the entire organization and from multiple sources with diverse formats. Data integration implies that all business entities, data elements, data characteristics and business metrics are described in the same way throughout the enterprise.
Subject-oriented: data warehouse data are arranged and optimized to provide answers to questions coming from diverse functional areas within a company. They are organized and summarized by topic. For example, instead of storing an INVOICE table, the data warehouse stores its "sales by product" and "sales by customer" components.
Time-variant: warehouse data represent the flow of data through time. Once data are periodically uploaded to the data warehouse, all time-dependent aggregations are recomputed.
Nonvolatile: once data enter the data warehouse, they are never removed.
Because data are never deleted and new data are continually added, the data warehouse is always growing.
(Figure: the ETL process in the creation of a data warehouse.)
Q: What is business intelligence?
Ans: Business intelligence (BI) is a framework that allows a business to transform data into information, information into knowledge and knowledge into wisdom. The following are the BI architectural components:
1. Data extraction, transformation and loading (ETL) tools: this component is in charge of collecting, filtering, integrating and aggregating operational data to be saved into a data store.
2. Data store: the data store is optimized for decision support and is generally represented by a data warehouse or a data mart.
3. Data query and analysis tools: this component performs data retrieval, data analysis and data mining tasks using the data in the data store, and is typically represented by an OLAP tool.
4. Data presentation and visualization tools: this component is in charge of presenting the data to the end user.
Q: What are data marts?
Ans: A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people. Instead of creating a data warehouse for the entire organization, manageable data sets targeted to meet the special needs of small groups within the organization are created. These smaller data stores are called data marts.
Q: What is OLAP? What are the four main characteristics of OLAP systems?
Ans: Online analytical processing (OLAP) creates an advanced data analysis environment that supports decision making, business modeling and operations research. OLAP systems share four main characteristics:
1. They use multidimensional data analysis techniques.
2. They provide advanced database support.
3. They provide easy-to-use end-user interfaces.
4. They support client/server architecture.
Multidimensional data analysis techniques: in multidimensional analysis, data are processed and viewed as part of a multidimensional structure. This is useful in business decision making because decision makers tend to view business data as data that are related to other business data.
These techniques are augmented by the following functions:
1. Advanced data presentation functions: 3-D graphics, 3-D cubes and so on. Such facilities are compatible with desktop spreadsheets.
2. Advanced data aggregation, consolidation and classification functions: create multiple data aggregation levels, slice and dice data, and drill down and roll up data across different dimensions and aggregation levels.
3. Advanced computational functions: business-oriented variables (market share, sales margins etc.), financial and accounting ratios, and statistical and forecasting functions. These functions are provided automatically.
4. Advanced data modeling functions: linear programming and other modeling tools.
Advanced database support: OLAP tools must have advanced data access features such as:
• Access to many different kinds of DBMSs.
• Access to aggregated data warehouse data as well as to the detailed data found in operational databases.
• Advanced data navigation features such as drill-down and roll-up.
• Rapid and consistent query response times.
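The drill-down and roll-up functions mentioned above can be sketched with an attribute hierarchy of Month inside Quarter, using plain dictionaries. The sales figures are invented for illustration.

```python
# Sketch of roll-up / drill-down across a Month -> Quarter hierarchy.

monthly_sales = {("Q1", "Jan"): 100, ("Q1", "Feb"): 120, ("Q1", "Mar"): 90,
                 ("Q2", "Apr"): 140, ("Q2", "May"): 110, ("Q2", "Jun"): 130}

# Roll up: aggregate the detailed Month level into the Quarter level.
quarterly = {}
for (quarter, _month), amount in monthly_sales.items():
    quarterly[quarter] = quarterly.get(quarter, 0) + amount
print(quarterly)   # {'Q1': 310, 'Q2': 380}

# Drill down: return to the finer Month level for one quarter.
q1_detail = {m: v for (q, m), v in monthly_sales.items() if q == "Q1"}
print(q1_detail)   # {'Jan': 100, 'Feb': 120, 'Mar': 90}
```

Rolling up discards detail to climb the hierarchy; drilling down restores it, which is why the underlying store must keep the most granular level available.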
Easy-to-use end-user interface: OLAP features become more useful when access to them is kept simple. OLAP tool vendors have therefore included easy-to-use graphical interfaces.
Q: Explain the OLAP architecture.
Ans: OLAP operational characteristics can be divided into three main modules:
• Graphical user interface (GUI)
• Analytical processing logic
• Data processing logic
(Figure: the OLAP system exhibits a client/server architecture; an easy-to-use GUI; multidimensional data analysis with dimensional presentation, dimensional modeling, dimensional analysis and data manipulation; and database support for both the data warehouse (dimensional, aggregated, very large) and the operational database (relational, multidimensional).)
The figure illustrates that OLAP systems are designed to use both operational and data warehouse data. It also shows all of the OLAP system components located on a single computer. One problem with this installation is that each data analyst must have a powerful computer to store the OLAP system and perform all data processing locally. Each analyst also uses a separate copy of the data, so the data copies must be synchronized to ensure that all analysts are working with the same data.
OLAP server arrangement: here the OLAP GUI runs on the client workstations, while the OLAP engine (server) runs on a shared computer and forms a middle layer. The OLAP server accepts and processes the data processing requests generated by the many end-user analytical tools. The end-user GUI may be a plug-in module integrated with Excel, Lotus 1-2-3 etc.
(Figure: the OLAP GUI, analytical processing logic and data processing logic draw on both operational data and the data warehouse, which is integrated, subject-oriented, time-variant and nonvolatile.)
Q: Why use a data warehouse when OLAP provides the necessary multidimensional data analysis of operational data?
Ans: Because the data warehouse handles the data component more efficiently than OLAP does.
Q: What is ROLAP?
Ans: Relational online analytical processing (ROLAP) provides OLAP functionality by using relational databases and familiar relational query tools to store and analyze multidimensional data. ROLAP adds the following extensions to traditional RDBMS technology:
• Multidimensional data schema support within the RDBMS: uses the star schema.
• A data access language and query performance optimized for multidimensional data: uses bitmapped indexes, which are more efficient at handling large amounts of data than the conventional indexes used in an RDBMS. Bitmapped indexes are primarily used in situations where the number of possible values for an attribute is fairly small.
• Support for very large databases (VLDBs).
Q: What is MOLAP?
Ans: Multidimensional online analytical processing (MOLAP) provides OLAP functionality through multidimensional database management systems (MDBMS). An MDBMS uses special proprietary techniques to store data in matrix-like n-dimensional arrays. MDBMS end users visualize the stored data as a 3-D cube known as a data cube. The location of each data value in the data cube is a function of the x-, y- and z-axes in 3-D space; these axes represent the dimensions of the data value. A hypercube is a data cube grown to n dimensions. Because the data cube is predefined with a fixed number of dimensions, the addition of a new dimension requires that the entire data cube be re-created, which is a time-consuming process.
Differences between ROLAP and MOLAP: multidimensional data analysis requires some type of multidimensional data representation, which is normally provided by the OLAP engine. Whatever the arrangement of the OLAP components, multidimensional data must be used.
Q: Discuss the star schema architecture.
Ans: The star schema is a data modeling technique used to map multidimensional decision support data into a relational database. The basic star schema has four components: facts, dimensions, attributes and attribute hierarchies.
Facts: numeric measurements that represent a specific business aspect or activity, e.g. units, costs, prices. Facts are normally stored in a fact table that sits at the center of the star schema. The fact table contains facts that are linked through their dimensions. Facts computed or derived at run time are called metrics.
Dimensions: provide descriptive qualifying characteristics about the facts through their attributes. For example, sales might be compared by product from region to region and from one time period to the next. Dimensions are stored in dimension tables. The figure shows a star schema for sales with PRODUCT, LOCATION and TIME dimensions.
Attributes: each dimension table contains attributes, which are often used to search, filter or classify facts. Possible attributes for the LOCATION dimension are region, state, city and store; for the PRODUCT dimension, product type, product ID, brand, package, presentation, color and size; for the TIME dimension, year, quarter, month, week, day, time of day and so on.
Attribute hierarchies: attributes within dimensions can be ordered in a well-defined attribute hierarchy. The attribute hierarchy provides a top-down data organization that is used for two main purposes: aggregation, and drill-down/roll-up data analysis.
Star schema representation: the fact table is related to each dimension table in a many-to-one (M:1) relationship, i.e. many fact rows are related to each dimension row, and so the primary key of the fact table is a composite primary key.
As per the figure, each sales record represents one product sold to a specific customer, at a specific time and in a specific location. A DBMS that is optimized for decision support first searches the smaller dimension tables before accessing the larger fact table.
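A star-schema query can be tried with Python's built-in sqlite3 module. The schema below is a hypothetical miniature of the sales example above (a SALES fact table joined to PRODUCT and LOCATION dimension tables); table and column names are invented for illustration.

```python
# Miniature star schema in sqlite3: fact table SALES joined to its
# dimension tables through their keys, then aggregated by dimension.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE location (loc_id  INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales    (prod_id INTEGER, loc_id INTEGER, units INTEGER);
INSERT INTO product  VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO location VALUES (10, 'South'), (20, 'North');
INSERT INTO sales    VALUES (1, 10, 5), (1, 20, 3), (2, 10, 7);
""")

# Facts (units) aggregated by two dimensions: brand and region.
rows = con.execute("""
    SELECT p.brand, l.region, SUM(s.units)
    FROM sales s
    JOIN product  p ON p.prod_id = s.prod_id
    JOIN location l ON l.loc_id  = s.loc_id
    GROUP BY p.brand, l.region
    ORDER BY p.brand, l.region
""").fetchall()
print(rows)   # [('Acme', 'North', 3), ('Acme', 'South', 5), ('Zenith', 'South', 7)]
```

The fact table carries only foreign keys and measurements; all descriptive attributes live in the small dimension tables, which is what lets the optimizer search them first.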
Q: What are performance-improving techniques for the star schema?
Ans: Four techniques are often used to optimize data warehouse design:
1. Normalizing dimension tables: the resulting schema with normalized dimension tables is called a snowflake schema.
2. Maintaining multiple fact tables to represent different aggregation levels.
3. Denormalizing fact tables.
4. Partitioning and replicating tables.
Q: What is data mining? Explain its various phases.
Ans: Data mining tools automatically search the data for anomalies and possible relationships, thereby identifying problems that have not yet been identified by the end user. Data mining is very helpful in finding practical relationships among data that help define customer buying patterns, improve product development and acceptance, reduce health care fraud, analyze stock markets and so on.
Data mining proceeds through four general phases:
1. Data preparation phase: the data sets to be used by the data mining operation are identified and cleansed.
2. Data analysis and classification phase: identifies common data characteristics or patterns. The data mining tool applies specific algorithms to find:
• Data groupings, classifications, clusters or sequences.
• Data dependencies, links or relationships.
• Data patterns, trends and deviations.
3. Knowledge acquisition phase: selects the appropriate modeling or knowledge acquisition algorithms to generate a computer model that reflects the behavior of the target data set.
4. Prognosis phase: the data mining findings are used to predict future behavior and forecast business outcomes, e.g. to project the likely outcome of a new product rollout or a new marketing promotion.
Q: What are inductive or intelligent databases?
Ans: Databases that not only store data and various statistics about data usage, but also have the ability to learn about and extract knowledge from the stored data.
Q: Explain SQL extensions for OLAP.
Ans: The following are important SQL extensions for OLAP.
The ROLLUP extension: used with the GROUP BY clause to generate aggregates by different dimensions.
Syntax:
SELECT column1, column2 [, ...], aggregate_function(expression)
FROM table1 [, table2, ...]
[WHERE condition]
GROUP BY ROLLUP (column1, column2 [, ...])
[HAVING condition]
[ORDER BY column1 [, column2, ...]]
The order of the column list within GROUP BY ROLLUP is very important. The last column in the list will generate a grand total; all other columns will generate subtotals.
The CUBE extension: used to compute all possible subtotals within groupings based on multiple dimensions. The CUBE extension yields a subtotal for each column listed in the expression, as well as a grand total.
Syntax:
SELECT column1, column2 [, ...], aggregate_function(expression)
FROM table1 [, table2, ...]
[WHERE condition]
GROUP BY CUBE (column1, column2 [, ...])
[HAVING condition]
[ORDER BY column1 [, column2, ...]]
Materialized views: a materialized view is a dynamic table that not only contains the SQL query command to generate its rows, but also stores the actual rows. The materialized view is created the first time the query is run, and the summary rows are stored in the table. The materialized view rows are automatically updated when the base tables are updated.
Syntax:
CREATE MATERIALIZED VIEW view_name
BUILD {IMMEDIATE | DEFERRED}
REFRESH {FAST | COMPLETE | FORCE}
ON COMMIT
[ENABLE QUERY REWRITE]
AS select_query;
The BUILD clause indicates when the materialized view rows are actually populated: IMMEDIATE indicates that the rows are populated right after the command is entered; DEFERRED indicates that the rows are populated at a later time (until then, the view is in an unusable state).
The REFRESH clause lets you indicate when and how to update the view when new rows are added to the base tables: FAST updates only the affected rows; COMPLETE performs a complete update of all rows in the materialized view; FORCE makes the DBMS first try a FAST update and, if that is not possible, do a COMPLETE update.
ON COMMIT indicates that the updates to the materialized view take place as part of the commit of the DML transaction that updated the base tables.
The ENABLE QUERY REWRITE option allows the DBMS to use the materialized view in query optimization.
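Not every DBMS supports ROLLUP (SQLite, for example, does not), but the result set it produces can be reproduced in Python to show what the extension computes. The data here is invented, and None plays the role of the NULL that ROLLUP places in its subtotal rows.

```python
# Reproducing GROUP BY ROLLUP(region, brand): per-group sums,
# a subtotal row for each region, and a final grand-total row.
from itertools import groupby

sales = [("South", "Acme", 5), ("South", "Zenith", 7), ("North", "Acme", 3)]
sales.sort()   # groupby needs sorted input

result = []
grand_total = 0
for region, region_rows in groupby(sales, key=lambda r: r[0]):
    region_rows = list(region_rows)
    for brand, brand_rows in groupby(region_rows, key=lambda r: r[1]):
        result.append((region, brand, sum(r[2] for r in brand_rows)))
    subtotal = sum(r[2] for r in region_rows)
    result.append((region, None, subtotal))   # ROLLUP's subtotal row
    grand_total += subtotal
result.append((None, None, grand_total))      # ROLLUP's grand-total row
print(result)
```

The output follows the rule stated above: the last column in the ROLLUP list (brand) collapses first, producing per-region subtotals, and the full list collapses last, producing the single grand total.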
Unit-V
Chapter-II: Database Administration and Security
Q: Explain the need for and role of a database in an organization.
Ans: The DBMS helps an organization in many ways:
• Interpretation and presentation of data.
• Distribution of data and information to the right people.
• Data preservation and monitoring of data usage.
• Control over data duplication and use.
At the top management level, the database's role is to:
• Provide the information necessary for strategic decision making.
• Provide access to external and internal data.
• Provide a framework for defining and enforcing organizational policies.
At the middle management level, the database's role is to:
• Deliver the data necessary for tactical decisions and planning.
• Monitor and control the allocation and use of company resources.
• Provide a framework for enforcing and ensuring the security and privacy of the data in the database.
At the operational management level, the database's role is to:
• Represent and support the company operations as closely as possible.
• Produce query results within specified performance levels.
• Enhance the company's short-term operational ability.
Q: Describe the evolution of the database administration function.
Ans: The cost of data and managerial duplication in decentralized, old-style file systems gave rise to a centralized data administration function known as the electronic data processing (EDP) or data processing (DP) department. The DP department resolves data conflicts created by the duplication and/or misuse of data.
The advent of the DBMS and its shared view of data produced a new level of data management and led the DP department to evolve into the information systems (IS) department. The responsibilities of the IS department are:
• A service function, to provide end users with active data management support.
• A production function, to provide end users with specific solutions for their information needs.
As the number of databases grew, data management became increasingly complex, leading to the development of the database administration function. The person responsible for control of the centralized and shared database became known as the database administrator (DBA).
(Figure: the DBA function may be placed as a staff position, devising the administration strategy but with no authority to enforce it or to resolve conflicts, or as a line position, with the responsibility and authority to plan, define, implement and enforce the policies, standards and procedures used in the data administration activity.)
The fast-paced changes in DBMS technology dictate changing organizational styles. For example:
• Distributed databases can force the decentralization of the data administration function.
• Internet-accessible data and the growing number of data warehousing applications are likely to add to the DBA's data modeling and design activities.
• The microcomputer environment requires the DBA to develop a new set of technical and managerial skills.
Functions of the DBA: the DBA function can be described by dividing the DBA operations according to the DBLC phases, and it requires personnel to cover all of those activities. Several different and incompatible DBMSs may be installed to support different operations, and there may also be a variety of microcomputer DBMSs installed in different departments. In such an environment, the company might have one DBA assigned for each DBMS. The general coordinator of all DBAs is known as the systems administrator.
Q: Differentiate between the responsibilities of the data administrator (DA) and the database administrator (DBA).
Ans: The DA is responsible for controlling the overall corporate data resource, both computerized and manual. Thus the DA's job description covers a larger area of operations than that of the DBA, because the DA is in charge of controlling not only the computerized data but also the data outside the scope of the DBMS.

Data Administrator (DA)      | Database Administrator (DBA)
Does strategic planning      | Controls and supervises
Sets long-term goals         | Executes plans to reach goals
Sets policies and standards  | Enforces policies and procedures; enforces programming standards
Is broad in scope            | Is narrow in scope
Focuses on the long term     | Focuses on the short term (daily operations)
Has a managerial orientation | Has a technical orientation
Is DBMS-independent          | Is DBMS-specific
Q: What are the desired DBA skills? OR Discuss the abilities and responsibilities of the DBA.
Ans: The DBA's skills can be divided into two categories, managerial and technical, as summarized below.

Managerial skills:
• Broad business understanding
• Coordination skills
• Analytical skills
• Conflict-resolution skills
• Communication skills (oral and written)
• Negotiation skills

Technical skills:
• Broad data-processing background
• Systems development life cycle knowledge
• Structured methodologies: data flow diagrams, structure charts, programming languages
• Database life cycle knowledge
• Database modeling and design skills (conceptual, logical and physical)
• Operational skills: database implementation, data dictionary management, security and so on
• Experience: 10 years in a large DP department

Responsibilities (roles of the DBA)
The DBA's Managerial Role: The DBA delivers services such as:

End-User Support: This includes
• Gathering user requirements
• Building end-user confidence
• Resolving conflicts and problems
• Finding solutions to information needs
• Ensuring the quality and integrity of data and applications
• Managing the training and support of DBMS users

Policies, Procedures and Standards: The DBA must define, document and communicate the policies, procedures and standards before they can be enforced.

Policies are general statements of direction or action.
Example: All users must have passwords. Passwords must be changed every six months.

Standards are rules that are used to evaluate the quality of an activity.
Example: A password must have a minimum of five characters. A password must have a maximum of twelve characters.

Procedures are written instructions that describe a series of steps to be followed during the performance of a given activity.
Example: To create a user account,
1. the user sends a written request to the DBA;
2. the DBA approves the request and forwards it to the computer operator;
3. the operator creates the account, assigns a temporary password and sends it to the user;
4. a copy is sent to the DBA;
5. the user changes the temporary password to a permanent one.

The DBA must define, communicate and enforce procedures that cover areas such as:
1. End-user database requirements gathering
2. Database design and modeling
3. Documentation and naming conventions
4. Design, coding and testing of database application programs
5. Database software selection
6. Database security and integrity
7. Database backup and recovery
8. Database maintenance and operation
9. End-user training

Data Security, Privacy and Integrity: The DBA must use the security mechanisms provided by the DBMS and must also team up with Internet security experts to safeguard the data from possible attacks or unauthorized access.

Data Backup and Recovery: The backup and recovery measures must include at least:
• Periodic data and application backups
• Proper backup identification
• Convenient and safe backup storage
• Physical protection of both hardware and software
• Personal access control to the software of a database installation
• Insurance coverage for the data in the database

Data Distribution and Use: The DBA is responsible for ensuring that the data are distributed to the right people, at the right time and in the right format.

The DBA's Technical Role
The DBA's technical role requires a broad understanding of the DBMS's functions, configuration, programming languages, data modeling and design methodologies, and so on. The technical aspects of the DBA's job include:

• Evaluating, selecting and installing the DBMS and related utilities: To match DBMS capability to the organization's needs, the DBA must check the following features of a DBMS: DBMS model; DBMS storage capacity; application development support; security and integrity; backup and recovery; concurrency control; performance; database administration tools; interoperability and data distribution; portability and standards; hardware; data dictionary; vendor training and support; available third-party tools; cost.

• Designing and implementing databases and applications: The DBA has to review the database application design to ensure that transactions are:
Correct: the transactions mirror real-world events.
Efficient: the transactions do not overload the DBMS.
Compliant: the transactions comply with integrity rules and standards.

• Testing and evaluating databases and applications: The evaluation process should cover all technical aspects of both the applications and the database, and it has to enforce all data validation rules.

• Operating the DBMS, utilities and applications: DBMS operations are divided into four main areas: system support; performance monitoring and tuning; backup and recovery; security auditing and monitoring.
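The password standards quoted earlier (a minimum of five and a maximum of twelve characters) are exactly the kind of rule a DBA can automate. The sketch below is illustrative only: the function names are invented for this example, and the six-month expiry check approximates the policy with a fixed day count.

```python
from datetime import date, timedelta

# Standards from the text: a password must have a minimum of five
# and a maximum of twelve characters. (MIN_LEN/MAX_LEN are names
# chosen for this sketch, not part of any DBMS.)
MIN_LEN, MAX_LEN = 5, 12

def meets_standard(password: str) -> bool:
    """Return True if the password satisfies the length standard."""
    return MIN_LEN <= len(password) <= MAX_LEN

def must_change(last_changed: date, today: date) -> bool:
    """Policy from the text: passwords must be changed every six months
    (approximated here as 182 days, an assumption of this sketch)."""
    return today - last_changed > timedelta(days=182)

if __name__ == "__main__":
    print(meets_standard("abc"))      # too short: violates the standard
    print(meets_standard("s3cret"))   # within 5..12 characters
    print(must_change(date(2024, 1, 1), date(2024, 9, 1)))  # overdue
```

In practice the DBA would enforce such rules through the DBMS's own mechanisms (for example, password verification functions in a profile), but the point is the same: a standard is precise enough to be checked mechanically.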
• Training and supporting users: Training people to use the DBMS and its tools is included in the DBA's technical activities.

• Maintaining the DBMS, utilities and applications: Periodic DBMS maintenance includes management of the physical or secondary storage devices. Maintenance activities also include upgrading the DBMS and utility software.

The DBA's role as an arbitrator between data and users: The DBA also verifies that programmer and end-user access meets the required quality and security standards. Database users might be classified by:
• the type of decision-making support required (operational, tactical or strategic);
• the degree of computer knowledge (novice, proficient or expert);
• the frequency of access (casual, periodic or frequent).
The DBA must be able to interact with all of those people and understand their needs.

Q: What are the various database administration tools? Explain.
Ans:
Data dictionary: A data dictionary is defined as "a DBMS component that stores definitions of data characteristics and relationships". The data dictionary resembles an X-ray of the company's entire data set, and it is a crucial element in data administration.

Two main types of data dictionaries exist: integrated and standalone. An integrated data dictionary is included with the DBMS; alternatively, the DBA may use a third-party standalone data dictionary. Data dictionaries can also be classified as active or passive. An active data dictionary is automatically updated by the DBMS with every database access, thereby keeping its information up to date. A passive data dictionary is not updated automatically and usually requires running a batch process.

The DBA can use the data dictionary to support data analysis and design. For example, the DBA can create a report that lists all the data elements used in a particular application, a list of all users who access a particular program, or a report that checks for data redundancies.

CASE tools: CASE is an acronym for computer-aided systems engineering. A CASE tool provides an automated framework for the SDLC, using structured methodologies and graphical interfaces. CASE tools are usually classified according to the extent of support they provide for the SDLC. For example, front-end CASE tools provide support for the planning, analysis and design phases, while back-end CASE tools provide support for the coding and implementation phases.

The benefits of CASE tools include:
1. A reduction in development time and costs
2. Automation of the SDLC
3. Standardization of systems development methodologies
4. Easier maintenance of application systems developed with CASE tools

A typical CASE tool provides five components:
1. Graphics tools to produce structured diagrams such as data flow diagrams, ER diagrams and class diagrams
2. Screen painters and report generators
3. An integrated repository for storing and cross-referencing the system design data
4. An analysis segment to provide fully automated checks on system consistency, syntax and completeness
5. A program documentation generator
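The idea of an active data dictionary described above can be seen in miniature with SQLite, used here only as a convenient stand-in for a full DBMS: its catalog table sqlite_master is updated by the engine automatically on every CREATE or DROP, so a "data dictionary report" is just a query against it.

```python
import sqlite3

# SQLite maintains its own catalog, sqlite_master, which the engine
# updates automatically whenever objects are created or dropped:
# the behavior of an active data dictionary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (first_name TEXT, last_name TEXT, phone TEXT)")
conn.execute("CREATE INDEX idx_last ON contacts (last_name)")

# A simple data dictionary report: every object, its type and its DDL.
rows = conn.execute(
    "SELECT type, name, sql FROM sqlite_master ORDER BY name"
).fetchall()
for obj_type, name, ddl in rows:
    print(obj_type, name)
```

Production DBMSs expose the same information through richer catalog views (for example, Oracle's data dictionary views), but the principle, metadata kept current by the engine itself, is identical.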
Q: Explain the usage of ORACLE for database administration.
Ans: To perform any administrative task, you must connect to the database using a username with administrative (DBA) privileges. By default, Oracle automatically creates the SYSTEM and SYS user IDs, which have administrative privileges, with every new database you create.

Creating tablespaces and data files: In Oracle, a database is logically composed of one or more tablespaces. A tablespace is a logical storage space. Tablespace data are physically stored in one or more data files. Oracle automatically creates several tablespaces and data files; for example:
1. the SYSTEM tablespace is used to store the data dictionary data;
2. the USERS tablespace is used to store the table data created by end users;
3. the TEMP tablespace is used to store the temporary tables and indexes created during the execution of SQL statements;
4. the UNDOTBS1 tablespace is used to store database transaction recovery information.

Managing database objects (tables, views, triggers and procedures): The Oracle Enterprise Manager gives the DBA a graphical user interface to create, edit, view and delete database objects. A database object is basically any object created by end users.

Managing users and establishing security: One of the most common database administration activities is creating and managing database users. The security section of the Oracle Enterprise Manager's administration page enables the DBA to create users, roles and profiles.
1. A user is a uniquely identifiable object that allows a given person to log in to the database.
2. A role is a named collection of database access privileges that authorize a user to connect to the database and use its system resources.
3. A profile is a named collection of settings that control how much of the database's resources a given user can use.

Customizing the database initialization parameters: Fine-tuning a database is another important DBA task. This task usually requires the modification of database configuration parameters, some of which can be changed in real time using SQL commands. Each database has an associated initialization file that stores its run-time configuration parameters. The initialization file is read at instance startup and is used to set the working environment for the database.

Creating a new database: Using the Oracle Database Configuration Assistant, it is simple to create a database. The DBA uses a wizard interface to answer a series of questions that establish the parameters for the database to be created. This process creates the database structure, including the necessary data dictionary tables, the administrator and user accounts, and the other supporting processes required by the DBMS to manage the database.
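The user and role concepts above correspond to Oracle DDL statements such as CREATE USER and GRANT. As a hedged sketch, the helpers below only build the statement text as strings; the helper names are invented for this example, and actually issuing the statements would require a live Oracle connection and DBA privileges.

```python
# Hypothetical helpers that assemble Oracle-style DDL as plain strings.
# They do not talk to a database; defaults here are assumptions of
# this sketch, not Oracle requirements.

def create_user_ddl(username: str, temp_password: str) -> str:
    # PASSWORD EXPIRE forces the user to replace the temporary
    # password at first login, matching the account-creation
    # procedure described earlier (temporary, then permanent password).
    return (f"CREATE USER {username} IDENTIFIED BY {temp_password} "
            f"PASSWORD EXPIRE")

def grant_role_ddl(role: str, username: str) -> str:
    # A role bundles privileges; granting it authorizes the user.
    return f"GRANT {role} TO {username}"

if __name__ == "__main__":
    print(create_user_ddl("jsmith", "Temp123"))
    print(grant_role_ddl("CONNECT", "jsmith"))
```

Scripting DDL this way is one approach to making account creation repeatable; the Enterprise Manager GUI described above generates equivalent statements behind the scenes.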
Responsibilities of the DBA: The DBA is responsible for:
• Designing the logical and physical schemas, as well as widely used portions of the external schema.
• Security and authorization.
• Data availability and recovery from failures.
• Database tuning: The DBA is responsible for evolving the database, in particular the conceptual and physical schemas, to ensure adequate performance as user requirements change.

A DBA needs to understand query optimization even if s/he is not interested in running his or her own queries, because some of these responsibilities (database design and tuning) are related to query optimization. Unless the DBA understands the performance needs of widely used queries, and how the DBMS will optimize and execute these queries, good design and tuning decisions cannot be made.
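The link between tuning and query optimization can be made concrete by asking the DBMS for its query plan before and after a design change. The sketch below uses SQLite's EXPLAIN QUERY PLAN purely as an illustration; production DBMSs such as Oracle expose the same idea through their own EXPLAIN PLAN facilities, and the table here is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (last_name TEXT, city TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                 [("Rao", "Guntur"), ("Devi", "Vijayawada")])

def plan(sql: str) -> str:
    """Return the optimizer's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r[-1]) for r in rows)

query = "SELECT * FROM contacts WHERE last_name = 'Rao'"
print(plan(query))   # no index yet, so the optimizer scans the table

# A tuning decision: add an index on the searched column,
# then ask the optimizer again.
conn.execute("CREATE INDEX idx_last ON contacts (last_name)")
print(plan(query))   # the plan now searches via idx_last
```

The query text is unchanged; only the physical schema changed, and the optimizer chose a different plan. That is precisely the kind of cause-and-effect a DBA must understand to make good design and tuning decisions.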
What is a data warehouse? Discuss the properties of a data warehouse.
Ans: A data warehouse is an integrated, subject-oriented, time-variant, non-volatile collection of data that provides support for decision making. The following are the important properties of a data warehouse.

Integrated: The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization and from multiple sources with diverse formats. Data integration implies that all business entities, data elements, data characteristics and business metrics are described in the same way throughout the enterprise.

Subject-oriented: Data warehouse data are arranged and optimized to provide answers to questions coming from diverse functional areas within a company. Data warehouse data are organized and summarized by topic. For example, instead of storing an INVOICE table, the data warehouse stores its "sales by product" and "sales by customer" components.

Time-variant: Warehouse data represent the flow of data through time. Once data are periodically uploaded to the data warehouse, all time-dependent aggregations are recomputed. For example, once the data for the previous week's sales are uploaded, the weekly, monthly, yearly and other time-dependent aggregates for products, customers, stores and other variables are also updated. Once data enter the data warehouse, the time ID assigned to the data cannot be changed.

Non-volatile: Once data enter the data warehouse, they are never removed. Because data are never deleted and new data are continually added, the data warehouse is always growing.
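The time-variant property, recomputing time-dependent aggregates whenever a new batch of data is uploaded, can be sketched with a toy sales table. The table layout and column names below are invented for illustration; a real warehouse load would be far more elaborate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")

def load_week(rows):
    """Upload a week's worth of sales, then recompute the
    time-dependent 'monthly sales by product' aggregate."""
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return dict(conn.execute(
        "SELECT substr(sale_date, 1, 7) || '/' || product, SUM(amount) "
        "FROM sales GROUP BY 1"
    ).fetchall())

agg = load_week([("2024-01-03", "pen", 10.0), ("2024-01-05", "pen", 5.0)])
print(agg)   # {'2024-01/pen': 15.0}

# A new week arrives: existing rows are never edited or removed
# (non-volatile); the monthly aggregates are simply recomputed.
agg = load_week([("2024-01-10", "pen", 2.0), ("2024-01-11", "book", 7.0)])
print(agg)   # pen is now 17.0 and book 7.0 for 2024-01
```

Note that the second load does not modify any earlier fact row; the aggregates change only because new facts were appended, which is exactly the time-variant, non-volatile behavior described above.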