Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Database design for HPC
1. Database Design for HPC
CC-2002
Dr. C. Saravanan, Ph.D., NIT Durgapur.
dr.cs1973@gmail.com
Database Design ... Dr. C. Saravanan, NIT Durgapur.
2. Introduction to Database
• A database is structured collection of data.
• telephone directories
• Databases may be stored on a computer.
• Database Management Systems (DBMS)
• Relational Database Management Systems(RDBMS).
• Computer-based databases are usually organised into one or
more tables.
• A table stores data in a format similar to a published table and
consists of a series of rows (entities) and columns (fields/attributes).
Database Design ... Dr. C. Saravanan, NIT Durgapur.
3. Contd…
• A DataBase Management System (DBMS) is an aggregate of data,
hardware, software, and users that helps an enterprise manage its
operational data.
• The main function of a DBMS is to provide efficient and reliable
methods of data retrieval to many users.
• If a college has 10,000 students each year. Each student can have
approximately 10 grade records per year, then over 10 years, the
college will accumulate 1,000,000 grade records.
• It is not easy to extract records satisfying certain criteria from such a
set, and by current standards, this set of records is quite small.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
4. Database Components
• Hardware
• Computer memory into two classes:
• Internal memory (ROM/RAM-volatile) and
• External memory (HDD/TAPE/CD/DVD/etc.-nonvolatile).
• Software
• Users interact with database systems through query languages.
• Define the data structures (Data Definition Component).
• Retrieve and modify the data (Data Manipulation Component).
• Users
• Database Administrator.
• End User.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
5. Introduction to Information
• Processed Data which are related and are in support of one another is
Information.
• In an information system, input data consist of facts and figures,
which form the systems raw material.
• For example,
• Train ticket
• Student mark statement
• Staff pay slip
Database Design ... Dr. C. Saravanan, NIT Durgapur.
6. Atomicity
• A transaction is a sequence of database operations (that usually
consists of updates, with possible retrievals) that must be executed in
its entirety or not at all. This property of transactions is known as
atomicity.
• A typical example includes the transfer of funds between two account records
A and B in the database of a bank.
• Decrease the balance of account A by d dollars; Increase the balance of
account B by d dollars.
• If only the first operation is executed, then d dollars will disappear from the
funds deposited with the bank. If only the second is executed, then the total
funds will increase by d dollars. In either case, the consistency of the database
will be compromised.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
7. Consistency, Isolation, & Durability
• A transaction should transform a database from one consistent state to
another consistent state. A property of transactions known as consistency.
• The transaction management component ensures that the execution of
one transaction is not influenced by the execution of any other transaction.
This is the isolation property of transactions. Each transaction should occur
independently of other transactions occurring at the same time.
• Finally, the effect of a transaction to the state of the database must be
durable i.e. persist in the database after the execution of the transaction is
completed. Transactions that have been completed should remain
persistent, even in the event of a system failure before all of its changes are
reflected to the data and index files on disk.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
8. • If a transaction aborts in the middle, all operations up to that point
should be undone completely.
• File 1 update -- successful
• File 2 update -- successful
• File 3 update -- error, not updated
• File 4 update -- successful
• File 5 update -- successful
Database Design ... Dr. C. Saravanan, NIT Durgapur.
9. Database Systems Logical Architecture
• Main components
• the memory manager - The query processor converts a user query into
instructions the DBMS can process efficiently, taking into account the current
structure of the database
• the query processor - The memory manager obtains data from the database
that satisfies queries compiled by the query processor and manages the
structures that contain data, according to the DDL directives.
• the transaction manager - the transaction manager ensures that the
execution of possibly many transactions on the DBMS satisfies the ACID
“Atomicity, Consistency, Isolation, and Durability“ properties and, also,
provides facilities for recovery from system and media failures.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
11. Three Layers of Data Abstraction
• The physical layer contains specific and detailed information that
describes how data are stored: addresses of various data
components, lengths in bytes, etc.
• The logical layer describes data in a manner that is similar to, say,
definitions of structures in C.
• The user layer contains each user’s perspective of the content of the
database.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
12. Entity–Relationship Model
• (the E/R model) was developed by P. P. Chen
• an important tool for database design
• uses the notions of entity, relationship, and attribute
• entities are objects that need to be represented in the database
• relationships reflect interactions between entities
• attributes are properties of entities and relationships
Database Design ... Dr. C. Saravanan, NIT Durgapur.
13. Example
Database of a College
1. Students: any student who has ever registered at the college;
2. Instructors: anyone who has ever taught at the college;
3. Courses: any course ever taught at the college;
4. Advising: which instructor currently advises which student, and
5. Grades: the grade received by each student in each course, including the
semester and the instructor.
• use the entity/relationship diagram, a graphical representation of the
E/R model, where entity sets are represented by rectangles and sets
of relationships by diamonds.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
15. SETS
• Individual entities and individual relationships are grouped into
• homogeneous sets of entities (STUDENTS, COURSES, and
INSTRUCTORS)
• and homogeneous sets of relationships (ADVISING, GRADES).
• refer such sets as entity sets and relationship sets
Database Design ... Dr. C. Saravanan, NIT Durgapur.
16. • An E/R diagram of a database can be viewed as a graph
• whose vertices are the sets of entities and the sets of relationships.
• An edge may exist only between a set of relationships and a set of
entities.
• Also, every vertex must be joined by at least one edge to some other
vertex of the graph; in other words, this graph must be connected.
• This is an expression of the fact that data contained in a database
have an integrated character.
• This means that various parts of the database are logically related and
data redundancies are minimized.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
17. ROLE
• The notion of role that we are about to introduce helps explain the
significance of entities in relationships.
• Roles appear as labels of the edges of the E/R diagram.
• These role explain which entities are involved in the relationship and
in which capacity: who is graded, who is the instructor who gave the
grade, and in which course was the grade given.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
19. ROLE RELATIONSHIP SET ENTITY SET
ADVISEE ADVISING STUDENT
ADVISOR ADVISING INSTRUCTORS
GRADED GRADES STUDENT
GRADER GRADES INSTRUCTORS
SUBJECT GRADES COURSES
Database Design ... Dr. C. Saravanan, NIT Durgapur.
20. Attributes
• Properties of entities and relationships are described by attributes.
• Each attribute A has an associated set of values, which we refer to as
the domain of A and denote by Dom(A).
• The set of attributes of a set of entities E is denoted by Attr(E).
• The set of attributes of a set of relationships R is denoted by Attr(R).
Database Design ... Dr. C. Saravanan, NIT Durgapur.
21. Example
• The set of entities STUDENTS of the college database has the
attributes
• student identification number (stno),
• student name (name),
• street address (addr),
• city (city),
• state of residence (state),
• zip code (zip).
Database Design ... Dr. C. Saravanan, NIT Durgapur.
22. • The student Edwards P. David, who lives at 10 Red Rd. in Newton, MA,
02129, has been assigned ID number 1011. The value of his attributes
are:
ATTRIBUTE VALUE
STNO 1011
NAME Edwards P. David
ADDR 10 Red Rd.
CITY Newton
STATE MA
ZIP 02129
Database Design ... Dr. C. Saravanan, NIT Durgapur.
23. DOMAINS
• Domains of attributes consist of atomic values.
• This means that the elements of such domains must be “simple”
values such as integers, dates, or strings of characters.
• Domains may not contain such values as sets, trees, relations, or any
other complex objects.
• If e is an entity and A is an attribute of that entity, then we denote by
A(e) the value of the domain of A that the attribute associates with
the entity e.
• Similarly, when r is a relationship, we denote the value associated by
an attribute B to r as B(r).
Database Design ... Dr. C. Saravanan, NIT Durgapur.
24. • For example, if s is a student entity, then the values associated to s
are denoted by
• stno(s), name(s), addr(s), city(s), state(s), zip(s).
• A DBMS must support attribute domains.
• Such support includes validity checks and implementation of
operations specific to the domains.
• For instance, whenever an assignment A(e) = v is made, where e is an
entity and A is an attribute of e, the DBMS should verify whether v
belongs to Dom(A).
Database Design ... Dr. C. Saravanan, NIT Durgapur.
25. • Dom(name) is the set of all possible names for students.
• However, such a definition is clearly impractical for a real database because
it would make the support of such a domain an untenable task.
• Such support would imply that the DBMS must somehow store the list of
all possible names that human beings may adopt.
• Only in this way would it be possible to check the validity of an assignment
of a name.
• Thus, in practice, we define Dom(name) as the set of all strings of length
less or equal to a certain length n. For the sake of this example, we adopt n
= 35.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
26. Entity Set Attribute Domain Description
STUDENTS stno
name
addr
city
state
zip
CHAR(10)
CHAR(35)
CHAR(35)
CHAR(20)
CHAR(2)
CHAR(10)
college-assigned student ID number
full name
street address
home city
home state
home zip
COURSES cno
cname
cr
cap
CHAR(5)
CHAR(30)
SMALLINT
INTEGER
college-assigned course number
course title
number of credits
maximum number of students
INSTRUCTORS empno
name
rank
roomno
telno
CHAR(11)
CHAR(35)
CHAR(12)
INTEGER
CHAR(4)
college-assigned employee ID number
full name
academic rank
office number
office telephone number
Attributes of Sets of Entities
Database Design ... Dr. C. Saravanan, NIT Durgapur.
27. Relationship Set Attribute Domain
GRADES stno
empno
cno
sem
year
grade
CHAR(10)
CHAR(11)
CHAR(5)
CHAR(6)
INTEGER
INTEGER
ADVISING stno
empno
CHAR(10)
CHAR(11)
Attributes of Sets of Relationships
Database Design ... Dr. C. Saravanan, NIT Durgapur.
28. • both STUDENTS and INSTRUCTORS have the attribute name, we use
the qualified attributes STUDENTS.name and INSTRUCTORS.name.
• Attributes of relationships may either be attributes of the entities
they relate, or be new attributes, specific to the relationship.
• For instance, a grade involves a student, a course, and an instructor,
and for these, we use attributes from the participating entities: stno,
cno, and empno, respectively.
• In addition, we need to specify the semester and year when the grade
was given as well as the grade itself. For these, we use new attributes:
sem, year, and grade.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
30. KEYS
• In order to talk about a specific student, you have to be able to
identify him. A common way to do this is to use his name, and
generally, this works reasonably well.
• So, you can ask something like, “Where does Roland Novak live?” In
database terminology, we are using the student’s name as a “key”, an
attribute (or set of attributes) that uniquely identifies each student.
So long as no two students have the same name, you can use the
name attribute as a key.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
31. • What would happen, though, if there were two students named “Helen
Rivers”?
• Then, the question, “Where does Helen Rivers live?” could not be
answered without additional information.
• The name attribute would no longer uniquely identify students, so it could
not be used as a key for STUDENTS.
• Assign a unique identifier (corresponding to the stno attribute) to each
student when he first enrolls.
• This identifier can then be used to specify a student unambiguously; i.e., it
can be used as a key. If one Helen Rivers has ID 6568 and the other has ID
4140.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
32. • Let E be a set of entities having A1, . . . ,An as its attributes. The set
{A1, . . . ,An} is denoted by A1 . . .An.
• Further, if H and L are two sets of attributes, their union is denoted by
concatenation;
• namely, we write HL = A1 . . .AnB1 . . .Bm for H ∪ L if H = A1 . . .An
and L = B1 . . .Bm.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
33. Key Definition
• Let E be a set of entities such that Attr(E) = A1 . . .An.
• A key of E is a nonempty subset L of Attr(E) such that the following
conditions are satisfied:
1. For all entities, e, e′ in E, if A(e) = A(e′) for every attribute A of L, then
e = e′ (the unique identification property of keys).
2. No proper, nonempty subset of L has the unique identification
property (the minimality property of keys).
• Possible to have several keys for a set of entities. One of these keys is
chosen as the primary key; the remaining keys are alternate keys.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
34. Example
• In the college database, the value of the attribute stno is sufficient to
identify a student entity.
• Since the set stno has no proper, nonempty subsets, it clearly satisfies
the minimality condition and, therefore, it is a key for the STUDENTS
entity set.
• For our college, the entity set COURSES both cno and cname are keys.
• Note that this reflects a “business rule”, namely that no two courses
may have the same name, even if they are offered by different
departments.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
36. • May happen grand father and his grand son have same name,
address, city, zip, telno.
• Similarly, twins will have same DOB, but different name.
• Thus, name can not act as a key.
• Where, DOB helps to distinguish, so name and DOB can be a key.
Exercise
How to create a key for loans ?
• A single member borrows the same book repeatedly, thereby creating
several loan relationships, the date attribute is necessary to
distinguish among them.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
37. Foreign Key Definition
• A foreign key for a set of relationships is a set of attributes that is a
primary key of a set of entities that participates in the relationship
set.
How to connect member and loan ?
Database Design ... Dr. C. Saravanan, NIT Durgapur.
38. Constraints
• A student complete at least one course and no more than 45 courses.
• If every student must choose an advisor, and an instructor may not
advise more than 7 students
• If a reader can have no more than 20 books on loan from the town
library
• The second restriction reflects the fact that a book is on loan to at
most one member.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
39. U VR
p:q m:n
Let R be a set of binary relationships involving the sets of entities U and V .
Participation constraints (U, p, q, R) and (V, m, n, R)
The set of relationships R from U to V is:
1. one-to-one if p = 0, q = 1 and m = 0, n = 1;
2. one-to-many if p = 0, q > 1 and m = 0, n = 1;
3. many-to-one if p = 0, q = 1 and m = 0, n > 1;
4. many-to-many if p = 0, q > 1 and m = 0, n > 1.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
40. • Incorporating in the college database information about prerequisites
for courses.
• This can be accomplished by introducing the set of relationships
PREREQ.
• Assume that a course may have up to three prerequisites and place
the appropriate participation constraint, then we obtain the E/R
diagram.
STUDENT PREREQ
Database Design ... Dr. C. Saravanan, NIT Durgapur.
41. WEAK ENTITY TYPES
• expand database by adding information about student loans.
• adding a set of entities called LOANS.
• a student can have several loans MAX=10
STUDENTS LOANSBORROW
RECIPIENT 1:10 AWARD 1:1
Database Design ... Dr. C. Saravanan, NIT Durgapur.
42. • The sets of entities STUDENTS and LOANS are related by the one-to-
many sets of relationships BORROW.
• If a student entity is deleted, the LOANS entities that depend on the
student entity should also be removed.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
43. RELATIONAL MODEL
• Informally, the relational model consists of:
• A class of data structures referred to as tables.
• A collection of methods for building new tables starting from an initial
collection of tables; these methods referred as relational algebra
operations.
• A collection of constraints imposed on the data contained in tables.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
44. Main Data Structure of the Relational Model
• The relational model revolves around a fundamental data structure
called a table.
DOW CNO ROOMNO TIME
MON CS110 84 10AM
TUE CS450 62 12PM
WED CS110 65 10AM
THU CS210 63 3PM
FRI CS310 64 11AM
Database Design ... Dr. C. Saravanan, NIT Durgapur.
45. • the heading of the table, with one entry for each column,
• in the above case dow, cno, roomno, and time and
• the content of the table, i.e., the list of 5 rows specified above.
• The members of the heading are referred to as attributes.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
46. • The heading H of the table consists of the attributes A1, . . . ,An,
• then H is written as a string rather than a set, H = A1 ・ ・ ・An.
• Each attribute A has a special set that is attached to it called the
domain of A that is denoted Dom(A).
Database Design ... Dr. C. Saravanan, NIT Durgapur.
47. • For example, in the table SCHEDULE considered above
• the domain of the attribute dow (for “day of the week”) is the set that
consists of the strings:
’Mon’, ’Tue’, ’Wed’, ’Thu’, ’Fri’, ’Sat’, ’Sun’
• A tuple t of T is called a row of T .
• Set of values that occur under an attribute may be referred to as a
column of T.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
48. • The term “relational model” reflects that fact that, from a
mathematical point of view, the content of a table is what is known in
mathematics as a relation.
• To introduce the notion of relation we need to define the Cartesian
product of sets (sometimes called a cross product ), a fundamental
set operation.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
49. • Let D1, . . . ,Dn be n sets.
• The Cartesian product of the sequence of sets D1, . . . ,Dn is the set
that consists of all sequences of the form
(d1, . . . , dn), where di ∈ Di for 1 ≤ i ≤ n.
• We denote the Cartesian product of D1, . . . ,Dn by D1 × ・ ・ ・ × Dn
• Dom(dow) × Dom(cno) × Dom(roomno) × Dom(time)
• 7 ・ 5 ・ 4 ・ 12 = 1680 quadruples.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
50. Example
• Consider the set D = {1, 2, 3, 4, 5, 6} and the Cartesian product D × D,
which has 36 pairs.
• Certain of these pairs (a, b) have the property that a is less than b
• With a little bit of counting, we see that there are 15 such pairs.
• First is less than the second.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
53. Relational Model
• A table that lists precisely the pairs of D × D that comprise the <
relation.
• It is this correspondence between tables and relations that is at the
heart of the name “relational model.”
Database Design ... Dr. C. Saravanan, NIT Durgapur.
54. Tables
• Informally, the relational model consists of:
• A class of data structures referred to as tables.
• A collection of methods for building new tables starting from an initial
collection of tables; we refer to these methods as relational algebra
operations.
• A collection of constraints imposed on the data contained in tables.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
55. Three main components of table
• The name of the table, in our case SCHEDULE,
• the heading of the table, with one entry for each column, in our case
dow, cno, roomno, and time and
• the content of the table, i.e., the list of 5 rows specified above
Database Design ... Dr. C. Saravanan, NIT Durgapur.
56. Attributes
• The members of the heading are referred to as attributes.
• In keeping with the practice of databases, if the heading H of the
table consists of the attributes A1, . . . ,An,
• then we write H as a string rather than a set, H = A1 ・ ・ ・An.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
57. Domain
• Each attribute A has a special set that is attached to it called the
domain of A that is denoted Dom(A).
• This domain comprises the set of values of the attribute;
• For example, in the table SCHEDULE considered above the domain of
the attribute dow (for “day of the week”) is the set that consists of
the strings:
• ’Mon’, ’Tue’, ’Wed’, ’Thu’, ’Fri’, ’Sat’, ’Sun’
Database Design ... Dr. C. Saravanan, NIT Durgapur.
58. Projections
• For a tuple t of a table T having the heading H we may wish to
consider only some of the attributes of t while ignoring others.
• If L is the set of attributes we are interested in, then t[L] is the
corresponding tuple, referred to as the projection of t on L.
• The projection of the table SCHEDULE on the set of attributes dow
cno is SCHEDULE[dow cno]
Database Design ... Dr. C. Saravanan, NIT Durgapur.
59. Transforming an E/R into a Relational Design
• Assume a set of entities or a set of relationships has a primary key.
• For example, whenever a new patron applies for a card at the library,
the library may assign a new, distinct number to the patron; this set of
numbers could be the primary key for the entity set PATRONS.
• Similarly, each time a book is loaned out, a new loan number could be
assigned, and this set of numbers could be the primary key for the set
of relationships LOANS.
• Note, actually added a new attribute to PATRONS and to LOANS.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
61. • In the E/R model we dealt with two types of basic constituents,
• Entity sets and relationship sets,
• In the relational model, we deal only with tables,
Database Design ... Dr. C. Saravanan, NIT Durgapur.
62. Definition
• Let T be a table that has the heading H. A set of attributes K is a key
for T if K ⊆ H and the following conditions are satisfied:
1. For all tuples u, v of the table, if u[K] = v[K], then u = v (unique
identification property).
2. There is no proper subset L of K that has the unique identification
property (minimality property).
Database Design ... Dr. C. Saravanan, NIT Durgapur.
63. Entity and Referential Integrity
• Student course registrations are recorded in the structure of this database,
a tuple must be inserted into the table GRADES. For example SAT and GRE
• First attribute is applicable to undergraduates and the second can be
applied only to graduate students.
• Null values cannot be allowed to occur as tuple components corresponding
to the attributes of the primary key of a table.
• To define the concept of referential integrity, we need to introduce the
notion of a foreign key.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
64. Metadata
• Metadata is a term that refers to data that describes other data.
• In the context of the relational model, metadata are data that
describe the tables and their attributes.
• The relational model allows a relational database to contain tables
that describe the database itself.
• These tables are known as catalog tables, and they constitute the
data catalog or the data dictionary of the database.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
65. Example, SYSCATALOG
• the attribute owner describes the creator of the table
• The attribute tname gives the name of the table,
• while dbspacename indicates the memory area (also known as the
table space) where the table was placed.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
68. • The attributes cname and tname give the name of the column
(attribute) and the name of table where the attribute occurs.
• The nature of the domain (character or numeric) is given by the
attribute coltype and the size in bytes of the values of the domain is
given by the attribute length.
• The attribute nulls specifies whether or not null values are allowed.
Finally, the attribute in pr key indicates whether the attribute belongs
to the primary key of the table tname.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
69. Data Retrieval
• Tables are more than simply places to store data.
• The real interest in tables is in how they are used.
• To obtain information from a database, a user formulates a question
known as a “query.”
Database Design ... Dr. C. Saravanan, NIT Durgapur.
71. • the intersection R ∩ S,
• the difference R − S,
• the difference S − R,
• and the union R ∪ S
• of the sets R and S.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
72. Selection
• Selection is a unary operation that allows us to select tuples that
satisfy specified conditions.
…STUDENTS where(city = ’Boston’ or city = ’Brookline’
=, !=,<,>,≤, or ≥
Database Design ... Dr. C. Saravanan, NIT Durgapur.
74. The Join Operation
• The join operation is important for answering queries that combine
data that reside in several tables.
• Let T1, T2 be two tables that have the headings
A1 ・ ・ ・Am B1 ・ ・ ・Bn and B1 ・ ・ ・Bn C1 ・ ・ ・Cp,
• the two tables that have only the attributes B1, . . . ,Bn in common.
• The tuples t1 in T1 and t2 in T2 are joinable if
t1[B1 ・ ・ ・Bn] = t2[B1 ・ ・ ・Bn].
Database Design ... Dr. C. Saravanan, NIT Durgapur.
75. • The join of t1 and t2 is denoted by
• if D is one of the attributes B1, . . . ,Bn shared by the two tables,
• then t1[D] = t2[D]
Database Design ... Dr. C. Saravanan, NIT Durgapur.
77. • The tuples t1 and u1 are joinable because t1[BD] = u1[BD] = (b1 d1);
similarly,
• t2 is joinable with u2,
• t3 is joinable with u1, and
• t4 and t5 are not joinable with any tuple of S.
• Because (b1 d2) != ? And (b3, d3) != ?
Database Design ... Dr. C. Saravanan, NIT Durgapur.
79. Example
• Finding the names of all instructors who have taught cs110.
• Extract all grade records involving cs110
• by joining with INSTRUCTORS extract the records of instructors who
teach this course
• a projection on name yields the answer to the query
Database Design ... Dr. C. Saravanan, NIT Durgapur.
80. • T1 := (GRADES wherecno = ’cs110’).
• T2 := (T1 INSTRUCTORS).
• ANS := T2[name].
Database Design ... Dr. C. Saravanan, NIT Durgapur.
81. Division
• Let T1, T2 be two tables such that
• the heading of T1 is A1 . . .AnB1 . . .Bk and
• the heading of T2 is B1 . . .Bk
• The table obtained by division of T1 by T2 is the table T1 ÷ T2 that has
the heading A1 . . .An and contains those tuples t in tuple (A1 . . .An)
Database Design ... Dr. C. Saravanan, NIT Durgapur.
82. The Basic Operations of Relational Algebra
• Discussed nine operations: renaming, union, intersection, difference,
product, selection, projection, join, and division.
• unary operations of relational algebra — selection and projection —
have higher priority than the remaining binary operations.
• Let T1 and T2 be two compatible tables.
• It is easy to see that T1∩T2 has the same content as T1 − (T1 − T2).
• Thus intersection can be accomplished using difference.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
83. • Join operation can be expressed using the operations of renaming,
product, selection, and projection consider the following example.
• The tables T1, T2 introduced in Example 4.1.20, have the headings
ABD and BCD, respectively. The table T3 := T1 × T2 is
• Then, we eliminate duplicate columns and rename the attributes in
T4(A,B,D,C) := T3[T1.A, T1.B, T2.C, T2.D].
Database Design ... Dr. C. Saravanan, NIT Durgapur.
85. Other Relational Algebra Operations
• Let T and T ′ be two tables such that their headings H, H′, respectively,
have no common attributes.
• Suppose that A1, . . . ,An are attributes of H and B1, . . . ,Bn are
attributes of H′
• such that DomAi = DomBi for 1 ≤ i ≤ n, and let θi be one of {=, ! =,<
,≤,>,≥} for 1 ≤ i ≤ n.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
86. Example
• To determine the pairs of student names and instructor names such
that the instructor is not an advisor for the student. In order to deal
with the requirement that the tables involved in a θ-join have disjoint
headings we create the tables:
• ADVISING1(stno, empno1) := ADVISING,
and
• INSTRUCTORS1(empno,name1) := INSTRUCTORS[empno,name].
Database Design ... Dr. C. Saravanan, NIT Durgapur.
87. Semi join and Left outer join
• Let T1, T2 be two tables having the headings H1,H2 and the contents ρ1,
ρ2, respectively.
• Their semijoin is the table named T1⋉T2 that has the heading H1 and the
content ρ1⋉ρ2, where ρ1⋉ρ2 = (ρ1 ⋉ ρ2)[H1].
• The left outer join of T1 and T2 is the table named T1 ⋉ T2 having the
heading H1 ∪ H2 and the content ρ1 ⋉ℓ ρ2, where:
• ρ1 ⋉ℓ ρ2 = (ρ1 ⋉ ρ2) ∪ {(a1, . . . , an, null, . . . , null) |
• (a1, . . . , an) ∈ ρ1 − (ρ1⋉ρ2)}.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
88. Right outer join
• The right outer join of T1 and T2 is the table named T1 ⋉r T2 whose
heading is H1 ∪ H2, having the content ρ1 ⋉r ρ2, where
• ρ1 ⋉r ρ2 = (ρ1 ⋉ ρ2) ∪ {(null, . . . , null, b1, . . . , bp) | (b1, . . . , bp) ∈
ρ2 − (ρ2⋉ρ1)}
Database Design ... Dr. C. Saravanan, NIT Durgapur.
89. Relational Model Concepts
• The relational model used the basic concept of a relation or table.
• The columns or fields in the table identify the attributes such as
name, age, and so.
• A tuple or row contains all the data of a single instance of the table
such as a person named Doug.
• In the relational model, every tuple must have a unique identification
or key based on the data.
• The relational model also includes concepts such as foreign keys,
which are primary keys in one relation that re kept in another relation
to allow for the joining of data.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
90. Data Definition Language
• Define or restructure the database.
• ALTER statements modify the definition of existing entities.
• For example, use ALTER TABLE to add a new column to a table,
• or use ALTER DATABASE to set database options.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
91. • CREATE statements define new entities.
• For example, use CREATE TABLE to add a new table to a database.
• DISABLE TRIGGER disables a trigger.
• DROP statements remove existing entities.
• For example, use DROP TABLE to remove a table from a database.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
92. • ENABLE TRIGGER enables a DML or DDL trigger.
• TRUNCATE TABLE removes all rows from a table without logging the
individual row deletions .
• UPDATE STATISTICS updates query optimization statistics on a table or
indexed view.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
93. Database Views
• A Database View is a subset of the database sorted and displayed in a
particular way.
• For example, in an equipment database, perhaps you only wish to
display the Weapons stored in the database.
• To do that you would create a Weapons view.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
94. Database Index
• Indexes are used to quickly locate data without having to search every
row in a database table every time a database table is accessed.
• Indexes can be created using one or more columns of a database
table.
• An index is a copy of select columns of data from a table that can be
searched very efficiently that also includes a low-level disk block
address or direct link to the complete row of data it was copied from.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
95. Types of Indexes
• Bitmap index - stores the bulk of its data as bit arrays (bitmaps) and
answers most queries by performing bitwise logical operations on these
bitmaps.
• Dense index - a file with pairs of keys and pointers for every record in the
data file
• Sparse index - a file with pairs of keys and pointers for every block in the
data file
• Reverse index - reverses the key value before entering it in the index.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
96. Normalization
• Remove redundant data
• Protect the relational model
• Improve scalability and flexibility
• First normal form (1NF) – no two rows of data repeating information
– each row should have a primary key or concatenated key.
• Second normal form (2NF) – there must not be any partial
dependency of any column on primary key or concatenated key.
• Third normal form (3NF) – every non-prime attribute table must be
dependent on primary key or concatenated key.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
97. • Boyce–Codd normal form (BCNF) - Every non-trivial functional
dependency in the table is a dependency on a superkey.
• Fourth normal form (4NF) - for every one of its non-trivial multivalued
dependencies is a superkey.
• Fifth normal form (5NF) / Project-Join normal form (PJ/NF) - every
non-trivial join dependency in it is implied by the candidate keys.
• Sixth normal form (6NF) - no nontrivial join dependencies at all.
• Inclusion Dependency Normal Form (IDNF) - a relation in BCNF also is
noncircular and key-based.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
98. Secondary Storage Devices
• Need of secondary storage devices
• Storing Files
• Huge number of files
• Huge size of files
• Text files
• Images
• Videos
• Etc.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
99. Primary Vs. Secondary
• Primary
• Volatile
• Temporary
• Fast
• Secondary
• Non-volatile
• Permanent
• Slow
Database Design ... Dr. C. Saravanan, NIT Durgapur.
100. Secondary Storage Device Types
• Technology used to store data
• Capacity of data they can hold
• Size of storage device
• Portability of storage device and
• Access time to stored data.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
101. • Integral
• Internal
• External
• Hard disks
• Optical Disks
• Magnetic Tapes
• Solid State Devices
Database Design ... Dr. C. Saravanan, NIT Durgapur.
102. Hard disks
• IDE (Integrated Drive Electronics)
• SCSI (Small Computer System Interface)
• SATA (Serial Advanced Technology Attachment)
• PATA (Parallel Advanced Technology Attachment)
• SAS (Serial Attached SCSI)
• FC (Fibre Channel)
• S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology)
• RAID (Redundant Array of Inexpensive Disks)
Database Design ... Dr. C. Saravanan, NIT Durgapur.
105. Optical Disks
• First, Second, Third, Fourth Generation
• CD
• DVD
• HD-DVD
• Blu-Ray
• Archival Disc - able to withstand temperature, humidity, dust and water,
ensuring that the disc is readable for at least 50 years
• Holographic Versatile Disc - store up to several terabytes
• LS-R (Layer-Selection-Type Recordable Optical Disk)
• Protein-coated disc - 50 Terabytes on one disc
Database Design ... Dr. C. Saravanan, NIT Durgapur.
107. Buffering of Blocks
• Several buffers can be reserved in main memory.
• While one buffer is being read or written, the CPU can
process data in the other buffers.
• concurrency: interleaved or in parallel
Database Design ... Dr. C. Saravanan, NIT Durgapur.
time
A A
B B
C
D
108. Double Buffering
• The CPU can start processing a block once its transfer to main
memory is completed; at the same time the disk I/O processor can be
reading and transferring the next block into a different buffer.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
i
fill A
i+1
fill B
i+2
fill A
i+3
fill B
i+4
fill A
i
process A
i+1
process B
i+2
process A
i+3
process B
i+4
process A
time
Disk block:
I/O:
Disk block:
Processing:
109. File Organisations
• Fixed-length records and variable-length records
• Fixed-length records:
Database Design ... Dr. C. Saravanan, NIT Durgapur.
Name SSN Salary
JobCode
Department Hire-Date
1 30 40 44 48 68 71
Variable-length records:
Smith, John 123456789 xxxx xxxx Computer
name SSN salary
JobCode
department
1 12 21 25 29 37
NAME=Smith, John SSN=123456789 DEPATMENT=Computer
110. Allocating file blocks on disk
• Contiguous allocation:
• Disk track
• Linked allocation:
Database Design ... Dr. C. Saravanan, NIT Durgapur.
file block 1 file block 2 … ...
file block 1 file block 2
file block 3 … ...
111. • File header: disk addresses of blocks, record format description (field
length, order of fields, field type code, separator characters, record
type code, …)
• Operations on Files
Database Design ... Dr. C. Saravanan, NIT Durgapur.
operations
general OP.
retrieval OP.
update OP.
combined OP.
record-at-a-time
set-at-a-time
112. Heaps
• A heap is a specialized tree-based data structure that satisfies
the heap property
• Heaps can then be classified further as either "max heap" and "min
heap“
• In a max heap, the keys of parent nodes are always greater than or
equal to those of the children and the highest key is in the root node.
• In a min heap, the keys of parent nodes are less than or equal to
those of the children and the lowest key is in the root node.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
113. Hashing
• Hashing is a method of storing records according to their key values.
• It provides access to stored records in constant time, O(1), so it is
comparable to B-trees in searching speed.
• Therefore, hash tables are used for:
a) Storing a file record by record.
b) Searching for records with certain key values.
• In hash tables, the main idea is to distribute the records uniquely on a
table, according to their key values.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
114. • Take the key and use a function to map the key into one location of
the array:
• f(key)=h,
• where h is the hash address of that record in the hash table.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
115. Overflow Handling Techniques
• Linear probing
• Random probing
• Chaining
• Chaining with overflow
• Rehashing
Database Design ... Dr. C. Saravanan, NIT Durgapur.
116. Dynamic Hashing
• As the database grows over time, we have three options:
• Choose hash function based on current file size. Get performance
degradation as file grows.
• Choose hash function based on anticipated file size. Space is wasted initially.
• Periodically re-organize hash structure as file grows. Requires selecting new
hash function, recomputing all addresses and generating new bucket
assignments. Costly, and shuts down database.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
117. • Some hashing techniques allow the hash function to be modified
dynamically to accommodate the growth or shrinking of the
database. These are called dynamic hash functions.
• Extendable hashing is one form of dynamic hashing.
• Extendable hashing splits and coalesces buckets as database size changes.
• This imposes some performance overhead, but space efficiency is maintained.
• As reorganization is on one bucket at a time, overhead is acceptably low.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
118. Primary and Secondary Index
• Primary index
A primary index is an index on a set of fields that includes the unique
primary key for the field and is guaranteed not to contain duplicates.
• Also Called a **Clustered index**.
• eg. Employee ID can be Example of it.
• Secondary index
A Secondary index is an index that is not a primary index and may
have duplicates.
• eg. Employee name can be example of it. Because Employee name
can have similar values.
Database Design ... Dr. C. Saravanan, NIT Durgapur.
119. Clustered and non-clustered index
• A clustered index determines the order in which the rows of a table
are stored on disk.
• If a table has a clustered index, then the rows of that table will be
stored on disk in the same exact order as the clustered index.
• A non-clustered index will store both the value of the
EmployeeID AND a pointer to the row in the Employee table where
that value is actually stored.
Database Design ... Dr. C. Saravanan, NIT Durgapur.