A
Presentation on
INTRODUCTION TO
DATABASE
Prepared by:
Jyoti Giri
Assistant Professor
GDRCST, Bhilai
What is data?
2
 A collection of facts from which conclusions may be
drawn, such as “statistical data”.
 Data is the plural form of datum.
 It is a representation of facts or concepts in an
organized manner so that it may be stored,
communicated, interpreted or processed by
automated means.
Example: researchers who conduct a market research
survey might ask members of the public to
complete questionnaires about a product or a
service. These completed questionnaires are data;
they are processed and analyzed in order to
prepare a report on the survey.
Properties of Data (In database)
3
 Data should be well organized.
 Data should be related.
 Data should be accessible in any order.
 Each data item should be stored the minimum number
of times.
What is a Database?
4
 A database is a collection of related data that
contains information relevant to an
enterprise.
 For example:
1. University database
2. Employee database
3. Student database
4. Airlines database
etc.
PROPERTIES OF A
DATABASE
5
 A database represents some aspect of the real
world, sometimes called the miniworld or the
universe of discourse (UoD).
 A database is a logically coherent collection of
data with some inherent meaning.
 A database is designed, built and populated
with data for a specific purpose.
What is a Database Management
System (DBMS)?
6
 A database management system (DBMS) is a
collection of programs that enables users to create &
maintain a database. It facilitates the definition,
creation and manipulation of the database.
 Definition – it specifies only the structure of the
database, not the data. It involves specifying the data
types, structures & constraints for the data to be stored
in the database.
 Creation – it is the inputting of the actual data into the
database. It involves storing the data itself on some
storage medium that is controlled by the DBMS.
 Manipulation – it includes functions such as updating,
inserting, deleting and retrieving specific data and
generating reports from the data.
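The three activities above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the deck's own example: the student table, its columns and the sample rows are assumptions chosen for the sketch.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Definition: only the structure (data types and constraints) is specified.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age > 0))"
)

# Creation: the actual data is stored under the control of the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, age) VALUES (?, ?, ?)",
    [(1, "Rakesh", 20), (2, "Nikhil", 21)],
)

# Manipulation: update, retrieval and deletion of specific data.
conn.execute("UPDATE student SET age = 22 WHERE roll_no = 2")
rows = conn.execute(
    "SELECT roll_no, name, age FROM student ORDER BY roll_no"
).fetchall()
conn.execute("DELETE FROM student WHERE roll_no = 1")
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Here rows holds both students as they were after the update, and remaining counts what is left after the deletion.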
Typical DBMS Functionality
7
 Define a database : in terms of data types,
structures and constraints
 Construct or Load the Database on a
secondary storage medium
 Manipulating the database : querying,
generating reports, insertions, deletions and
modifications to its content
 Concurrent Processing and Sharing by a set of
users and programs – yet, keeping all data
valid and consistent
Typical DBMS Functionality
8
Other features:
 Protection or Security measures to prevent
unauthorized access
 “Active” processing to take internal actions on
data
 Presentation and Visualization of data
Database System
9
 The database and the DBMS together are
called the database system.
 Database systems are designed to manage
large bodies of information.
 It involves both defining structures for storage
of information & providing mechanisms for the
manipulation of information.
 Database system must ensure the safety of the
information stored.
A simplified database system
environment
10
Database System Applications
11
 Banking: for customer information, accounts & loans, and
banking transactions.
 Airlines: for reservations & schedule information.
 Universities: for student information, course registration and
grades.
 Credit card transactions: for purchases on credit cards &
generation of monthly statements.
 Telecommunication: for keeping records of calls made,
generating monthly bills, maintaining balances, information
about communication networks.
 Finance: for storing information about holdings, sales &
purchases of financial instruments such as stocks & bonds.
 Sales: for customer, product and purchase information.
 Manufacturing: for management of supply chain & for
tracking production of items in factories.
 Human resources: for information about employees, salaries,
payroll taxes and benefits.
Traditional File systems
12
 Before the evolution of DBMS, organizations used
to store information in file systems.
 A typical file processing system is supported by a
conventional operating system.
 The system stores permanent records in various
files & it needs application programs to extract
records, or to add or delete records.
 In traditional file processing, each user defines and
implements the files needed for a specific
application.
Traditional file system
13
 For example, one user, the grade reporting office, may keep a
file on students and their grades. Programs to print a student’s
transcript and to enter new grades into the file are
implemented.
 A second user, the accounting office, may keep track of
students’ fees and their payments.
 Although both users are interested in data about students, each
user maintains separate files—and programs to manipulate
these files—because each requires some data not available
from the other user’s files.
 This redundancy in defining and storing data results in wasted
storage space and in redundant efforts to keep common
data up to date.
Disadvantages of File systems
14
1. Data Redundancy & Inconsistency
2. Difficulty in Accessing data
3. Data Isolation
4. Integrity Problems
5. Atomicity Problems
6. Concurrent access Anomalies or Problems
7. Security Problems
Data Redundancy & Inconsistency
15
 Different programmers work on a single project, so various files are
created by different programmers over a period of time.
 As a result, the files are created in different formats & the programs
are written in different programming languages.
 The same information is repeated in several files.
 For example: name & address may appear in the savings account file as
well as in the salary account file.
 This redundancy results in higher storage & access costs.
 It also leads to data inconsistency, which means that if we change
a record in one place, the change is not reflected in all the other
places.
 For example, a changed customer address may be reflected in the savings
record but not anywhere else.
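A small sketch of the address example, using plain Python dictionaries to stand in for files. All names and values here are hypothetical; the point is only that two copies can drift apart, while a single shared copy cannot.

```python
# Two independent "files", each keeping its own copy of the customer's address.
savings_file = {"C1": {"name": "Ajay", "address": "Kohka, Bhilai"}}
salary_file = {"C1": {"name": "Ajay", "address": "Kohka, Bhilai"}}

# The address is changed in one file only, so the two copies now disagree.
savings_file["C1"]["address"] = "Shanti Nagar, Nagpur"
inconsistent = savings_file["C1"]["address"] != salary_file["C1"]["address"]

# Storing the address once and referring to it by customer id avoids this.
customers = {"C1": {"name": "Ajay", "address": "Kohka, Bhilai"}}
savings_accounts = {"S-1": {"customer_id": "C1"}}
salary_accounts = {"P-1": {"customer_id": "C1"}}
customers["C1"]["address"] = "Shanti Nagar, Nagpur"


def address_for(account):
    """Look the address up through the single shared customer record."""
    return customers[account["customer_id"]]["address"]


consistent = address_for(savings_accounts["S-1"]) == address_for(salary_accounts["P-1"])
```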
Difficulty in Accessing data
16
 Accessing data from a list is also difficult in a file
system.
 Suppose we want to see the records of all
customers who have a balance of less than Rs 10,000.
We can either check the list & find the names
manually or write an application program.
 If we write an application program & at some later
time we need to see the records of customers who
have a balance of less than Rs 20,000, then
a new program has to be written again.
 It means that file processing systems do not allow
data to be accessed in a convenient manner.
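By contrast, a DBMS lets the same query be reused with a different limit. The sketch below uses Python's sqlite3 module; the account table and its rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 5000), ("A-102", 10000), ("A-103", 15000)],
)


def accounts_below(limit):
    """Return the account numbers whose balance is below `limit`."""
    cur = conn.execute(
        "SELECT acc_no FROM account WHERE balance < ? ORDER BY acc_no", (limit,)
    )
    return [row[0] for row in cur]


# The limit is just a parameter; no new program is needed for Rs 20,000.
below_10k = accounts_below(10000)
below_20k = accounts_below(20000)
```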
Data Isolation
17
 As the data is stored in various files, & the
files may be stored in different formats, writing
application programs to retrieve the data is
difficult.
Integrity Problems
18
 The stored data must satisfy certain
constraints; for example, in a bank the minimum
deposit may be Rs 1000.
 Developers enforce these constraints by
writing appropriate programs, but if
a new constraint has to be added later, it is
difficult to change the programs to enforce
it.
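In a DBMS such a rule can be declared once, with the data, instead of being coded into every program. A minimal sketch with sqlite3, assuming a hypothetical deposit table and the Rs 1000 minimum from the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is part of the table definition, so every program
# that inserts a deposit is checked by the DBMS itself.
conn.execute(
    "CREATE TABLE deposit ("
    " acc_no TEXT NOT NULL,"
    " amount INTEGER NOT NULL CHECK (amount >= 1000))"
)
conn.execute("INSERT INTO deposit VALUES ('A-101', 1500)")

try:
    conn.execute("INSERT INTO deposit VALUES ('A-102', 500)")
    rejected = False
except sqlite3.IntegrityError:
    # SQLite reports a violated CHECK constraint as an IntegrityError.
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM deposit").fetchone()[0]
```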
Atomicity Problems
19
 Any mechanical or electrical device is subject to
failure, and so is a computer system.
 After a failure we have to ensure that the data is
restored to a consistent state.
 For example, an amount of Rs 50 has to be transferred
from Account A to Account B.
 Suppose the amount has been debited from Account A but
has not yet been credited to Account B when
some failure occurs.
 This leaves the data in an inconsistent state.
 So we have to adopt a mechanism which ensures that
either the full transaction is executed or none of
it is executed, i.e. the fund transfer
must be atomic.
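The fund transfer can be sketched as a transaction with sqlite3. The starting balances and the simulated failure are assumptions for illustration; the `with conn:` block commits if the body finishes and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(amount, fail_midway=False):
    """Move `amount` from A to B; either both updates happen or neither."""
    try:
        with conn:  # commit on success, roll back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
            )
        return True
    except RuntimeError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(50, fail_midway=True)
after_failure = balances()   # the debit was rolled back
succeeded = transfer(50)
after_success = balances()
```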
Concurrent access Problems
20
 Many systems allow multiple users to update
the data simultaneously.
 This can also leave the data in an inconsistent
state.
 Suppose a bank account contains a balance of
Rs 500 & two customers want to withdraw
Rs 100 & Rs 50 simultaneously.
 If both transactions read the old balance &
withdraw from that old balance, they write back
Rs 400 & Rs 450 respectively. The final balance is
whichever is written last, instead of the correct Rs 350.
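The withdrawal example can be traced step by step. The sketch below does not use real threads; it simply fixes one possible interleaving (both reads before either write) to show the lost update, and then the serial order for comparison.

```python
balance = 500

# Both withdrawals read the balance before either one writes it back.
read_by_t1 = balance
read_by_t2 = balance

balance = read_by_t1 - 100  # T1 writes 400
balance = read_by_t2 - 50   # T2 overwrites it with 450; T1's withdrawal is lost
lost_update_result = balance

# Serial execution: T2 reads only after T1 has written.
balance = 500
balance = balance - 100
balance = balance - 50
serial_result = balance
```

A DBMS avoids the first outcome by controlling concurrent access, for example with locking, so that the result matches some serial order.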
Security Problems
21
 Not every user of the database should be able to
access all the data.
 For example, payroll personnel need to
access only that part of the data which has
information about the various employees; they do not
need access to information about customer
accounts.
Advantages of DBMS
22
 Controlling Redundancy
 Restricting Unauthorized Access
 Providing Storage Structures for Efficient
Query Processing
 Providing Backup and Recovery
 Providing Multiple User Interfaces
 Representing Complex Relationships among
Data
 Enforcing Integrity Constraints
 Permitting Inferencing and Actions using Rules
Disadvantages of DBMS
23
 Cost of Hardware & Software
 Cost of Data Conversion
 Cost of Staff Training
 Appointing Technical Staff
 Database Damage
Users may be divided into
24
 Those who actually use and control the
content (called “Actors on the Scene”).
 Those who enable the database to be
developed and the DBMS software to be
designed and implemented (called “Workers
Behind the Scene”).
Actors on the scene
25
 Database administrators
 Database Designers
 End-users
Database administrators (DBA)
26
 The database administrator is the controller of the
overall operations of the database.
 However, the DBA is not responsible for creating the database
or the structure of the database.
 The database administrator is the most powerful actor
on the scene.
Functions of DBA
27
 Authorizing access to the database
 Coordinating & monitoring the database
 Acquiring hardware & software resources
as needed by the users
 Concurrency control checking
 Security of the database
 Making backups & recovery
 Modification of the database structure & its
relation to the physical database
Database Designers (DBD)
28
 The database designer is the person who designs the
database structure for the first time. Prerequisites,
such as the sources from which data is to be collected,
are decided by the DBD.
Functions of DBD
29
 Creating the original description of the
database structure.
 Interacting with different
groups of users & integrating their views to make
the best structure.
End-users
30
They use the data for queries, reports and some of
them actually update the database content.
Types of end-users
 Casual
 Naïve or Parametric
 Sophisticated
 Specialized or Stand-alone
31
 Casual: they can only browse through the
database; they cannot create, update or make
any changes in the database.
 Naïve or Parametric: they use ready-made
software that works with the database. They
can only update the database through it. Examples are
bank tellers or reservation clerks who do this
activity for an entire shift of operations.
32
 Sophisticated: these include business
analysts, scientists, engineers, others
thoroughly familiar with the system capabilities.
Many use tools in the form of software
packages that work closely with the stored
database.
 Stand-alone: mostly maintain personal
databases using ready-to-use packaged
applications. An example is a tax program user
that creates his or her own internal database.
Workers behind the scene
33
 DBMS system implementers: they are the creators of the
DBMS.
 Tool developers: tools are the facilities provided to help the
DBMS or the user. They include packages for database design,
performance monitoring, graphical interfaces, and
simulation. Tool developers develop these tools for the DBMS.
 Operators & maintenance personnel: these are the
persons required for maintaining the hardware or
software of the DBMS.
DATA MODEL
34
 A data model is a collection of concepts that can be
used to describe the structure of a database.
 By structure of a database we mean the
 Data types,
 Relationships,
 Constraints that should hold on the data.
Categories of data models
35
 Conceptual (high-level, semantic) data models
 Physical (low-level, internal) data models
 Implementation (representational, record based)
data models
Conceptual data models
36
 Before implementation, a rough model of the database
is created.
 This model is not implemented directly; it is used for
design purposes.
 Also called entity-based or object-based data
models.
Example: E-R Model
E-R model
38
 Stands for entity-relationship model.
Terms used in E-R model:
Field – Attribute
Record – Entity
File – Entity Type
E-R Model
39
Entity – It is an object with a physical or conceptual
existence.
Ex: An object with a physical existence – a
person, a car, a house; or an object
with conceptual existence – a company, a job
or a university.
Attribute – Attributes are the particular properties
that describe an entity.
Ex: A STUDENT entity may be described by
student’s name, age, address, class, grade.
EXAMPLE
40
Physical data models
41
 It provides concepts that describe the details
of how data is stored in the computer.
 Concepts provided by physical data models
are generally meant for computer specialists,
not for typical end users.
Implementation data models
43
 Provide concepts that fall between the above
two.
 It also provides concepts that may be
understood by end users but that are not too
far removed from the way data is organized
within the computer.
 Example: relational model, network model,
hierarchical model.
Relational Model
45
The relational model uses a collection of tables to represent both
data and the relationships among those data.

Customer Table
Cust_id  Cust_Name  Cust_Add      Cust_City
1000     Ajay       Kohka         Bhilai
1001     Vishal     Shanti Nagar  Nagpur

Account Table
Acc_No  Bal
A-101   5000
A-102   10000

Depositor Table
Cust_id  Acc_No
1000     A-101
1001     A-102
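The three tables can be created and combined as follows. This is a sketch using sqlite3 with the rows from the slide; the lower-case table and column names and the key declarations are assumptions, since the slide shows only the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_add TEXT, cust_city TEXT
);
CREATE TABLE account (acc_no TEXT PRIMARY KEY, bal INTEGER);
-- The depositor table represents the relationship between the other two.
CREATE TABLE depositor (
    cust_id INTEGER REFERENCES customer(cust_id),
    acc_no TEXT REFERENCES account(acc_no),
    PRIMARY KEY (cust_id, acc_no)
);
INSERT INTO customer VALUES
    (1000, 'Ajay', 'Kohka', 'Bhilai'),
    (1001, 'Vishal', 'Shanti Nagar', 'Nagpur');
INSERT INTO account VALUES ('A-101', 5000), ('A-102', 10000);
INSERT INTO depositor VALUES (1000, 'A-101'), (1001, 'A-102');
""")

# Follow the relationship from customer through depositor to account.
balances = conn.execute(
    "SELECT c.cust_name, a.bal"
    " FROM customer c"
    " JOIN depositor d ON d.cust_id = c.cust_id"
    " JOIN account a ON a.acc_no = d.acc_no"
    " ORDER BY c.cust_id"
).fetchall()
```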
Hierarchical Model
46
 The hierarchical data model uses tree structures to represent
relationships among records.
 Tree structures occur naturally in many data
organizations because some entities have an intrinsic
hierarchical order.
Example: Institute -> Programs -> Courses -> Students
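The Institute example can be pictured as a nested structure in which every record has exactly one parent. The program, course and student names below are hypothetical placeholders.

```python
# Each record has exactly one parent, so the data forms a tree.
institute = {
    "Institute": {
        "B.Tech": {
            "DBMS": ["Student 1", "Student 2"],
            "Networks": ["Student 3"],
        },
        "MCA": {
            "DBMS": ["Student 4"],
        },
    }
}


def students_in(tree, program, course):
    """Follow the single path Institute -> program -> course."""
    return tree["Institute"][program][course]


def count_students(node):
    """Count the leaf records by walking the tree top-down."""
    if isinstance(node, list):
        return len(node)
    return sum(count_students(child) for child in node.values())


btech_dbms = students_in(institute, "B.Tech", "DBMS")
total = count_students(institute)
```

Note that the DBMS course appears under two programs; in a strict hierarchy it has to be repeated under each parent.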
Network Model
47
 This model uses two different data structures to
represent the database entities and
relationships between the entities, namely
record type & Set type
 A record type is used to represent an entity
type . It is made up of a number of data items
that represent the attributes of the entity.
 A set type is used to represent a directed
relationship between two record types called
owner record type & member record type.
Example
48
Record types: Department & Employee
Set type: Dept-Emp, with Department as the
owner record type & Employee as the member
record type.
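A rough sketch of the Department and Employee example in Python. The class layout is an assumption made for illustration: each Department record owns one occurrence of the Dept-Emp set, which links it to its member Employee records.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:  # member record type
    name: str


@dataclass
class Department:  # owner record type
    name: str
    # One occurrence of the Dept-Emp set: the members owned by this department.
    members: list = field(default_factory=list)


research = Department("Research")
research.members.append(Employee("John"))
research.members.append(Employee("Franklin"))

member_names = [employee.name for employee in research.members]
```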
SCHEMAS AND INSTANCES
49
 The description of a database is called the
database schema, which is specified during
database design and is not expected to change
frequently.
 The collection of information stored in the
database at a particular moment is called an
instance of the database. It changes much more
frequently than the schema.
Schema and Instance
50
SCHEMA
Student(studno, name, address)
Course(courseno, lecturer)
INSTANCE
Student(123, Bloggs, Woolton)
Student(321, Jones, Owens)
View of Data
51
 A major purpose of a database system is to
provide users with an abstract view of the data.
 That is, the system hides certain details of how
the data are stored and maintained.
 This hiding of the actual (complex)
details from users is organized into levels of data
abstraction.
Levels of data abstraction
52
Physical level
53
 It is the lowest level of abstraction & specifies how the data is actually
stored.
Example:
A banking enterprise may have several such record types, including
 Customer, with customer-id, customer-name, customer-street,
customer-city
 account, with fields account-number and balance
 employee, with fields employee-name and salary
 At the physical level, a customer, account, or employee record can be
described as a block of consecutive storage locations (for example,
words or bytes). The language compiler hides this level of detail from
programmers. Similarly, the database system hides many of the
lowest-level storage details from database programmers.
Logical level
54
 It is the next level of abstraction & describes
what data are stored in database & what
relationship exists between various data.
Example :
 At the logical level, each such record is
described by a type definition, and the
interrelationship of these record types is defined
as well.
 Programmers using a programming language
work at this level of abstraction.
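The difference between the two levels can be sketched in Python. The Account type is the logical-level description; the byte layout is one possible physical representation, chosen here purely as an assumption (a 10-byte account number followed by a 4-byte integer balance).

```python
import struct
from dataclasses import dataclass


# Logical level: the record is described by a type definition.
@dataclass
class Account:
    account_number: str
    balance: int


# Physical level (assumed layout): a block of consecutive bytes,
# 10 bytes for the account number and a 4-byte little-endian integer.
RECORD_FORMAT = "<10si"


def to_bytes(account):
    return struct.pack(
        RECORD_FORMAT, account.account_number.encode("ascii"), account.balance
    )


def from_bytes(raw):
    number, balance = struct.unpack(RECORD_FORMAT, raw)
    return Account(number.rstrip(b"\x00").decode("ascii"), balance)


record = to_bytes(Account("A-101", 5000))
restored = from_bytes(record)
```

Code that works with Account objects never needs to know the byte layout, which is the point of separating the levels.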
View level
55
• This level contains the part of the data that is shown to
the users.
• This is the highest level of abstraction & the users of this
level need not know the actual details of data storage.
Example:
 At the view level, several views of the database are
defined, and database users see these views. For
example, tellers in a bank see only that part of the
database that has information on customer accounts;
they cannot access information about salaries of
employees.
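A view for the bank teller can be sketched with sqlite3. The table and view names are hypothetical. SQLite itself has no user accounts or permissions, so this shows only what the view exposes; in a multi-user DBMS the teller would additionally be granted access to the view and not to the employee table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (acc_no TEXT PRIMARY KEY, cust_name TEXT, balance INTEGER);
CREATE TABLE employee (emp_name TEXT PRIMARY KEY, salary INTEGER);
INSERT INTO account VALUES ('A-101', 'Ajay', 5000);
INSERT INTO employee VALUES ('Vishal', 30000);
-- The teller's view contains customer account data only.
CREATE VIEW teller_view AS
    SELECT acc_no, cust_name, balance FROM account;
""")

cursor = conn.execute("SELECT * FROM teller_view")
teller_columns = [column[0] for column in cursor.description]
teller_rows = cursor.fetchall()
```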
ANSI-SPARC 3-level DBMS
Architecture
56
Three-schema architecture
57
 The three-schema architecture is a convenient
tool for the user to visualize the schema levels
in a database system.
 In this architecture, schemas can be defined at
the following three levels:
 Internal schema/Physical schema
 Conceptual schema
 External schema
58
 The internal level has an internal schema, which describes
the physical storage structure of the database.
 The conceptual level has a conceptual schema, which
describes the structure of the whole database for a community
of users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data
types, relationships, user operations, and constraints.
 The external or view level includes a number of external
schemas or user views.
 The processes of transforming requests and results between
levels are called mappings.
Example: university database
59
Conceptual schema:
 Student (sid: string, name: string, age: number, percent: real)
 Courses (cid: string, cname: string, credits: number)
 Enrolled (sid: string, cid: string, grade: string)
Physical schema:
 Relations stored as unordered files.
 Index on first column of students
External schema:
 Course_info(cid: string, enrollment: integer)
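The university example can be tried out with sqlite3, as a sketch. The index name, the way course_info is computed (a count of enrolled rows per course) and all sample rows are assumptions; the slide gives only the schema outlines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema
CREATE TABLE student (sid TEXT, name TEXT, age INTEGER, percent REAL);
CREATE TABLE courses (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Physical schema: an index on the first column of student
CREATE INDEX idx_student_sid ON student(sid);

-- External schema: a view exposing course id and enrollment count only
CREATE VIEW course_info AS
    SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
    FROM courses c LEFT JOIN enrolled e ON e.cid = c.cid
    GROUP BY c.cid;

INSERT INTO student VALUES ('s1', 'Bloggs', 20, 71.5), ('s2', 'Jones', 21, 64.0);
INSERT INTO courses VALUES ('c1', 'DBMS', 4), ('c2', 'Networks', 3);
INSERT INTO enrolled VALUES ('s1', 'c1', 'A'), ('s2', 'c1', 'B'), ('s1', 'c2', 'A');
""")

course_info = conn.execute(
    "SELECT cid, enrollment FROM course_info ORDER BY cid"
).fetchall()
```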
DATA INDEPENDENCE
60
 The ability to make changes at one level without
affecting the other levels is called data
independence.
 Data independence is the capacity to change
the schema at one level of a database system
without having to change the schema at the
next higher level.
Types of data independence
61
 Logical data independence is the capacity to
change the conceptual schema without having
to change external schemas or application
programs.
 Physical data independence is the capacity
to change the internal schema without having
to change the conceptual (or external)
schemas.
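Physical data independence can be illustrated with sqlite3: adding an index changes how the data is stored and searched, but the table definition and the query stay the same. The table, rows and index name are assumptions for this sketch, and it covers the physical case only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid TEXT, name TEXT, age INTEGER);
INSERT INTO student VALUES ('s1', 'Bloggs', 20), ('s2', 'Jones', 21);
""")

QUERY = "SELECT name FROM student WHERE sid = 's2'"
before = conn.execute(QUERY).fetchall()

# Internal-level change: add an index. The conceptual schema
# (the table definition) and the query text are left untouched.
conn.execute("CREATE INDEX idx_student_sid ON student(sid)")
after = conn.execute(QUERY).fetchall()
```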
DBMS Structure
62
[Figure: block diagram of the DBMS structure. Components shown: naive user, casual user, batch user, DBA, telecomm system, compiled user interface, compiled application program, query processor, DDL compiler, DBMS & its data manager, OS or own file manager, OS disk manager, data files & data dictionary.]
63
 DDL Compiler (Data definition Language Compiler): the DDL
compiler converts the data definition statements into a set of tables.
These tables contain the metadata concerning the database & are in
a form that can be used by other components of the DBMS.
 Data Manager: the data manager is the central software component
of the DBMS. It is sometimes referred to as the database control
system.
Functions:
 Converts operations in the user’s queries, coming directly via the
query processor or indirectly via an application program, from the
user’s logical view to a physical file system.
 Interfaces with the file system.
 Enforces constraints to maintain the consistency, integrity &
security of the data.
 Synchronizes the simultaneous operations performed by concurrent
users.
 Handles backup & recovery operations.
64
File Manager:
 Responsibility for the structure of the files & managing
the file space rests with the file manager.
 Responsible for locating the block containing the
required record, requesting this block from the disk
manager & transmitting the required record to the data
manager.
Disk Manager:
 The disk manager is part of the operating system & all
physical input & output operations are performed by it.
 Transfers the block or page requested by the file
manager.
65
Query Processor: The query processor is used to interpret the online user’s
query & convert it into an efficient series of operations in a form capable of
being sent to the data manager for execution.
Functions:
 The data manipulation statements are compiled separately into a sequence
of optimized operations on the database.
 Transfers data to & from a work area indicated in a subroutine call, after
which control returns to the application program.
 During execution, when a subroutine call inserted in place of the data
manipulation statements is reached, control transfers to the run-time system.
This system in turn transfers control to the compiled version of the original data
manipulation statements. These data manipulation statements are executed by
the data manager.
 A user action that requires a database operation causes the application
program to request the service via its run time system & data manager.
 Batch users of the database also interact with the database via their
application program, its run-time system & data manager.
66
Telecommunication System:
 It is a software system used to communicate with a
remote or local computer by sending or
receiving messages over communication lines.
 Messages from the user are routed by the
telecommunication system to the appropriate
target & responses are sent back to the user.
67
Data Files: data files contain the data portion of the database.
Data dictionary:
 Information pertaining to the structure & usage of data
contained in the database, the metadata, is maintained in a
data dictionary.
 It stores information concerning the external, conceptual &
internal levels of the database.
 It contains the source of each data-field value, the frequency
of its use & details concerning updates.
 The data dictionary is itself a database that documents the data.
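As one concrete illustration, SQLite keeps its metadata in a catalog table named sqlite_master, which can be queried like any other table. The student table and index below are assumptions made for the sketch; other DBMSs organize their dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# The catalog records every table and index defined in the database.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns one row per column; the second field is the name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
```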
Entity- Relationship Model
68
 The E-R model is the most commonly used
conceptual model.
 In this model, the real world consists of a
collection of basic objects called entities and
the relationships among these objects.
 The end product of the modeling process is an
entity-relationship diagram (ERD) or ER
diagram.
 This is a very important conceptual data model.
 It is not implemented directly but is used to design
the database.
The E-R data model employs three
basic notions:
69
 Entity
 Attributes
 Relationship
Entity
70
 It is an object with a physical or conceptual existence.
 For example, each person in an enterprise, a
car, a house, a company, or a university course.
Entity Type & Entity Sets
71
 Entity Type –
 A collection of entities that have the same attributes.
Ex: STUDENT, UNIVERSITY
 Entity Set –
 The collection of all entities of a particular entity type.
Ex: the set of all rows of STUDENT (for example, 10 rows)
STUDENT
Name Age Rollno
Graphical representation of
entity sets
72
Attributes
73
 Attributes are the particular properties that
describe an entity.
Ex: A STUDENT entity may be described by
student’s name, student’s roll_number.
Graphical representation of
attributes
74
Types of Attributes
75
 Simple (Atomic) and Composite Attributes
 Single Valued & Multi-valued Attributes
 Stored and Derived Attributes
 Null Valued Attributes
 Complex Attributes
Simple (Atomic) and Composite
Attributes
76
 Simple attributes are not divisible into parts.
For example, EmployeeNumber and Age.
 Composite attributes can be divided into
smaller subparts. These subparts represent
basic attributes with independent meanings of
their own. For example, take Name and
address attributes.
77
Address
  Street Address
    number
    street
    apartment no.
  city
  state
  Pin
Single Valued & Multi-valued
Attributes
78
 Single-valued attributes have a single value for a
particular entity. Example: Roll_no, Age.
 Multi-valued attributes may have more than
one value for a single entity. Example:
Phone_no
Stored and Derived Attributes
79
 A derived attribute is not stored in the database
but is derived from other attributes.
 Example: if DOB is stored in the database,
then we can calculate the age of a student by
subtracting DOB from the current date.
 Hence, in this case DOB is the stored attribute
and age is the derived attribute.
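The age calculation can be written out as a short function. The function name and the choice to pass the current date in as a parameter (so the result is reproducible) are assumptions of this sketch.

```python
from datetime import date


def age_on(dob, today):
    """Derive the age in whole years from the stored date of birth."""
    years = today.year - dob.year
    # One year less if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


dob = date(2000, 6, 15)  # stored attribute
age_day_before_birthday = age_on(dob, date(2024, 6, 14))
age_on_birthday = age_on(dob, date(2024, 6, 15))
```

Only DOB would be stored; the age is recomputed whenever it is needed, so it never goes out of date.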
Null Valued Attributes
80
 A null value means that no value has been entered for the
attribute; it is not the same as zero.
 Attributes which can have a null value are
called null-valued attributes.
 Example: the Mobile_no attribute of a person
who does not have a mobile phone.
Complex Attributes
81
 A complex attribute is a combination of
composite and multi-valued attributes.
In the notation, multi-valued parts are enclosed in { } and
composite parts are enclosed in ( ).
 Example: an Address_phone attribute holds
both the address and the phone_no of a person.
 Example: {(2-A, St-5, Sec-4, Bhilai), 2398124}
Key attribute in an entity type
82
 A key attribute has a unique value for
each entity of the entity type.
 It identifies every entity in the entity set.
 A key attribute can never be a null-valued
attribute.
 A composite attribute can also be a key
attribute.
 There can be more than one key attribute
for an entity type.
Example: roll_no, enrollment_no
Domain or value set of an
attribute
83
 Domain of an attribute is the allowed set of
values of that attribute.
Example: if attribute is ‘grade’, then its allowed
values are A,B,C,F.
 Grade ={A, B,C,F}
TYPES OF ENTITY TYPES
84
Strong entity type – an entity type that has at least one
key attribute.
Weak entity type – an entity type that does not have any
key attribute of its own.
An entity in a weak entity type is identified by its
relationship with a strong entity type. That
relationship is called the identifying relationship, and the
strong entity type is called the owner of the weak entity
type.
TYPES OF ENTITY TYPES
85
Student
Roll No.  Name    Age
1         Rakesh  20
2         Nikhil  21
3         Nikhil  21

Marks
Name    M1  M2  M3
Nikhil  50  45  40
Nikhil  80  75  82

Identifying Relationship: Secured
86
 Relates two or more distinct entities with a specific
meaning.
 For example, EMPLOYEE John works on the
ProductX PROJECT
or
 EMPLOYEE Franklin manages the Research
DEPARTMENT.
Terms used:
Relationship type,
Relationship set,
Relationship instances.
87
88
Relationship type: secured
Relationship set: {R1, R2, R3, R4}
Relationship instances: R1
Graphical Representation of
Relationship Sets
89
NOTATIONS USED IN E-R
DIAGRAM
90
Entity Type
Attribute
Key Attribute
Weak Entity Type
NOTATIONS USED IN E-R
DIAGRAM
91
Composite Attribute
Derived Attribute
Multivalued Attribute
NOTATIONS USED IN E-R
DIAGRAM
92
Identifying Relationship
Relationship Type
Constraints
93
Relationship types usually have certain
constraints. Two main types of relationship
constraints:
 Mapping cardinalities
 Participation constraints
Mapping cardinalities, or cardinality
ratios
94
 Specifies the number of relationship instances that
an entity can participate in.
 For example, in the WORKS_FOR relationship
type.
Mapping Cardinalities
95
 One-to-one (1:1)
 One-to-many (1: N)
 Many-to-one (N: 1)
 Many-to-many (M: N)
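The four ratios can be told apart by counting how many partners each entity has. The function below is a sketch with hypothetical names; it classifies a given list of relationship instances only, whereas the real cardinality ratio is a design rule that must hold for all possible instances.

```python
from collections import Counter


def cardinality_ratio(pairs):
    """Classify a collection of (left, right) relationship instances."""
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # True if some left entity is related to several right entities.
    left_has_many = any(n > 1 for n in left_counts.values())
    # True if some right entity is related to several left entities.
    right_has_many = any(n > 1 for n in right_counts.values())
    if left_has_many and right_has_many:
        return "M:N"
    if left_has_many:
        return "1:N"
    if right_has_many:
        return "N:1"
    return "1:1"


one_to_one = cardinality_ratio([("c1", "l1"), ("c2", "l2")])
one_to_many = cardinality_ratio([("c1", "l1"), ("c1", "l2")])
many_to_one = cardinality_ratio([("c1", "l1"), ("c2", "l1")])
many_to_many = cardinality_ratio([("c1", "l1"), ("c1", "l2"), ("c2", "l1")])
```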
(a) One-to-one (b) One-to-many
96
(a) Many-to-one (b) Many-to-many
97
Example of E-R Diagrams
98
 Rectangles represent entity types.
 Diamonds represent relationship types.
 Lines link attributes to entity types and entity types to relationship types.
 Ellipses represent attributes
 Underline indicates primary key attributes (will study later)
E-R Diagram With Composite, Multivalued, and Derived
Attributes
99
Relationship Types with
Attributes
100
We have the access_date attribute attached to the relationship set
depositor to specify the most recent date on which a customer
accessed that account.
Cardinality ratio
101
 We express cardinality ratio by drawing
 directed line (→), signifying “one,” or an
 undirected line (—), signifying “many,”
One-To-One Relationship
102
One-To-Many Relationship
103
 In the one-to-many relationship a customer is associated
with several loans via borrower
Many-To-One Relationships
104
 In a many-to-one relationship a loan is associated with
several customers via borrower.
Many-To-Many Relationship
105
Find out the Cardinality ratio
106
 Prime minister – country
 Classroom – students
 Students – classroom
 Customer – loan
Participation constraints
107
 Total participation : every entity in the entity type participates in at
least one relationship in the relationship type
 E.g. participation of loan in borrower is total
 every loan must have a customer associated to it via borrower
 Partial participation: some entities may not participate in any
relationship in the relationship type
 Example: participation of customer in borrower is partial
 some customers may not participate in any loan
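The loan and customer example can be checked mechanically. This is a sketch with hypothetical identifiers; like any check on sample data, it shows whether the given instances satisfy the constraint, not whether the constraint is part of the design.

```python
def participation(entities, relationship_pairs, position):
    """Return 'total' if every entity appears in some relationship instance.

    `position` selects which side of each pair the entities are on.
    """
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entities) <= participating else "partial"


customers = ["C-1", "C-2", "C-3"]
loans = ["L-1", "L-2"]
borrower = [("C-1", "L-1"), ("C-2", "L-2")]  # (customer, loan) pairs

loan_participation = participation(loans, borrower, 1)
customer_participation = participation(customers, borrower, 0)
```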
KEYS
108
 A key is used to uniquely identify each entity in the entity
set.
Types of keys
109
 Candidate Key
 Alternate & Primary key
 Superkey
Candidate Key
110
 A candidate key is a minimal set of attributes that uniquely identifies
each entity in the entity set.
 There can be more than one candidate key in an entity set.
 More than one attribute can together form a single candidate key.
 Suppose that a combination of customer-name and customer-street is
sufficient to distinguish among members of the customer entity set.
 Then, both {customer-id} and {customer-name, customer-street} are
candidate keys.
 Although the attributes customer-id and customer-name together can
distinguish customer entities, their combination does not form a
candidate key, since the attribute customer-id alone is a candidate
key.
Alternate & Primary key
111
 Alternate & primary keys are related to candidate
keys.
 In an entity set, one candidate key is chosen as
the primary key & the remaining
candidate keys are called alternate keys.
 AK = CK − PK
Superkey
112
 A superkey is any superset of a candidate key (including the candidate key itself).
 For example, the customer-id attribute of the entity
set customer is sufficient to distinguish one
customer entity from another.
 Thus, customer-id is a superkey.
 Similarly, the combination of customer-name and
customer-id is a superkey for the entity set
customer.
 The customer-name attribute of customer is not a
superkey, because several people might have the
same name.
 Example: {customer-id}, {customer-name,
customer-id}
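The superkey and candidate key definitions can be tested on sample data. The rows below are hypothetical, and uniqueness in a sample does not prove that an attribute set is a key in general; a key is a rule about all possible entities.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all the attributes in `attrs`."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """A candidate key is a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


customers = [
    {"customer-id": 1, "customer-name": "Ajay", "customer-street": "Kohka"},
    {"customer-id": 2, "customer-name": "Ajay", "customer-street": "Shanti Nagar"},
    {"customer-id": 3, "customer-name": "Vishal", "customer-street": "Kohka"},
]

id_is_superkey = is_superkey(customers, ["customer-id"])
name_is_superkey = is_superkey(customers, ["customer-name"])
name_id_is_superkey = is_superkey(customers, ["customer-name", "customer-id"])
name_id_is_candidate = is_candidate_key(customers, ["customer-name", "customer-id"])
name_street_is_candidate = is_candidate_key(
    customers, ["customer-name", "customer-street"]
)
```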
Weak Entity Types
113
 An entity type that does not have a primary key is
referred to as a weak entity type.
Weak Entity types (Cont.)
114
 We depict a weak entity type by double rectangles.
 We underline the partial key of a weak entity type with a
dashed line.
 payment_number – partial key of the payment entity type
 Primary key for payment – (loan_number, payment_number)
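In a relational implementation, the payment example becomes a table whose primary key combines the owner's key with the partial key. The sketch below uses sqlite3; the amount columns, the sample rows and the ON DELETE CASCADE rule are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER);
CREATE TABLE payment (
    loan_number TEXT NOT NULL REFERENCES loan(loan_number) ON DELETE CASCADE,
    payment_number INTEGER NOT NULL,   -- partial key: unique only within one loan
    payment_amount INTEGER,
    PRIMARY KEY (loan_number, payment_number)
);
INSERT INTO loan VALUES ('L-1', 1000), ('L-2', 2000);
-- The same payment_number may appear under different loans.
INSERT INTO payment VALUES ('L-1', 1, 100), ('L-2', 1, 200);
""")

payments_before = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]

# A payment cannot exist without its loan: deleting the owner removes it.
conn.execute("DELETE FROM loan WHERE loan_number = 'L-1'")
payments_after = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
```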
Question for you
115
 Can we convert a weak entity type into a strong
entity type?
PROBLEMS ON E-R DIAGRAM
116
Question: An employee works in a
department. The department has phones, and
the employee also has a phone. Assume that an
employee works in a maximum of 2 departments and
a minimum of one department. Each department
has a maximum of 3 phones and a minimum of
zero phones. Design an E-R diagram for the
above.
117
Steps in ER Modeling
118
 Identify the Entities
 Find relationships
 Identify the key attributes for every Entity
 Identify other relevant attributes
 Draw complete E-R diagram with all attributes
including Primary Key
EER (Enhanced Entity-Relationship)
119
 The EER model is a high-level or conceptual data
model incorporating extensions to the original Entity-
relationship (ER) model.
 EER includes all the concepts of ER model.
 EER = all the concepts of ER + some extensions
 Additionally it includes the concepts of
 superclass and subclass
 specialization and generalization.
Subclasses and Superclasses
120
 An entity type may have additional meaningful
subgroupings.
 Example: EMPLOYEE may be further grouped into
SECRETARY, ENGINEER,
MANAGER, TECHNICIAN,
SALARIED_EMPLOYEE,
HOURLY_EMPLOYEE,…
 Each is called a subclass of EMPLOYEE
 EMPLOYEE is the superclass for each of these
subclasses.
Specialization
121
 Specialization is the process of defining a set of
subclasses of a superclass.
 The set of subclasses is based upon some
characteristics of the entities in the superclass.
• Attributes of a subclass are called specific attributes.
 It follows top-down design process.
 Represented by a triangle component labeled ISA
(E.g. customer “is a” person).
Example of Specialization
122
 Consider an entity set person, with attributes name, street, and
city. A person may be further classified as one of the
following:
 customer
 employee
 Each of these person types is described by a set of attributes
that includes all the attributes of entity set person plus possibly
additional attributes.
 For example, customer entities may be described further by the
attribute customer-id, whereas employee entities may be
described further by the attributes employee-id and salary.
 The specialization of person allows us to distinguish among
persons according to whether they are employees or customers.
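The person example can be mirrored loosely with class inheritance, where each subclass inherits the attributes of person and adds its specific attributes. This is only an analogy in Python, not a database design; the class names and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Person:              # higher-level entity set
    name: str
    street: str
    city: str

@dataclass
class Customer(Person):    # customer "is a" person
    customer_id: str       # specific attribute

@dataclass
class Employee(Person):    # employee "is a" person
    employee_id: str       # specific attributes
    salary: float

c = Customer("Hayes", "Main", "Harrison", "C-101")
e = Employee("Jones", "North", "Rye", "E-7", 42000.0)
```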
Generalization
123
 It is a bottom-up design process.
 Generalization is a simple inversion of specialization.
 In this process multiple entity sets are synthesized into a
higher-level entity set on the basis of common features.
 For example, customer entity set with the attributes name,
street, city, and customer-id, and an employee entity set with
the attributes name, street, city, employee-id, and salary.
 There are similarities between the customer entity set and the
employee entity set in the sense that they have several
attributes in common.
 This commonality can be expressed by generalization.
 person is the higher-level entity set and customer and
employee are lower-level entity sets.
Continued……….
124
 The person entity set is the superclass of the
customer and employee subclasses.
 Differences in the two approaches may be
characterized by their starting point and overall
goal.
Specialization and
generalization
125
Design Constraints on a
Specialization/Generalization
126
 Constraint on which entities can be members of a given lower-
level entity set.
 Condition-defined
 Example: all customers over 65 years are members of
senior-citizen entity set; senior-citizen ISA person.
 User-defined
 Constraint on whether or not entities may belong to more than one
lower-level entity set within a single generalization.
 Disjoint
 an entity can belong to only one lower-level entity set
 Noted in E-R diagram by writing disjoint next to the ISA
triangle
 Overlapping
 an entity can belong to more than one lower-level entity set
Design Constraints on a
Specialization/Generalization (Cont.)
127
 Completeness constraint -- specifies whether or not
an entity in the higher-level entity set must belong to at
least one of the lower-level entity sets within a
generalization.
 total : an entity must belong to one of the lower-
level entity sets
 partial: an entity need not belong to one of the
lower-level entity sets
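The disjoint/overlapping and total/partial constraints can be checked mechanically once membership is known. The sketch below assumes entities are identified by simple ids; the function check_constraints and the sample sets are hypothetical.

```python
def check_constraints(higher, lower_sets, disjoint, total):
    """Return a list of violations for one generalization.

    higher: set of entity ids in the higher-level entity set
    lower_sets: dict mapping a lower-level set name to its entity ids
    """
    problems = []
    for entity in sorted(higher):
        memberships = [name for name, members in lower_sets.items()
                       if entity in members]
        if disjoint and len(memberships) > 1:
            problems.append(f"{entity}: disjoint violated by {memberships}")
        if total and not memberships:
            problems.append(f"{entity}: total participation violated")
    return problems

person = {"p1", "p2", "p3"}
lower = {"customer": {"p1", "p2"}, "employee": {"p2"}}
```

With these sets, p2 breaks a disjoint constraint (it is both customer and employee) and p3 breaks a total constraint (it belongs to no lower-level set); under overlapping and partial constraints nothing is violated.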
AGGREGATION
128
A ternary relationship
129
E-R diagram with redundant relationships
130
Aggregation
131
 Aggregation is an abstraction through which
relationships are treated as higher level entities.
 Thus, for our example, we regard the relationship set
works-on (relating the entity sets employee, branch, and
job) as a higher-level entity set called works-on.
 Such an entity set is treated in the same manner as is
any other entity set.
 We can then create a binary relationship manages
between works-on and manager to represent who
manages what tasks.
E-R Diagram With Aggregation
132
Assignment
133
1. Construct an E-R diagram for a car-insurance company whose customers
own one or more cars each. Each car has associated with it zero to any
number of recorded accidents.
2. A university registrar’s office maintains data about the following entities:
 Courses, including number, title, credits, syllabus, and prerequisites
 Course offerings, including course number, year, semester, section number,
instructor(s), timings, and classroom
 Students, including student-id, name, and program
 Instructors, including identification number, name, department, and title.
Further, the enrollment of students in courses and grades awarded to students
in each course they are enrolled for must be appropriately modeled.
Construct an E-R diagram for the registrar’s office. Document all
assumptions that you make about the mapping constraints.
Continued…..
134
3. Design an E-R diagram for keeping track of the
exploits of your favorite sports team. You should store
the matches played, the scores in each match, the
players in each match and individual player statistics
for each match. Summary statistics should be
modeled as derived attributes.
4. Construct an E-R diagram for a hospital with a set of
patients and a set of medical doctors. Associate with
each patient a log of the various tests and
examinations conducted.
Continued…..
135
5. Consider a university database for the scheduling of classrooms for final exams. This
database could be modeled as the single entity set exam, with attributes course-
name, section-number, room-number, and time. Alternatively, one or more
additional entity sets could be defined, along with relationship sets to replace some
of the attributes of the exam entity set, as
course with attributes name, department, and c-number
section with attributes s-number and enrollment, and dependent as a weak entity
set on course
room with attributes r-number, capacity, and building
(a) Show an E-R diagram illustrating the use of all three additional entity sets listed.
(b) Explain what application characteristics would influence a decision to include or
not to include each of the additional entity sets.
6. Construct an E-R diagram for a Bank.
Storage-device hierarchy
136
Storage hierarchy includes two main categories:
137
 Primary storage (main memory, cache
memory)
 Secondary storage (Magnetic disks,
Magnetic tapes and optical disks)
Buffer Manager
138
 Files reside permanently on disks.
 Each file is partitioned into fixed-length
storage units called blocks.
 The buffer is the part of main memory
available for storage of copies of disk blocks.
 The subsystem responsible for the allocation of
buffer space is called the buffer manager.
Buffer Manager techniques
139
• Buffer replacement strategy: When there is no room left in the
buffer, a block must be removed from the buffer. Most
operating systems use a least recently used (LRU) scheme.
• Pinned blocks: Most recovery systems require that a block
should not be written to disk while an update on the block is in
progress. A block that is not allowed to be written back to disk
is said to be pinned.
• Forced output of blocks: There are situations in which it is
necessary to write back the block to disk, even though the
buffer space that it occupies is not needed. This write is called
the forced output of a block.
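The three techniques can be sketched together in a toy buffer manager. This is a simplified illustration, not how any particular DBMS implements its buffer: the class BufferManager, its method names, and the dictionary standing in for the disk are all assumptions.

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # hypothetical dict: block id -> contents
        self.frames = OrderedDict()   # buffered blocks, least recently used first
        self.pinned = set()

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)    # now most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[block_id] = self.disk[block_id]
        return self.frames[block_id]

    def _evict(self):
        """LRU replacement that skips pinned blocks."""
        for victim in list(self.frames):         # oldest first
            if victim not in self.pinned:
                self.disk[victim] = self.frames.pop(victim)   # write back
                return
        raise RuntimeError("all buffered blocks are pinned")

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)

    def force_output(self, block_id):
        """Write the block to disk although its buffer space is not needed."""
        self.disk[block_id] = self.frames[block_id]
```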
Record Structure
140
 The database is stored as a collection of files.
 Each file is a sequence of records.
 A record is a sequence of fields.
Types of records
 Fixed-Length Records: every record in the file has
exactly the same size (in bytes).
 Variable-Length Records: different records in the file
have different sizes.
Fixed-Length Records
141
Let us consider a file of account records for bank
database.
Each record of this file is defined as:
Account-number: char (10);
Branch-name: char (22);
Balance: real; //Real size=8
Record size= 10+22+8= 40 bytes
A simple approach is to use the first 40 bytes for the first
record, the next 40 bytes for the second record, and so
on.
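The 40-byte layout can be reproduced with Python's struct module. This is a sketch under the assumption that the real is stored as an 8-byte double and that fields are padded with zero bytes; the helper names and the sample accounts are hypothetical.

```python
import struct

# char(10), char(22), 8-byte real; "<" disables alignment padding.
ACCOUNT = struct.Struct("<10s22sd")

def pack_account(number, branch, balance):
    """Encode one account record as exactly 40 bytes."""
    return ACCOUNT.pack(number.encode(), branch.encode(), balance)

def read_record(data, i):
    """Record i occupies bytes 40*i up to (not including) 40*i + 40."""
    number, branch, balance = ACCOUNT.unpack_from(data, i * ACCOUNT.size)
    return number.rstrip(b"\0").decode(), branch.rstrip(b"\0").decode(), balance

file_bytes = (pack_account("A-102", "Perryridge", 400.0)
              + pack_account("A-305", "Round Hill", 350.0))
```

Because every record has the same size, the position of record i is computed directly from i, with no need to scan the preceding records.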
142
There are two problems with this simple
approach:
1. It is difficult to delete a record from this
structure. The space occupied by the record
to be deleted must be filled with some other
record of the file.
2. Unless the block size happens to be a multiple
of 40, some records will cross block
boundaries. It would thus require two block
accesses to read or write such a record.
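The second problem is plain arithmetic on byte offsets. The sketch below lists which 40-byte records would straddle a block boundary; the block sizes used in the example are hypothetical and chosen small for readability.

```python
RECORD_SIZE = 40

def crossing_records(block_size, n_records):
    """Indexes of records whose bytes span two blocks."""
    crossing = []
    for i in range(n_records):
        first_block = (i * RECORD_SIZE) // block_size
        last_block = (i * RECORD_SIZE + RECORD_SIZE - 1) // block_size
        if first_block != last_block:
            crossing.append(i)
    return crossing
```

With a block size of 100 bytes, record 2 (bytes 80 to 119) crosses from block 0 into block 1; with a block size of 120, a multiple of 40, no record crosses.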
Deletion of record 1st approach
143
 When a record is deleted, we could move the record that came after it into
the space occupied by the deleted record, and so on, until every record
following the deleted record has been moved ahead. Such an approach
requires moving a large number of records.
Deletion of record 2nd approach
144
 It might be easier simply to move the final record of the file into the space
occupied by the deleted record. It is undesirable to move records to occupy
the space freed by a deleted record, since doing so requires additional block
accesses.
Deletion of record 3rd approach
145
 Since insertions tend to be more frequent than deletions, it is acceptable to
leave open the space occupied by the deleted record, and to wait for a
subsequent insertion before reusing the space.
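The three deletion approaches can be compared on an in-memory list that stands in for the file. This is only a sketch: the function names are hypothetical, and a plain Python list of free slot numbers is assumed as the bookkeeping for the third approach.

```python
def delete_shift(records, i):
    """1st approach: every following record moves up one slot."""
    del records[i]

def delete_move_last(records, i):
    """2nd approach: the final record fills the freed slot."""
    last = records.pop()
    if i < len(records):
        records[i] = last

def delete_mark_free(records, free_slots, i):
    """3rd approach: leave the slot open and remember it."""
    records[i] = None
    free_slots.append(i)

def insert(records, free_slots, record):
    """Reuse a freed slot if one exists, otherwise append."""
    if free_slots:
        records[free_slots.pop()] = record
    else:
        records.append(record)
```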
Variable-Length Records
146
 Variable-length records arise in database
systems in several ways:
 Storage of multiple record types in a file.
 Record types that allow variable lengths for one
or more fields.
 Record types that allow repeating fields (used in
some older data models).
Techniques for implementing variable-length records
147
 Byte-String Representation
 Fixed-Length Representation
Byte-String Representation
148
 A simple method for implementing variable-length records is
to attach a special end-of-record (⊥) symbol to the end of each
record.
Byte-string representation disadvantages:
149
 It is not easy to reuse space occupied formerly
by a deleted record.
 There is no space, in general, for records to
grow longer.
Slotted-page structure
150
 A modified form of the byte-string representation, called the
slotted-page structure, is commonly used for organizing
records within a single block.
151
 There is a header at the beginning of each block, containing the
following information:
1. The number of record entries in the header
2. The end of free space in the block
3. An array whose entries contain the location and size of each record
 The actual records are allocated contiguously in the block, starting
from the end of the block.
 The free space in the block is contiguous, between the final entry in
the header array, and the first record.
 If a record is inserted, space is allocated for it at the end of free
space, and an entry containing its size and location is added to the
header.
 If a record is deleted, the space that it occupies is freed, and its entry
is set to deleted.
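A minimal sketch of the slotted-page idea follows. It makes simplifying assumptions: the header array is kept in a Python list rather than inside the block, its size is not charged against the free space, and freed space is not compacted. The class name SlottedPage and its methods are hypothetical.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []          # header array: (offset, length), or None if deleted
        self.free_end = size     # end of free space; records grow down from the block end

    def insert(self, record: bytes) -> int:
        """Place the record at the end of free space and add a header entry."""
        start = self.free_end - len(record)
        if start < 0:
            raise ValueError("record does not fit in the block")
        self.data[start:self.free_end] = record
        self.free_end = start
        self.slots.append((start, len(record)))
        return len(self.slots) - 1       # the slot number identifies the record

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.slots[slot] = None          # the entry is set to deleted
```

Other structures refer to a record by its slot number, so a record can be moved inside the block by updating only its header entry.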
Fixed-Length Representation
152
 Another way to implement variable-length records
efficiently in a file system is to use one or more
fixed-length records to represent one variable-
length record.
There are two ways of doing this:
 1. Reserved space: If there is a maximum record
length that is never exceeded, we can use fixed-
length records of that length. Unused space (for
records shorter than the maximum space) is filled
with a special null, or end-of-record, symbol.
 2. List representation: We can represent
variable-length records by lists of fixed length
records, chained together by pointers.
File organization
153
 File organization includes the way records and
blocks are placed on the storage medium.
 There are two types of file organization
 Primary File Organizations
 Secondary File Organizations
Primary File Organizations
154
 Unordered or Heap or Pile Files
 Ordered or Sorted or sequential Files
 Hash or Direct Files
Unordered or Heap or Pile Files
155
 Records are placed in the file in the order in
which they are inserted.
 Inserting a new record is very efficient.
 Searching can be done by linear search
(inefficient).
 Deletion is very inefficient.
Ordered or Sorted or sequential Files
156
 An ordered file stores records in sequential order, based on the
value of the search key of each record.
 An attribute or set of attributes used to look up
records in a file is called a search key.
Advantages of Ordered Files
157
 Reading of the records in order of the ordering
field is extremely efficient, because no sorting is
required.
 Finding the next record is fast.
Disadvantages of Ordered Files
158
 Searches on non-ordering fields are inefficient.
 Insertion and deletion of records are very
expensive.
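The trade-off of an ordered file can be sketched with Python's bisect module, using a sorted list of search-key values as a stand-in for the file. The key values and the function names are hypothetical.

```python
from bisect import bisect_left, insort

ordered_file = [101, 215, 305, 410]   # hypothetical search-key values, kept sorted

def find(key):
    """Binary search on the ordering field; returns the position or None."""
    i = bisect_left(ordered_file, key)
    return i if i < len(ordered_file) and ordered_file[i] == key else None

def insert(key):
    """Keeps the file ordered, but every later record shifts one position."""
    insort(ordered_file, key)
```

A search on the ordering field needs roughly log2(n) probes, while an insertion in the middle moves all the records after it, which is what makes insertions expensive.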
Hash or Direct Files
159
 A hash function is computed on some attribute of each
record; the result specifies the block in which the record
should be placed.
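A small sketch of hash-based placement follows. The number of buckets, the use of CRC-32 as the hash function, and the function names are assumptions chosen for illustration; a real system picks its own hash function and bucket size.

```python
import zlib

N_BUCKETS = 4   # hypothetical number of buckets (blocks)

def bucket_for(key: str) -> int:
    """Map a value of the hash attribute to a bucket number."""
    return zlib.crc32(key.encode()) % N_BUCKETS

buckets = [[] for _ in range(N_BUCKETS)]

def insert(record: dict, key_attr: str) -> None:
    buckets[bucket_for(record[key_attr])].append(record)

def lookup(key: str, key_attr: str) -> list:
    """Only the one bucket named by the hash value is scanned."""
    return [r for r in buckets[bucket_for(key)] if r[key_attr] == key]
```

CRC-32 is used instead of Python's built-in hash() because the built-in string hash changes between program runs, which would make stored records impossible to find again.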
Secondary File Organizations
160
 Secondary file organization uses the index to access
the records.
 An index for a file in a database system works in the
same way as the index in any textbook.
 If we want to learn about a particular topic (specified
by a word or a phrase) , we can search for the topic in
the index at the back of the book.
 Indexes provide faster access to data.
Types of Indexes
161
• Single-level ordered indexes
• Primary indexes
• Secondary indexes
• Clustering indexes
• Multi-level Indexes
• Dynamic Multi-level indexes using B-trees and B+-
trees
Primary indexes
162
 A primary index is an ordered file whose entries have
two fields: the first field is of the same data type as the
primary key of the data file, and the second field is a
pointer to a file block.
Indexes can also be characterized as
163
 Dense: A dense index has an index entry for
every search key value (and hence every record) in
the data file.
 Sparse (nondense): A sparse (or nondense) index,
on the other hand, has index entries for only some
of the search values.
 A primary index is hence a nondense (sparse)
index, since it includes an entry for each disk block
of the data file rather than for every search value
(or every record).
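A sparse primary index can be sketched as one entry per block, holding the first key of that block. The data file below is hypothetical, and the lists blocks and anchors and the function find are names introduced only for this example.

```python
from bisect import bisect_right

# Hypothetical data file: each block holds records sorted by primary key.
blocks = [[2, 3, 5], [7, 8, 13], [17, 24, 30]]

# Sparse index: one entry per block (the first key in the block).
anchors = [block[0] for block in blocks]

def find(key):
    """Return (block number, position in block) of key, or None."""
    b = bisect_right(anchors, key) - 1    # last block whose first key <= key
    if b < 0:
        return None
    block = blocks[b]
    return (b, block.index(key)) if key in block else None
```

Here the sparse index has 3 entries, one per block, while a dense index over the same file would need 9, one per record.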
Problem with a primary index
164
 A major problem with a primary index—as with
any ordered file—is insertion and deletion of
records.
Clustering Indexes
165
 If records of a file are physically ordered on a
nonkey field—which does not have a distinct value
for each record—that field is called the clustering
field.
 A clustering index is also an ordered file with two
fields; the first field is of the same type as the
clustering field of the data file, and the second field
is a block pointer.
166
Secondary Indexes
167
 A Secondary Index is an ordered file with two
fields.
 The first is of the same data type as some
nonordering field and the second is either a block
or a record pointer.
 If the entries in this nonordering field must be
unique, this field is sometimes referred to as a
secondary key. This results in a dense index.
168
Comparison between indexes
169
Multilevel Indexes
170
 A multilevel index is built by constructing a
second-level index on the first-level index.
This process continues until the entire top-level
index can be contained in a single file block.
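The construction can be sketched as repeatedly indexing the level below until the top level fits in one block. The parameter fan_out (the number of index entries per block) and the function name build_levels are assumptions for this example.

```python
def build_levels(first_level, fan_out):
    """Index the index until the top level fits in one block.

    first_level: sorted list of keys in the first-level index
    fan_out: number of index entries that fit in one block
    """
    levels = [first_level]
    while len(levels[-1]) > fan_out:
        below = levels[-1]
        # One entry per block of the level below: that block's first key.
        levels.append(below[::fan_out])
    return levels
```

With 1000 first-level entries and 10 entries per block, the levels hold 1000, 100 and 10 entries, so three block reads reach any first-level entry.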
171
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
172
 B-trees and B+-trees are special cases of the well-known tree
data structure.
 A tree is formed of nodes.
 Each node in the tree, except for a special node called the
root, has one parent node and several—zero or more—
child nodes.
 The root node has no parent. A node that does not have any
child nodes is called a leaf node; a nonleaf node is called an
internal node.
 The level of a node is always one more than the level of
its parent, with the level of the root node being zero.
 A subtree of a node consists of that node and all its
descendant nodes—its child nodes, the child nodes of
its child nodes, and so on.
B tree
173
 A B-tree of order m (the maximum number of children
for each node) is a tree which satisfies the following
properties:
 Every node has at most m children.
 Every node (except root and leaves) has at least ⌈m/2⌉
children (m/2 rounded up).
 The root has at least two children if it is not a leaf node.
 All leaves appear in the same level, and carry
information.
 A non-leaf node with k children contains k–1 keys.
Structure of B tree
174
175
B tree with order 3
Insertion algorithm
176
 All insertions start at a leaf node. To insert a new element, search the tree to
find the leaf node where the new element should be added.
 Insert the new element into that node with the following steps:
1. If the node contains fewer than the maximum legal number of elements,
then there is room for the new element. Insert the new element in the
node, keeping the node's elements ordered.
2. Otherwise the node is full, so evenly split it into two nodes.
 A single median is chosen from among the leaf's elements and the new
element.
 Values less than the median are put in the new left node and values greater
than the median are put in the new right node, with the median acting as a
separation value.
 Insert the separation value in the node's parent, which may cause it to be
split, and so on. If the node has no parent (i.e., the node was the root), create
a new root above this node (increasing the height of the tree).
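The insertion steps above can be sketched as follows. This is a minimal illustration, assuming that the order is the maximum number of children (so a node holds at most order - 1 keys) and that keys are distinct; the class and method names are hypothetical, and deletion is not covered.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for a leaf

class BTree:
    def __init__(self, order):
        self.order = order               # maximum number of children per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                        # the root was split: create a new root
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if node.children:                # internal node: descend to the leaf first
            split = self._insert(node.children[i], key)
            if not split:
                return None
            median, right = split        # separation value coming up from the child
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
        else:                            # leaf: add the key, keeping the keys ordered
            node.keys.insert(i, key)
        if len(node.keys) < self.order:  # still within the legal maximum
            return None
        mid = len(node.keys) // 2        # median of the old keys plus the new one
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right

    def in_order(self, node=None):
        """All keys in ascending order (used here to check the tree)."""
        node = node or self.root
        if not node.children:
            return list(node.keys)
        out = []
        for i, child in enumerate(node.children):
            out += self.in_order(child)
            if i < len(node.keys):
                out.append(node.keys[i])
        return out
```

For example, inserting 10, 20 and 5 into a tree of order 3 overfills the root leaf; it splits into the leaves 5 and 20 with 10 as the separation value in a new root.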
A B Tree insertion example with each iteration
177
B+ tree
178
 Properties of a B+ Tree of order m :
 All internal nodes (except root) have at least m keys
and at most 2m keys.
 The root has at least 2 children unless it’s a leaf.
 All leaves are on the same level.
 An internal node with k keys has k+1 children
Inserting a Data Entry into a B+ Tree: Summary
179
 Find correct leaf L.
 Put data entry onto L.
 If L has enough space, done!
 Else, must split L (into L and a new node L2)
 Redistribute entries evenly, put middle key in L2
 copy up middle key.
 Insert index entry pointing to L2 into parent of L.
 This can happen recursively
 To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
 Splits “grow” tree; root split increases height.
 Tree growth: gets wider or one level taller at top.
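The difference between a leaf split (copy up) and an index split (push up) can be sketched on bare key lists. Child pointers are left out for brevity, and the function names split_leaf and split_index are hypothetical.

```python
def split_leaf(entries):
    """Split an overflowing leaf; the middle key is COPIED up.

    Returns (left leaf, key for the parent, right leaf).
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right[0], right      # right[0] also stays in the new leaf

def split_index(keys):
    """Split an overflowing index node; the middle key is PUSHED up.

    Returns (left node, key for the parent, right node).
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]   # keys[mid] leaves this level
```

Splitting the leaf 2, 3, 5, 7, 8 gives the leaves 2, 3 and 5, 7, 8 with 5 copied up; splitting the index node 5, 13, 17, 24, 30 gives 5, 13 and 24, 30 with 17 pushed up.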
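Inserting 16*, 8* into Example B+ tree
180
[Figure: example B+ tree whose root holds the index entries 13, 17, 24, 30.]
 Inserting 16*: the leaf becomes 14* 15* 16*.
 Inserting 8*: the leaf 2* 3* 5* 7* would become 2* 3* 5* 7* 8*, so it overflows.
 One new child (leaf node) is generated; one more pointer must be added
to its parent, and thus one more key value as well.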
Inserting 8* (cont.)
181
 Copy up the middle value (leaf split).
 The leaf 2* 3* 5* 7* 8* splits into 2* 3* and 5* 7* 8*. The entry to be
inserted in the parent node is 5. (Note that 5 is copied up and
continues to appear in the leaf.)
 The parent node 13 17 24 30 overflows when 5 is added: 5 13 17 24 30.
Insertion into B+ tree (cont.)
182
 The index node 5 13 17 24 30 overflows. We split this node, redistribute
entries evenly, and push up the middle key.
 The node splits into 5 13 and 24 30. The entry to be inserted in the parent
node is 17. (Note that 17 is pushed up and only appears once in the index.
Contrast this with a leaf split.)
• Understand the difference between copy-up and push-up.
• Observe how minimum occupancy is guaranteed in both leaf and
index page splits.
Example B+ Tree After Inserting 8*
183
Notice that the root was split, leading to an increase in height.
[Figure: root 17; index nodes 5 13 and 24 30; leaves 2* 3* | 5* 7* 8* |
14* 15* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*.]
INTRODUCTION TO DATABASE

  • 1. A Presentation on INTRODUCTION TO DATABASE. Prepared by: Jyoti Giri, Assistant Professor, GDRCST, Bhilai
  • 2. What is data? Data is a collection of facts from which conclusions may be drawn, such as “statistical data”. Data is the plural form of datum. It is a representation of facts or concepts in an organized manner so that it may be stored, communicated, interpreted or processed by automated means. Example: researchers who conduct a market research survey might ask members of the public to complete questionnaires about a product or a service. These completed questionnaires are data; they are processed and analyzed in order to prepare a report on the survey.
  • 3. Properties of Data (in a database): Data should be well organized. Data should be related. Data should be accessible in any order. Each data item should be stored a minimum number of times.
  • 4. What is a Database? A database is a collection of related data that contains information relevant to an enterprise. For example: 1. University database 2. Employee database 3. Student database 4. Airlines database, etc.
  • 5. PROPERTIES OF A DATABASE: A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD). A database is a logically coherent collection of data with some inherent meaning. A database is designed, built and populated with data for a specific purpose.
  • 6. What is a Database Management System (DBMS)? A database management system (DBMS) is a collection of programs that enables users to create & maintain a database. It facilitates the definition, creation and manipulation of the database. Definition: this covers only the structure of the database, not the data. It involves specifying the data types, structures & constraints for the data to be stored in the database. Creation: this is the inputting of actual data into the database. It involves storing the data itself on some storage medium that is controlled by the DBMS. Manipulation: this includes functions such as update, insertion, deletion, retrieval of specific data and generating reports from the data.
  • 7. Typical DBMS Functionality: Define a database in terms of data types, structures and constraints. Construct or load the database on a secondary storage medium. Manipulate the database: querying, generating reports, insertions, deletions and modifications to its content. Support concurrent processing and sharing by a set of users and programs, while keeping all data valid and consistent.
  • 8. Typical DBMS Functionality 8 Other features:  Protection or Security measures to prevent unauthorized access  “Active” processing to take internal actions on data  Presentation and Visualization of data
  • 9. Database System: The database and the DBMS together are called the database system. Database systems are designed to manage large bodies of information. This involves both defining structures for storage of information & providing mechanisms for the manipulation of information. A database system must ensure the safety of the information stored.
  • 10. A simplified database system environment
  • 11. Database System Applications 11  Banking- for customer information, accounts & loans, and banking transactions.  Airlines-for reservations & schedule information.  Universities-for student information, course registration and grades.  Credit card transactions-for purchases on credit cards & generation of monthly statements.  Telecommunication-for keeping records of calls made, generating monthly bills, maintaining balances, information about communication networks.  Finance-for storing information about holdings, sales & purchases of financial instruments such as stocks & bonds.  Sales-for customer, product and purchase information.  Manufacturing-for management of supply chain & for tracking production of items in factories.  Human resources-for information about employees, salaries, payroll taxes and benefits
  • 12. Traditional File systems: Before the evolution of DBMS, organizations used to store information in file systems. A typical file processing system is supported by a conventional operating system. The system stores permanent records in various files & it needs application programs to extract records, or to add or delete records. In traditional file processing, each user defines and implements the files needed for a specific application.
  • 13. Traditional file system 13  For example, one user, the grade reporting office, may keep a file on students and their grades. Programs to print a student’s transcript and to enter new grades into the file are implemented.  A second user, the accounting office, may keep track of students’ fees and their payments.  Although both users are interested in data about students, each user maintains separate files—and programs to manipulate these files—because each requires some data not available from the other user’s files.  This redundancy in defining and storing data results in wasted storage space and in redundant efforts to maintain common data up-to-date.
  • 14. Disadvantages of File systems: 1. Data Redundancy & Inconsistency 2. Difficulty in Accessing data 3. Data Isolation 4. Integrity Problems 5. Atomicity Problems 6. Concurrent access Anomalies or Problems 7. Security Problems
  • 15. Data Redundancy & Inconsistency: Different programmers work on a single project, so various files are created by different programmers over time. These files are created in different formats & the programs are written in different programming languages. The same information is repeated. For example, a name & address may appear in the savings account file as well as in the salary account file. This redundancy results in higher storage & access costs. It also leads to data inconsistency, which means that if we change a record in one place the change is not reflected in all the other places. For example, a changed customer address may be reflected in the savings record but nowhere else.
  • 16. Difficulty in Accessing data: Accessing data from a list is also difficult in a file system. Suppose we want to see the records of all customers who have a balance of less than Rs 10,000; we can either check the list & find the names manually or write an application program. If we write an application program & at some later time need to see the records of customers who have a balance of less than Rs 20,000, then a new program has to be written. This means that file processing systems do not allow data to be accessed in a convenient manner.
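For contrast, a DBMS lets the same query be reused with a different threshold instead of requiring a new program. The sketch below is only an illustration using Python's built-in sqlite3 module; the table name `account`, its columns, and the sample rows are assumptions made for the example, not part of the slides.

```python
import sqlite3

def customers_below(conn, limit):
    """Return (name, balance) rows whose balance is below `limit`."""
    cur = conn.execute(
        "SELECT name, balance FROM account WHERE balance < ? ORDER BY name",
        (limit,),
    )
    return cur.fetchall()

# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("Ajay", 5000), ("Vishal", 15000), ("Meena", 25000)],
)

# The same function answers both questions; only the parameter changes.
below_10k = customers_below(conn, 10000)
below_20k = customers_below(conn, 20000)
print(below_10k)
print(below_20k)
```

Only the value bound to the `?` placeholder differs between the two calls.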
  • 17. Data Isolation: As the data is stored in various files, & those files may be stored in different formats, writing application programs to retrieve the data is difficult.
  • 18. Integrity Problems: The stored data should satisfy certain constraints; for example, in a bank the minimum deposit should be Rs 1000. Developers enforce these constraints by writing appropriate programs, but if some new constraint has to be added later, it is difficult to change the programs to enforce it.
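In a DBMS, a rule such as the minimum deposit can be declared once with the data rather than coded into every program. The following is a minimal sketch with sqlite3, assuming a hypothetical `deposit` table; the account numbers and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause states the rule: a deposit must be at least Rs 1000.
conn.execute(
    "CREATE TABLE deposit (acc_no TEXT, amount INTEGER CHECK (amount >= 1000))"
)
conn.execute("INSERT INTO deposit VALUES ('A-101', 1500)")  # satisfies the rule

try:
    conn.execute("INSERT INTO deposit VALUES ('A-102', 500)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM deposit").fetchone()[0]
print(rejected, row_count)
```

The database itself refuses the second insert, so no application program has to repeat the check.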
  • 19. Atomicity Problems: Any mechanical or electrical device is subject to failure, and so is the computer system. After a failure, we have to ensure that the data is restored to a consistent state. For example, suppose an amount of Rs 50 has to be transferred from Account A to Account B. Suppose the amount has been debited from Account A but has not yet been credited to Account B when some failure occurs. This leads to an inconsistent state. So we have to adopt a mechanism which ensures that either the full transaction is executed or none of it is, i.e. the fund transfer should be atomic.
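The all-or-nothing behaviour described above can be sketched with a sqlite3 transaction. This is an illustrative example only: the `account` table, the starting balances (Rs 500 and Rs 200) and the `fail_midway` flag used to simulate a crash are assumptions, not taken from the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET bal = bal - ? WHERE acc_no = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET bal = bal + ? WHERE acc_no = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {acc_no: bal} for every account."""
    return dict(conn.execute("SELECT acc_no, bal FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, bal INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

failed = transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back
ok = transfer(conn, "A", "B", 50)
after_success = balances(conn)   # both updates were applied
print(after_failure, after_success)
```

After the simulated failure the debit from A is undone, so the database never shows the money missing from both accounts.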
  • 20. Concurrent access Problems: Many systems allow multiple users to update the data simultaneously. This can also leave the data in an inconsistent state. Suppose a bank account contains a balance of Rs 500 & two customers want to withdraw Rs 100 & Rs 50 simultaneously. If both transactions read the old balance & withdraw from that old balance, the final balance will be Rs 400 or Rs 450, depending on which one writes last; both are incorrect, since the correct balance is Rs 350.
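The arithmetic of this anomaly can be traced with a small simulation. The sketch below does not use real threads; it simply replays one possible interleaving of the two withdrawals by hand, and the function names are illustrative.

```python
def interleaved_withdrawals(balance, first, second):
    """Both transactions read the balance before either writes (a lost update)."""
    read_1 = balance            # transaction 1 reads the old balance
    read_2 = balance            # transaction 2 reads the same old balance
    balance = read_1 - first    # transaction 1 writes its result
    balance = read_2 - second   # transaction 2 overwrites transaction 1's write
    return balance

def serial_withdrawals(balance, first, second):
    """The withdrawals run one after the other, as a DBMS would ensure."""
    balance = balance - first
    balance = balance - second
    return balance

lost = interleaved_withdrawals(500, 100, 50)
correct = serial_withdrawals(500, 100, 50)
print(lost, correct)
```

In this interleaving the Rs 100 withdrawal is lost; if the writes happened in the other order, the Rs 50 withdrawal would be lost instead.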
  • 21. Security Problems: Not every user of the database should be able to access all the data. For example, payroll personnel need to access only the part of the data that has information about the various employees & do not need access to information about customer accounts.
  • 22. Advantages of DBMS 22  Controlling Redundancy  Restricting Unauthorized Access  Providing Storage Structures for Efficient Query Processing  Providing Backup and Recovery  Providing Multiple User Interfaces  Representing Complex Relationship among Data  Enforcing Integrity Constraints  Permitting Inferencing and Actions using Rules
  • 23. Disadvantages of DBMS 23  Cost of Hardware & Software  Cost of Data Conversion  Cost of Staff Training  Appointing Technical Staff  Database Damage
  • 24. Users may be divided into 24  Those who actually use and control the content (called “Actors on the Scene”)  those who enable the database to be developed and the DBMS software to be designed and implemented (called “Workers Behind the Scene”).
  • 25. Actors on the scene 25  Database administrators  Database Designers  End-users
  • 26. Database administrator (DBA): The database administrator is the controller of the overall operations of the database. The DBA is not, however, responsible for creating the database or the structure of the database. The database administrator is the most powerful actor on the scene.
  • 27. Functions of DBA: Authorizing access to the database. Coordinating & monitoring the database. Acquiring hardware & software resources as needed by the user. Concurrency control checking. Security of the database. Making backups & recovery. Modification of the database structure & its relation to the physical database.
  • 28. Database Designers (DBD): The database designer is the person who designs the database structure for the first time. Prerequisites, i.e. which sources the data is to be collected from, are decided by the DBD.
  • 29. Functions of DBD: Creating the original description of the database structure. Database designers interact with different groups of users & integrate their views to make the best structure.
  • 30. End-users 30 They use the data for queries, reports and some of them actually update the database content. Types of end-users  Casual  Naïve or Parametric  Sophisticated  Specialized or Stand-alone
  • 31. 31  Casual: they can only browse through the database; they cannot create, update or make any changes in the database.  Naïve or Parametric: they use the readymade software which deals with the database. They can only update the database. Examples are bank-tellers or reservation clerks who do this activity for an entire shift of operations.
  • 32. 32  Sophisticated: these include business analysts, scientists, engineers, others thoroughly familiar with the system capabilities. Many use tools in the form of software packages that work closely with the stored database.  Stand-alone: mostly maintain personal databases using ready-to-use packaged applications. An example is a tax program user that creates his or her own internal database.
  • 33. Workers behind the scene: DBMS system implementers: they are the creators of the DBMS. Tool developers: tools are the facilities provided to help the DBMS or the user. They are packages for database design, performance monitoring, graphical interfaces, and simulation. Tool developers develop these tools for the DBMS. Operators & maintenance personnel: these are the persons required for maintaining the hardware or software of the DBMS.
  • 34. DATA MODEL 34  A data model is a collection of concepts that can be used to describe the structure of a database.  By structure of a database we mean the  Data types,  Relationships,  Constraints that should hold on the data.
  • 35. Categories of data models 35  Conceptual (high-level, semantic) data models  Physical (low-level, internal) data models  Implementation (representational, record based) data models
  • 37. Conceptual data models 37  Before implementation, a rough model of database is created.  This model is never implemented but is used for designing purpose.  Also called entity-based or object-based data models. Example: E-R Model
  • 38. E-R model 38  Stands for entity-relationship model. Terms used in E-R model: Field – Attribute Record – Entity File – Entity Type
  • 39. E-R Model: Entity: an object with a physical or conceptual existence. Ex: an object with a physical existence is a person, a car or a house; an object with a conceptual existence is a company, a job or a university. Attribute: attributes are the particular properties that describe an entity. Ex: A STUDENT entity may be described by the student’s name, age, address, class and grade.
  • 40. EXAMPLE [figure not recoverable]
  • 42. Physical data models 42  It provides concepts that describe the details of how data is stored in the computer.  Concepts provided by physical data models are generally meant for computer specialists, not for typical end users.
  • 44. Implementation data models 44  Provide concepts that fall between the above two.  It also provides concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer.  Example: relational model, network model, hierarchical model.
  • 45. Relational Model: The relational model uses a collection of tables to represent both data and the relationships among those data.
      Customer Table
      Cust_id | Cust_Name | Cust_Add     | Cust_City
      1000    | Ajay      | Kohka        | Bhilai
      1001    | Vishal    | Shanti Nagar | Nagpur
      Account Table
      Acc_No | Bal
      A-101  | 5000
      A-102  | 10000
      Depositor Table
      Cust_id | Acc_No
      1000    | A-101
      1001    | A-102
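The three tables above can be built and joined to show how the Depositor table links customers to accounts. This is a sketch using sqlite3 with the slide's sample rows; the lower-case table and column names and the chosen column types are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (cust_id INTEGER PRIMARY KEY, cust_name TEXT,
                        cust_add TEXT, cust_city TEXT);
CREATE TABLE account   (acc_no TEXT PRIMARY KEY, bal INTEGER);
-- depositor records the relationship between a customer and an account
CREATE TABLE depositor (cust_id INTEGER REFERENCES customer(cust_id),
                        acc_no  TEXT    REFERENCES account(acc_no));
INSERT INTO customer  VALUES (1000, 'Ajay', 'Kohka', 'Bhilai'),
                             (1001, 'Vishal', 'Shanti Nagar', 'Nagpur');
INSERT INTO account   VALUES ('A-101', 5000), ('A-102', 10000);
INSERT INTO depositor VALUES (1000, 'A-101'), (1001, 'A-102');
""")

# Follow the relationship: customer -> depositor -> account.
rows = conn.execute("""
    SELECT c.cust_name, a.acc_no, a.bal
    FROM customer c
    JOIN depositor d ON d.cust_id = c.cust_id
    JOIN account   a ON a.acc_no  = d.acc_no
    ORDER BY c.cust_id
""").fetchall()
print(rows)
```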
  • 46. Hierarchical Model: The hierarchical data model uses tree structures to represent relationships among records. Tree structures occur naturally in many data organizations because some entities have an intrinsic hierarchical order. Institute -> Programs -> Courses -> Students
  • 47. Network Model: This model uses two different data structures to represent the database entities and the relationships between the entities, namely the record type & the set type. A record type is used to represent an entity type. It is made up of a number of data items that represent the attributes of the entity. A set type is used to represent a directed relationship between two record types, called the owner record type & the member record type.
  • 48. Example: Record Type (Department & Employee); Set Type (Dept - Emp) with department as the owner record type & employee as the member record type.
  • 49. SCHEMAS AND INSTANCES: The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. The collection of information stored in the database at a particular moment is called an instance of the database. It changes much more frequently than the schema.
  • 51. View of Data: A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained. The actual (complex) details are hidden from users through several levels of data abstraction.
  • 52. Levels of data abstraction 52
  • 53. Physical level 53  It is the lowest level of abstraction & specifies how the data is actually stored. Example: A banking enterprise may have several such record types, including  Customer, with customer-id, customer-name, customer-street, customer-city  account, with fields account-number and balance  employee, with fields employee-name and salary  At the physical level, a customer, account, or employee record can be described as a block of consecutive storage locations (for example, words or bytes). The language compiler hides this level of detail from programmers. Similarly, the database system hides many of the lowest-level storage details from database programmers.
  • 54. Logical level 54  It is the next level of abstraction & describes what data are stored in database & what relationship exists between various data. Example :  At the logical level, each such record is described by a type definition, and the interrelationship of these record types is defined as well.  Programmers using a programming language work at this level of abstraction.
  • 55. View level: This level contains the actual data which is shown to the users. This is the highest level of abstraction & the user of this level need not know the actual details of data storage. Example: At the view level, several views of the database are defined, and database users see these views. For example, tellers in a bank see only that part of the database that has information on customer accounts; they cannot access information about salaries of employees.
  • 57. Three-schema architecture 57  The three-schema architecture is a convenient tool for the user to visualize the schema levels in a database system.  In this architecture, schemas can be defined at the following three levels:  Internal schema/Physical schema  Conceptual schema  External schema
  • 58. 58  The internal level has an internal schema, which describes the physical storage structure of the database.  The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.  The external or view level includes a number of external schemas or user views.  The processes of transforming requests and results between levels are called mappings.
  • 59. Example: university database 59 Conceptual schema:  Student (sid: string, name: string, age: number, percent: real)  Courses (cid: string, cname: string, credits: number)  Enrolled (sid: string, cid: string, grade: string) Physical schema:  Relations stored as unordered files.  Index on first column of students External schema:  Course_info(cid: string, enrollment: integer)
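The three levels of this university example can be approximated in one sqlite3 session: tables for the conceptual schema, an index for the physical schema, and a view for the external schema. This is a rough sketch; the index name, the SQL column types and the sample enrollment rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the three relations from the slide.
CREATE TABLE student  (sid TEXT, name TEXT, age INTEGER, percent REAL);
CREATE TABLE courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE enrolled (sid TEXT, cid TEXT, grade TEXT);
-- Physical schema: an index on the first column of student.
CREATE INDEX idx_student_sid ON student (sid);
-- External schema: a view exposing only per-course enrollment counts.
CREATE VIEW course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM enrolled GROUP BY cid;
-- Hypothetical sample rows.
INSERT INTO enrolled VALUES ('s1', 'c1', 'A'), ('s2', 'c1', 'B'), ('s1', 'c2', 'A');
""")

course_info = conn.execute(
    "SELECT cid, enrollment FROM course_info ORDER BY cid"
).fetchall()
print(course_info)
```

A user of `course_info` sees enrollment counts without seeing individual grades, and the index could be dropped or changed without altering the view's results.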
  • 60. DATA INDEPENDENCE: The ability to make changes at one level without affecting the other levels is called data independence. Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
  • 61. Types of data independence 61  Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs.  Physical data independence is the capacity to change the internal schema without having to change the conceptual (or external) schemas.
  • 62. DBMS Structure (figure): users (Batch User, Naive User, Casual User, DBA); Telecomm System; Compiled User Interface; Compiled Application Prog.; Query Processor; DDL Compiler; DBMS & its Data Manager; OS or Own File Manager; OS Disk Manager; Data Files & Data Dictionary
  • 63. DDL Compiler (Data Definition Language Compiler): the DDL compiler converts the data definition statements into a set of tables. These tables contain the metadata concerning the database & are in a form that can be used by other components of the DBMS. Data Manager: the data manager is the central software component of the DBMS. It is sometimes referred to as the database control system. Functions: Converts operations in the user’s queries, coming directly via the query processor or indirectly via an application program, from the user’s logical view to a physical file system. Interfaces with the file system. Enforces constraints to maintain the consistency, integrity & security of the data. Synchronizes the simultaneous operations performed by concurrent users. Handles backup & recovery operations.
  • 64. 64 File Manager:  Responsibility for the structure of the files & managing the file space rests with the file manager.  Responsible for locating the block containing the required record, requesting this block from the disk manager & transmitting the required record to the data manager. Disk Manager:  The disk manager is part of the operating system & all physical input & output operations are performed by it.  Transfers the block or page requested by the file manager.
  • 65. Query Processor: The query processor is used to interpret the online user’s query & convert it into an efficient series of operations in a form capable of being sent to the data manager for execution. Functions: The data manipulation statements are compiled separately into a sequence of optimized operations on the database. It transfers data to & from a work-area indicated in a subroutine call & returns control to the application program. During execution, when a subroutine call inserted in place of the data manipulation statements is reached, control transfers to the run-time system. This system in turn transfers control to the compiled version of the original data manipulation statements. These data manipulation statements are executed by the data manager. A user action that requires a database operation causes the application program to request the service via its run-time system & data manager. Batch users of the database also interact with the database via their application program, its run-time system & data manager.
  • 66. Telecommunication System: It is a software system used to communicate with a remote or local computer by sending or receiving messages over communication lines. Messages from the user are routed by the telecommunication system to the appropriate target & responses are sent back to the user.
  • 67. Data Files: data files contain the data portion of the database. Data dictionary: Information pertaining to the structure & usage of data contained in the database, the metadata, is maintained in a data dictionary. It stores information concerning the external, conceptual & internal levels of the database. It contains the source of each data-field value, the frequency of its use & details concerning updates. The data dictionary is itself a database that documents the data.
  • 68. Entity-Relationship Model: The E-R model is the most commonly used conceptual model. In this model, the real world consists of a collection of basic objects called entities and the relationships among these objects. The end product of the modeling process is an entity-relationship diagram (ERD) or ER diagram. It is a very important conceptual data model. It is not implemented directly; it is used to design the database.
  • 69. The E-R data model employs three basic notions: Entity, Attributes, Relationship
  • 70. Entity: It is an object with a physical or conceptual existence. For example, a person in an enterprise, a car, a house, a company, or a university course.
  • 71. Entity Type & Entity Sets: Entity Type: a collection of entities that have the same attributes. Ex: STUDENT, UNIVERSITY. Entity Set: the collection of all entities of a particular entity type. Ex: the set of all 10 rows of STUDENT (Name, Age, Rollno).
  • 73. Attributes 73  Attributes are the particular properties that describe an entity. Ex: A STUDENT entity may be described by student’s name, student’s roll_number.
  • 75. Types of Attributes 75  Simple (Atomic) and Composite Attributes  Single Valued & Multi-valued Attributes  Stored and Derived Attributes  Null Valued Attributes  Complex Attributes
  • 76. Simple (Atomic) and Composite Attributes: Simple attributes are not divisible into parts. For example, EmployeeNumber and Age. Composite attributes can be divided into smaller subparts. These subparts represent basic attributes with independent meanings of their own. For example, the Name and Address attributes.
  • 77. Address (composite): Street Address (street, apartment no.), city, state, Pin number
  • 78. Single Valued & Multi-valued Attributes: Single-valued attributes have a single value for a particular entity. Example: Roll_no, Age. Multi-valued attributes may have more than one value for a single entity. Example: Phone_no
  • 79. Stored and Derived Attributes 79  Derived attribute is not stored in the database but it is derived from some attributes.  Example: If DOB is stored in the database then we can calculate age of a student by subtracting DOB from current date.  Hence, in this case DOB is the stored attribute and age is considered as derived.
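A derived attribute like age can be computed on demand from the stored DOB. The function below is a small sketch; its name and the sample dates are illustrative, and it counts completed years so that a birthday that has not yet occurred this year is not counted.

```python
from datetime import date

def age_on(dob, today):
    """Completed years between `dob` and `today`."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

# DOB is the stored attribute; age is derived whenever it is needed.
age_before_birthday = age_on(date(2004, 8, 15), date(2024, 8, 14))
age_on_birthday = age_on(date(2004, 8, 15), date(2024, 8, 15))
print(age_before_birthday, age_on_birthday)
```

Storing only DOB avoids having to update every student's age as time passes.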
  • 80. Null Valued Attributes: A null value marks a value that has not been inserted (it is missing or not applicable); it is not the same as zero. Attributes which can have a null value are called null valued attributes. Example: the Mobile_no attribute of a person, since a person may not have a mobile phone.
  • 81. Complex Attributes: A complex attribute is a combination of composite and multi-valued attributes. Multi-valued attributes are written between { } and composite attributes between ( ). Example: an Address_phone attribute holds both the address and the phone_no of a person. Example: {(2-A, St-5, Sec-4, Bhilai), 2398124}
  • 82. Key attribute in an entity type: A key attribute has a unique value for each entity of the entity type. It identifies every entity in the entity set. A key attribute is never a null valued attribute. A composite attribute can also be a key attribute. There can be more than one key attribute for an entity type. Example: roll_no, enrollment_no
  • 83. Domain (value set) of an attribute: The domain of an attribute is the allowed set of values of that attribute. Example: if the attribute is ‘grade’, then its allowed values are A, B, C, F. Grade = {A, B, C, F}
  • 84. TYPES OF ENTITY TYPES: Strong entity type: an entity type that has at least one key attribute. Weak entity type: an entity type that does not have any key attribute. An entity in a weak entity type is identified by a relationship with a strong entity type; that relationship is called the Identifying Relationship and that strong entity type is called the owner of the weak entity type.
  • 85. TYPES OF ENTITY TYPES
      Student
      Roll No. | Name   | Age
      1        | Rakesh | 20
      2        | Nikhil | 21
      3        | Nikhil | 21
      Marks
      Name   | M1 | M2 | M3
      Nikhil | 50 | 45 | 40
      Nikhil | 80 | 75 | 82
      Identifying Relationship: Secured
  • 86. Relationship 86  Relates two or more distinct entities with a specific meaning.  For example, EMPLOYEE John works on the ProductX PROJECT or  EMPLOYEE Franklin manages the Research DEPARTMENT. Terms used: Relationship type, Relationship set, Relationship instances.
  • 88. 88 Relationship type: secured Relationship set: {R1, R2, R3, R4} Relationship instances: R1
  • 90. NOTATIONS USED IN E-R DIAGRAM 90 Entity Type Attribute Key Attribute Weak Entity Type
  • 91. NOTATIONS USED IN E-R DIAGRAM 91 Composite Attribute Derived Attribute Multivalued Attribute
  • 92. NOTATIONS USED IN E-R DIAGRAM 92 Identifying Relationship Relationship Type
  • 93. Constraints 93 Relationship types usually have certain constraints. Two main types of relationship constraints:  Mapping cardinalities  Participation constraints
  • 94. Mapping cardinalities, or cardinality ratios: Specify the number of relationship instances that an entity can participate in. For example, in the WORKS_FOR relationship type.
  • 95. Mapping Cardinalities 95  One-to-one (1:1)  One-to-many (1: N)  Many-to-one (N: 1)  Many-to-many (M: N)
  • 96. (a) One-to-one (b) One-to-many
  • 97. (a) Many-to-one (b) Many-to-many
  • 98. Example of E-R Diagrams 98  Rectangles represent entity types.  Diamonds represent relationship types.  Lines link attributes to entity types and entity types to relationship types.  Ellipses represent attributes  Underline indicates primary key attributes (will study later)
  • 99. E-R Diagram With Composite, Multivalued, and Derived Attributes 99
  • 100. Relationship Types with Attributes: Here the access_date attribute is attached to the relationship set depositor to specify the most recent date on which a customer accessed that account.
  • 101. Cardinality ratio: We express the cardinality ratio by drawing either a directed line (→), signifying “one,” or an undirected line, signifying “many,” between the relationship set and the entity set.
  • 103. One-To-Many Relationship 103  In the one-to-many relationship a customer is associated with several loans via borrower
  • 104. Many-To-One Relationships 104  In a many-to-one relationship a loan is associated with several customers via borrower.
  • 106. Find out the Cardinality ratio 106  Prime minister-country  classroom –students  students –classroom  customer -loan
  • 107. Participation constraints 107  Total participation : every entity in the entity type participates in at least one relationship in the relationship type  E.g. participation of loan in borrower is total  every loan must have a customer associated to it via borrower  Partial participation: some entities may not participate in any relationship in the relationship type  Example: participation of customer in borrower is partial  some customers may not participate in any loan
  • 108. KEYS 108  Key is used to identify every entity in the entity set.
  • 109. Types of keys 109  Candidate Key  Alternate & Primary key  Superkey
  • 110. Candidate Key: A candidate key is a minimal set of attributes that uniquely identifies any entity in the entity set. There can be more than one candidate key in an entity set. More than one attribute can together form a single candidate key. Suppose that a combination of customer-name and customer-street is sufficient to distinguish among members of the customer entity set. Then, both {customer-id} and {customer-name, customer-street} are candidate keys. Although the attributes customer-id and customer-name together can distinguish customer entities, their combination does not form a candidate key, since the attribute customer-id alone is a candidate key.
  • 111. Alternate & Primary key: Alternate & primary keys are related to candidate keys. In an entity set, the primary key is the one candidate key chosen to identify entities & the remaining candidate keys are called alternate keys. AK = CK - PK
  • 112. Superkey: A superkey is any superset of a candidate key, including the candidate key itself. For example, the customer-id attribute of the entity set customer is sufficient to distinguish one customer entity from another. Thus, customer-id is a superkey. Similarly, the combination of customer-name and customer-id is a superkey for the entity set customer. The customer-name attribute of customer is not a superkey, because several people might have the same name. Example: {customer-id}, {customer-name, customer-id}
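The difference between a superkey and a candidate key can be checked mechanically against a sample of rows. The sketch below is only illustrative: the function names and the three customer rows are assumptions, and a real key must hold for every possible instance of the entity set, not just for one sample.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in `attrs`."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """A superkey none of whose proper subsets is itself a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical sample of the customer entity set.
customers = [
    {"customer-id": 1, "customer-name": "Ajay",   "customer-street": "Kohka"},
    {"customer-id": 2, "customer-name": "Ajay",   "customer-street": "Shanti Nagar"},
    {"customer-id": 3, "customer-name": "Vishal", "customer-street": "Kohka"},
]

print(is_candidate_key(customers, ("customer-id",)))
print(is_candidate_key(customers, ("customer-name", "customer-street")))
print(is_superkey(customers, ("customer-id", "customer-name")))
print(is_candidate_key(customers, ("customer-id", "customer-name")))
```

On this sample, {customer-id, customer-name} is a superkey but not a candidate key, because customer-id alone already identifies each row.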
  • 113. Weak Entity Types 113  An entity type that does not have a primary key is referred to as a weak entity type.
  • 114. Weak Entity types (Cont.) 114  We depict a weak entity type by double rectangles.  We underline the partial key of a weak entity type with a dashed line.  payment_number – partial key of the payment entity type  Primary key for payment – (loan_number, payment_number)
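One common way to carry the payment weak entity type into tables is a composite primary key that combines the owner's key with the partial key. The sketch below uses sqlite3; the `amount` and `payment_amount` columns, the sample rows and the ON DELETE CASCADE choice are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER);
CREATE TABLE payment (
    loan_number    TEXT REFERENCES loan(loan_number) ON DELETE CASCADE,
    payment_number INTEGER,            -- partial key: unique only within one loan
    payment_amount INTEGER,
    PRIMARY KEY (loan_number, payment_number)
);
INSERT INTO loan VALUES ('L-1', 1000), ('L-2', 2000);
-- payment_number 1 appears under both loans; the loan_number tells them apart.
INSERT INTO payment VALUES ('L-1', 1, 100), ('L-2', 1, 150);
""")

try:
    conn.execute("INSERT INTO payment VALUES ('L-1', 1, 999)")  # same loan, same number
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A payment cannot exist without its owner loan: deleting the loan removes it.
conn.execute("DELETE FROM loan WHERE loan_number = 'L-1'")
remaining = conn.execute(
    "SELECT loan_number, payment_number FROM payment"
).fetchall()
print(duplicate_rejected, remaining)
```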
  • 115. Give me answer? 115  Can we convert weak entity type into strong entity type?
  • 116. PROBLEMS ON E-R DIAGRAM 116 Question: An employee works in a department. Each department has phones, and each employee also has a phone. Assume that an employee works in a minimum of one and a maximum of two departments, and that each department has a minimum of zero and a maximum of three phones. Design an E-R diagram for the above.
  • 118. Steps in ER Modeling 118  Identify the Entities  Find relationships  Identify the key attributes for every Entity  Identify other relevant attributes  Draw complete E-R diagram with all attributes including Primary Key
  • 119. EER (Enhanced Entity-Relationship) 119  The EER model is a high-level or conceptual data model incorporating extensions to the original Entity-Relationship (ER) model.  EER includes all the concepts of the ER model.  EER = all ER concepts + some extensions  Additionally it includes the concepts of  superclass and subclass  specialization and generalization.
  • 120. Subclasses and Superclasses 120  An entity type may have additional meaningful subgroupings.  Example: EMPLOYEE may be further grouped into SECRETARY, ENGINEER, MANAGER, TECHNICIAN, SALARIED_EMPLOYEE, HOURLY_EMPLOYEE,…  Each is called a subclass of EMPLOYEE  EMPLOYEE is the superclass for each of these subclasses.
  • 121. Specialization 121  Specialization is the process of defining a set of subclasses of a superclass.  The set of subclasses is based upon some characteristics of the entities in the superclass. • Attributes of a subclass are called specific attributes.  It follows top-down design process.  Represented by a triangle component labeled ISA (E.g. customer “is a” person).
  • 122. Example of Specialization 122  Consider an entity set person, with attributes name, street, and city. A person may be further classified as one of the following:  customer  employee  Each of these person types is described by a set of attributes that includes all the attributes of entity set person plus possibly additional attributes.  For example, customer entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary.  The specialization of person allows us to distinguish among persons according to whether they are employees or customers.
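Class inheritance in a programming language is a rough analogue of this ISA relationship. The sketch below is only an illustration of attribute inheritance, not a database design; the underscore spellings of the attribute names and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Attributes of the higher-level entity set.
    name: str
    street: str
    city: str

@dataclass
class Customer(Person):
    # Specific attribute of the customer subclass.
    customer_id: str

@dataclass
class Employee(Person):
    # Specific attributes of the employee subclass.
    employee_id: str
    salary: float

c = Customer("Hayes", "Main", "Harrison", "C1")
e = Employee("Jones", "Park", "Rye", "E7", 52000.0)
print(isinstance(c, Person), isinstance(e, Person))  # True True
```

Each subclass instance carries all the attributes of person plus its own, which mirrors the statement that a customer "is a" person.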
  • 123. Generalization 123  It is a bottom-up design process.  Generalization is a simple inversion of specialization.  In this process multiple entity sets are synthesized into a higher-level entity set on the basis of common features.  For example, customer entity set with the attributes name, street, city, and customer-id, and an employee entity set with the attributes name, street, city, employee-id, and salary.  There are similarities between the customer entity set and the employee entity set in the sense that they have several attributes in common.  This commonality can be expressed by generalization.  person is the higher-level entity set and customer and employee are lower-level entity sets.
  • 124. Continued………. 124  The person entity set is the superclass of the customer and employee subclasses.  Differences in the two approaches may be characterized by their starting point and overall goal.
  • 126. Design Constraints on a Specialization/Generalization 126  Constraint on which entities can be members of a given lower-level entity set.  Condition-defined  Example: all customers over 65 years are members of senior-citizen entity set; senior-citizen ISA person.  User-defined  Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.  Disjoint  an entity can belong to only one lower-level entity set  Noted in E-R diagram by writing disjoint next to the ISA triangle  Overlapping  an entity can belong to more than one lower-level entity set
  • 127. Design Constraints on a Specialization/Generalization (Cont.) 127  Completeness constraint -- specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.  total: an entity must belong to one of the lower-level entity sets  partial: an entity need not belong to one of the lower-level entity sets
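The disjointness and completeness constraints can be stated as simple set checks. This is a minimal sketch with hypothetical entity identifiers; the function names is_disjoint and is_total are introduced here for illustration only.

```python
def is_disjoint(subclasses):
    """True if no entity belongs to more than one lower-level entity set."""
    seen = set()
    for members in subclasses:
        if seen & members:
            return False
        seen |= members
    return True

def is_total(superclass, subclasses):
    """True if every higher-level entity belongs to at least one lower-level set."""
    covered = set().union(*subclasses)
    return superclass <= covered

# Hypothetical membership: p2 is both a customer and an employee, p3 is neither.
person = {"p1", "p2", "p3"}
customer = {"p1", "p2"}
employee = {"p2"}

print(is_disjoint([customer, employee]))       # False: overlapping
print(is_total(person, [customer, employee]))  # False: partial
```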
  • 130. E-R diagram with redundant relationships 130
  • 131. Aggregation 131  Aggregation is an abstraction through which relationships are treated as higher level entities.  Thus, for our example, we regard the relationship set works-on (relating the entity sets employee, branch, and job) as a higher-level entity set called works-on.  Such an entity set is treated in the same manner as is any other entity set.  We can then create a binary relationship manages between works-on and manager to represent who manages what tasks.
  • 133. Assignment 133 1. Construct an E-R diagram for a car-insurance company whose customers own one or more cars each. Each car has associated with it zero to any number of recorded accidents. 2. A university registrar’s office maintains data about the following entities:  Courses, including number, title, credits, syllabus, and prerequisites  Course offerings, including course number, year, semester, section number, instructor(s), timings, and classroom  Students, including student-id, name, and program  Instructors, including identification number, name, department, and title. Further, the enrollment of students in courses and grades awarded to students in each course they are enrolled for must be appropriately modeled. Construct an E-R diagram for the registrar’s office. Document all assumptions that you make about the mapping constraints.
  • 134. Continued….. 134 3. Design an E-R diagram for keeping track of the exploits of your favorite sports team. You should store the matches played, the scores in each match, the players in each match and individual player statistics for each match. Summary statistics should be modeled as derived attributes. 4. Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors. Associate with each patient a log of the various tests and examinations conducted.
  • 135. Continued….. 135 5. Consider a university database for the scheduling of classrooms for final exams. This database could be modeled as the single entity set exam, with attributes course-name, section-number, room-number, and time. Alternatively, one or more additional entity sets could be defined, along with relationship sets to replace some of the attributes of the exam entity set, as:  course with attributes name, department, and c-number  section with attributes s-number and enrollment, and dependent as a weak entity set on course  room with attributes r-number, capacity, and building (a) Show an E-R diagram illustrating the use of all three additional entity sets listed. (b) Explain what application characteristics would influence a decision to include or not to include each of the additional entity sets. 6. Construct an E-R diagram for a Bank.
  • 137. Storage hierarchy includes two main categories:137  Primary storage (main memory, cache memory)  Secondary storage (Magnetic disks, Magnetic tapes and optical disks)
  • 138. Buffer Manager 138  Files reside permanently on disks.  Each file is partitioned into fixed-length storage units called blocks.  The buffer is the part of main memory available for storage of copies of disk blocks.  The subsystem responsible for the allocation of buffer space is called the buffer manager.
  • 139. Buffer Manager techniques 139 • Buffer replacement strategy: When there is no room left in the buffer, a block must be removed from the buffer. Most operating systems use a least recently used (LRU) scheme. • Pinned blocks: Most recovery systems require that a block should not be written to disk while an update on the block is in progress. A block that is not allowed to be written back to disk is said to be pinned. • Forced output of blocks: There are situations in which it is necessary to write back the block to disk, even though the buffer space that it occupies is not needed. This write is called the forced output of a block.
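The three techniques above can be combined in a small buffer manager. This is a sketch under stated assumptions: read_block and write_block are hypothetical callbacks standing in for disk I/O, and every evicted block is written back, whereas a real system would track which blocks are dirty.

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity, read_block, write_block):
        self.capacity = capacity
        self.read_block = read_block      # hypothetical: block_id -> data
        self.write_block = write_block    # hypothetical: (block_id, data) -> None
        self.frames = OrderedDict()       # block_id -> data, least recently used first
        self.pinned = set()

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[block_id] = self.read_block(block_id)
        return self.frames[block_id]

    def _evict(self):
        # LRU replacement: pick the oldest block that is not pinned.
        victim = next((b for b in self.frames if b not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all buffered blocks are pinned")
        self.write_block(victim, self.frames.pop(victim))

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)

    def force_output(self, block_id):
        # Write the block to disk even though its buffer space is not needed.
        self.write_block(block_id, self.frames[block_id])

disk = {i: f"block{i}" for i in range(5)}
written = []
bm = BufferManager(2, disk.__getitem__, lambda b, d: written.append(b))
bm.get(0)
bm.get(1)
bm.pin(0)
bm.get(2)   # block 0 is older but pinned, so block 1 is evicted
```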
  • 140. Record Structure 140  The database is stored as a collection of files.  Each file is a sequence of records.  A record is a sequence of fields. Types of records  Fixed-Length Records: every record in the file has exactly the same size (in bytes).  Variable-Length Records: different records in the file have different sizes.
  • 141. Fixed-Length Records 141 Let us consider a file of account records for bank database. Each record of this file is defined as: Account-number: char (10); Branch-name: char (22); Balance: real; //Real size=8 Record size= 10+22+8= 40 bytes A simple approach is to use the first 40 bytes for the first record, the next 40 bytes for the second record, and so on.
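The 40-byte layout can be reproduced with Python's struct module. This is a minimal sketch: the sample account values are hypothetical, and the "<" prefix is chosen to rule out alignment padding so that the size is exactly 10 + 22 + 8.

```python
import struct

# 10-byte account number, 22-byte branch name, 8-byte real balance.
ACCOUNT = struct.Struct("<10s22sd")

def pack_account(number, branch, balance):
    # Short strings are padded with null bytes up to the field width.
    return ACCOUNT.pack(number.encode(), branch.encode(), balance)

def read_record(file_bytes, i):
    """Record i occupies bytes [40*i, 40*(i+1)) of the file."""
    raw = file_bytes[i * ACCOUNT.size:(i + 1) * ACCOUNT.size]
    number, branch, balance = ACCOUNT.unpack(raw)
    return number.rstrip(b"\0").decode(), branch.rstrip(b"\0").decode(), balance

data = pack_account("A-102", "Perryridge", 400.0) + pack_account("A-305", "Round Hill", 350.0)
print(ACCOUNT.size)          # 40
print(read_record(data, 1))  # ('A-305', 'Round Hill', 350.0)
```

Because every record has the same size, the position of record i is found by arithmetic alone, without scanning the file.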
  • 142. 142 There are two problems with this simple approach: 1. It is difficult to delete a record from this structure. The space occupied by the record to be deleted must be filled with some other record of the file. 2. Unless the block size happens to be a multiple of 40, some records will cross block boundaries. It would thus require two block accesses to read or write such a record.
  • 143. Deletion of record 1st approach 143  When a record is deleted, we could move the record that came after it into the space occupied by the deleted record, and so on, until every record following the deleted record has been moved ahead. Such an approach requires moving a large number of records.
  • 144. Deletion of record 2nd approach 144  It might be easier simply to move the final record of the file into the space occupied by the deleted record. It is undesirable to move records to occupy the space freed by a deleted record, since doing so requires additional block accesses.
  • 145. Deletion of record 3rd approach145  Since insertions tend to be more frequent than deletions, it is acceptable to leave open the space occupied by the deleted record, and to wait for a subsequent insertion before reusing the space.
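The three deletion approaches can be compared on an in-memory list standing in for the file. This is a sketch only; the free_slots list used to remember open slots is an assumption of this example, since the slides do not say how freed space is tracked.

```python
def delete_shift(records, i):
    """1st approach: every record after position i moves up by one slot."""
    del records[i]

def delete_move_last(records, i):
    """2nd approach: the final record is moved into the freed slot."""
    last = records.pop()
    if i < len(records):       # nothing to move if the last record was deleted
        records[i] = last

def delete_mark_free(records, free_slots, i):
    """3rd approach: leave the slot open and remember it for reuse."""
    records[i] = None
    free_slots.append(i)

def insert_record(records, free_slots, record):
    """Reuse an open slot if one exists, otherwise append at the end."""
    if free_slots:
        records[free_slots.pop()] = record
    else:
        records.append(record)

r1 = ["a", "b", "c", "d"]
delete_shift(r1, 1)            # ['a', 'c', 'd']

r2 = ["a", "b", "c", "d"]
delete_move_last(r2, 1)        # ['a', 'd', 'c']

r3, free = ["a", "b", "c"], []
delete_mark_free(r3, free, 0)  # [None, 'b', 'c']
insert_record(r3, free, "x")   # ['x', 'b', 'c']
```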
  • 146. Variable-Length Records 146  Variable-length records arise in database systems in several ways:  Storage of multiple record types in a file.  Record types that allow variable lengths for one or more fields.  Record types that allow repeating fields (used in some older data models).
  • 147. Techniques for implementing variable-length records147  Byte-String Representation  Fixed-Length Representation
  • 148. Byte-String Representation 148  A simple method for implementing variable-length records is to attach a special end-of-record (⊥) symbol to the end of each record.
  • 149. Byte-string representation disadvantages: 149  It is not easy to reuse space occupied formerly by a deleted record.  There is no space, in general, for records to grow longer.
  • 150. Slotted-page structure 150  A modified form of the byte-string representation, called the slotted-page structure, is commonly used for organizing records within a single block.
  • 151. 151  There is a header at the beginning of each block, containing the following information: 1. The number of record entries in the header 2. The end of free space in the block 3. An array whose entries contain the location and size of each record  The actual records are allocated contiguously in the block, starting from the end of the block.  The free space in the block is contiguous, between the final entry in the header array, and the first record.  If a record is inserted, space is allocated for it at the end of free space, and an entry containing its size and location is added to the header.  If a record is deleted, the space that it occupies is freed, and its entry is set to deleted.
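The slotted-page layout described above can be sketched as follows. Assumptions of this example: the header is kept in Python attributes rather than serialized into the block, so its own space is not counted, and deleted space is not compacted.

```python
class SlottedPage:
    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.free_end = size   # end of free space; records grow down from the block end
        self.slots = []        # header array of (offset, length); None marks a deleted entry

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:
            raise ValueError("not enough free space in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1        # slot number of the new record

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = None           # entry is set to deleted

page = SlottedPage(64)
a = page.insert(b"hello")
b = page.insert(b"db")
page.delete(a)
```

Other structures refer to a record by its slot number rather than its byte offset, so records could be moved within the block without changing outside pointers.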
  • 152. Fixed-Length Representation 152  Another way to implement variable-length records efficiently in a file system is to use one or more fixed-length records to represent one variable-length record. There are two ways of doing this:  1. Reserved space: If there is a maximum record length that is never exceeded, we can use fixed-length records of that length. Unused space (for records shorter than the maximum length) is filled with a special null, or end-of-record, symbol.  2. List representation: We can represent variable-length records by lists of fixed-length records, chained together by pointers.
  • 153. File organization 153  File organization includes the way records and blocks are placed on the storage medium.  There are two types of file organization  Primary File Organizations  Secondary File Organizations
  • 154. Primary File Organizations 154  Unordered or Heap or Pile Files  Ordered or Sorted or sequential Files  Hash or Direct Files
  • 155. Unordered or Heap or Pile Files155  Records are placed in the file in the order in which they are inserted.  Inserting a new record is very efficient.  Searching can be done by linear search (inefficient).  Deletion is very inefficient.
  • 156. Ordered or Sorted or sequential Files156  They store records in sequential order, based on the value of the search key of each record.  An attribute or set of attributes used to look up records in a file is called a search key.
  • 157. Advantages of Ordered Files 157  Reading of the records in order of the ordering field is extremely efficient, because no sorting is required.  Finding the next record is fast.
  • 158. Disadvantages of Ordered Files 158  Searches on non-ordering fields are inefficient.  Insertion and deletion of records are very expensive.
  • 159. Hash or Direct Files 159  A hash function is computed on some attribute of each record; the result specifies the block in which the record should be placed.
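A hash file can be sketched with a list of buckets, each standing in for one disk block. The bucket count, the use of CRC-32 as the hash function, and the sample account record are all assumptions of this example.

```python
import zlib

NUM_BUCKETS = 8   # hypothetical number of blocks in the file
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(key: str) -> int:
    """Hash the key attribute to a bucket (block) number."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS

def insert_record(record, key_field):
    buckets[bucket_for(record[key_field])].append(record)

def lookup(key, key_field):
    # Only one bucket is examined, however large the file is.
    return [r for r in buckets[bucket_for(key)] if r[key_field] == key]

insert_record({"account-number": "A-102", "balance": 400}, "account-number")
print(lookup("A-102", "account-number"))
```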
  • 160. Secondary File Organizations 160  Secondary file organization uses the index to access the records.  An index for a file in a database system works in the same way as the index in any textbook.  If we want to learn about a particular topic (specified by a word or a phrase) , we can search for the topic in the index at the back of the book.  Indexes provide faster access to data.
  • 161. Types of Indexes 161 • Single-level ordered indexes • Primary indexes • Secondary indexes • Clustering indexes • Multi-level Indexes • Dynamic Multi-level indexes using B-trees and B+-trees
  • 162. Primary indexes162  A primary index is an ordered file whose entries have two fields: the first field is of the same data type as the primary key of the data file, and the second field is a pointer to a file block.
  • 163. Indexes can also be characterized as163  Dense: A dense index has an index entry for every search key value (and hence every record) in the data file.  Sparse (nondense): A sparse (or nondense) index, on the other hand, has index entries for only some of the search values.  A primary index is hence a nondense (sparse) index, since it includes an entry for each disk block of the data file rather than for every search value (or every record).
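A lookup through a sparse primary index can be sketched with a binary search over one entry per block. The block contents below are hypothetical, and each index entry is assumed to hold the key of the first record in its block.

```python
import bisect

# Data file: blocks of records sorted by key (hypothetical values).
blocks = [[5, 8, 12], [15, 21, 30], [34, 40, 47]]

# Sparse index: one entry per block, holding the first key of that block.
index_keys = [block[0] for block in blocks]

def find_block(key):
    """Return the number of the block containing key, or None if absent."""
    pos = bisect.bisect_right(index_keys, key) - 1   # last entry with first key <= key
    if pos < 0:
        return None                                  # key is smaller than every record
    return pos if key in blocks[pos] else None

print(find_block(21))   # 1
print(find_block(13))   # None
```

The index has three entries for nine records, which is why a sparse index stays much smaller than the data file it covers.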
  • 164. Problem with a primary index 164  A major problem with a primary index—as with any ordered file—is insertion and deletion of records.
  • 165. Clustering Indexes 165  If records of a file are physically ordered on a nonkey field—which does not have a distinct value for each record—that field is called the clustering field.  A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer.
  • 167. Secondary Indexes 167  A Secondary Index is an ordered file with two fields.  The first is of the same data type as some nonordering field and the second is either a block or a record pointer.  If the values of this nonordering field must be unique, the field is sometimes referred to as a secondary key. This results in a dense index.
  • 170. Multilevel Indexes 170  A multilevel index is built by constructing a second-level index on the first-level index, and continuing this process until the topmost index level fits in a single file block.
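The construction can be sketched by repeatedly indexing the level below. Assumptions of this example: fan_out is the number of index entries that fit in one block, and each higher-level entry is the first key of one block of the level beneath it.

```python
def build_levels(first_level_keys, fan_out):
    """Build index levels until the top level fits in a single block."""
    levels = [list(first_level_keys)]
    while len(levels[-1]) > fan_out:
        below = levels[-1]
        # One entry per block of the level below: the first key in that block.
        levels.append([below[i] for i in range(0, len(below), fan_out)])
    return levels

levels = build_levels(range(1000), fan_out=10)
print([len(level) for level in levels])   # [1000, 100, 10]
```

With 1000 first-level entries and 10 entries per block, three levels are enough, so a search reads one block per level instead of scanning the first-level index.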
  • 172. Dynamic Multilevel Indexes Using B-Trees and B+-Trees172  B-trees and B+-trees are special cases of the well-known tree data structure.  A tree is formed of nodes.  Each node in the tree, except for a special node called the root, has one parent node and several—zero or more— child nodes.  The root node has no parent. A node that does not have any child nodes is called a leaf node; a nonleaf node is called an internal node.  The level of a node is always one more than the level of its parent, with the level of the root node being zero.  A subtree of a node consists of that node and all its descendant nodes—its child nodes, the child nodes of its child nodes, and so on.
  • 173. B tree 173  A B-tree of order m (the maximum number of children for each node) is a tree which satisfies the following properties:  Every node has at most m children.  Every node (except root and leaves) has at least ⌈m/2⌉ children.  The root has at least two children if it is not a leaf node.  All leaves appear in the same level, and carry information.  A non-leaf node with k children contains k–1 keys.
  • 174. Structure of B tree 174
  • 175. 175 B tree with order 3
  • 176. Insertion algorithm 176  All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added.  Insert the new element into that node with the following steps: 1. If the node contains fewer than the maximum legal number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered. 2. Otherwise the node is full, so evenly split it into two nodes.  A single median is chosen from among the leaf's elements and the new element.  Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value.  Insert the separation value in the node's parent, which may cause it to be split, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node (increasing the height of the tree).
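The insertion steps can be sketched in a compact in-memory B-tree. Assumptions of this example: keys are distinct, order is the maximum number of children (so a node holds at most order - 1 keys), and the new key is placed in the node first and the node is split only if it then overflows, which yields the same median as choosing among the old elements plus the new one.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for a leaf

    @property
    def is_leaf(self):
        return not self.children

class BTree:
    def __init__(self, order):
        self.order = order               # maximum number of children per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # the root was split: tree grows one level
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if node.is_leaf:
            node.keys.insert(i, key)     # keep the node's keys ordered
        else:
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            median, right = split        # child was split: absorb separator
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
        if len(node.keys) <= self.order - 1:
            return None                  # there was room
        return self._split(node)

    def _split(self, node):
        mid = len(node.keys) // 2
        median = node.keys[mid]          # separation value for the parent
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right

def inorder(node):
    """Keys in sorted order, used here to check the tree."""
    if node.is_leaf:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(inorder(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

tree = BTree(order=3)
for k in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(k)
print(tree.root.keys)      # [10]
print(inorder(tree.root))  # [5, 6, 7, 10, 12, 17, 20, 30]
```

Splits propagate upward through the return value of _insert, so the height only increases when the root itself is split.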
  • 177. A B Tree insertion example with each iteration177
  • 178. B+ tree 178  Properties of a B+ Tree of order m :  Every internal node (except the root) has at least m keys and at most 2m keys.  The root has at least 2 children unless it’s a leaf.  All leaves are on the same level.  An internal node with k keys has k+1 children
  • 179. Inserting a Data Entry into a B+ Tree: Summary179  Find correct leaf L.  Put data entry onto L.  If L has enough space, done!  Else, must split L (into L and a new node L2)  Redistribute entries evenly, put middle key in L2  copy up middle key.  Insert index entry pointing to L2 into parent of L.  This can happen recursively  To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)  Splits “grow” tree; root split increases height.  Tree growth: gets wider or one level taller at top.
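The contrast between a leaf split (copy up) and an index-node split (push up) can be sketched with two small functions. The function names and the sample key lists are introduced here for illustration; entries are represented by their keys only.

```python
def split_leaf(entries):
    """Leaf split: the middle key is copied up and stays in the new right leaf."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right[0], right

def split_index(keys):
    """Index split: the middle key is pushed up and appears in neither half."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

print(split_leaf([2, 3, 5, 7, 8]))        # ([2, 3], 5, [5, 7, 8])
print(split_index([5, 13, 17, 24, 30]))   # ([5, 13], 17, [24, 30])
```

The copied-up key must remain in the leaf because leaves hold the actual data entries, while a pushed-up key is only a separator and is needed once.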
  • 180. Inserting 16*, 8* into Example B+ tree 180 [Figure: root with keys 13, 17, 24, 30; 16* is added to the leaf holding 14* 15*; adding 8* to the leaf 2* 3* 5* 7* causes an overflow.] One new child (leaf node) is generated; one more pointer must be added to its parent, and thus one more key value as well.
  • 181. Inserting 8* (cont.)  Copy up the middle value (leaf split) 181 [Figure: the leaf 2* 3* 5* 7* 8* is split into 2* 3* and 5* 7* 8*; 5 is the entry to be inserted in the parent node.] (Note that 5 is copied up and continues to appear in the leaf.) The parent node then holds 5 13 17 24 30 and overflows.
  • 182. Insertion into B+ tree (cont.) 182 [Figure: the index node 5 13 17 24 30 is split into 5 13 and 24 30; 17 is the entry to be inserted in the parent node.] We split this node, redistribute entries evenly, and push up the middle key. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.) • Understand the difference between copy-up and push-up • Observe how minimum occupancy is guaranteed in both leaf and index page splits.
  • 183. Example B+ Tree After Inserting 8* 183 Notice that root was split, leading to increase in height. [Figure: root 17; index nodes 5 13 and 24 30; leaves 2* 3* | 5* 7* 8* | 14* 15* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]