Good PPT for RDBMS starter

This presentation shows how to work with SQL and related queries.

  • Table of Contents
    1. Introduction to Database Management Systems (DBMS)
    1.1 Database Management System: Definitions
    1.2 DBMS
    1.3 Benefits of database approach
    1.4 DBMS functions
    1.5 Database System
    1.6 Data Model
    1.7 Database Architecture
    1.8 An Example of the Three Levels
    1.9 Schema
    1.10 Data Independence
    1.11 Types Of Database Models
    1.12 Database Design Phases
    2. Introduction to RDBMS
    2.1 Definition: RDBMS
    2.2 Features Of an RDBMS
    2.3 Some Important Terms
    2.4 Properties of Relations
    2.5 Keys
    2.6 Referential Integrity
    2.7 Summary
    3. Relational Algebra
    3.1 Relational Query Languages
    3.2 Example Instances
    3.3 Relational Algebra
    3.4 Projection
    3.5 Selection
    3.6 Union, Intersection, Set Difference
    3.7 Cross Product
    3.8 Joins
    3.9 Equi-Joins
    3.10 Division
    3.11 Summary
    4. Introduction to Query Optimization
    4.1 Processing a high-level query
    4.2 Techniques for Query Optimization
    4.3 Motivating Examples
    4.4 Summary
  • 5. Conceptual Design Using the Entity-Relationship Model
    5.1 Overview Of Database Design
    5.2 E-R Modeling
    5.3 Graphical Representation
    5.4 Types Of Relationships
    5.5 E-R Diagram: Some Examples
    5.6 Summary and Case Studies
    6. Schema Refinement and Normalization
    6.1 Normalization and Normal Forms
    6.2 Why Normal Forms
    6.3 The Evils Of Redundancy
    6.4 Refining an ER Diagram
    6.5 First Normal Form
    6.6 Functional Dependencies
    6.7 Example: Constraints On Entity Set
    6.8 Second Normal Form
    6.9 Transitive Dependency
    6.10 Third Normal Form
    6.11 Boyce Codd Normal Form (BCNF)
    6.12 Decomposition of a Relation Scheme
    6.13 Lossless Join Decompositions
    6.14 Summary and Examples
    7. Transaction, Concurrency Control and Recovery
    7.1 Transactions
    7.2 The ACID Properties
    7.3 Why Have Concurrent Processes?
    7.4 Schedules
    7.5 Serializable Schedules
    7.6 Serializability Violations
    7.7 Cascading Aborts
    7.8 Recoverable Schedules
    7.9 Locking: A Technique For Concurrency Control
    7.10 Two-Phase Locking
    7.11 Handling A Lock Request
    7.12 Recovery
    7.13 Logging
    7.14 Handling the Buffer Pool
    7.15 Write-Ahead Logging
    7.16 Checkpoints in the System Log
    7.17 Summary
    Bibliographic Reference
  • Topics Covered :
    Database Management System: Definitions
    DBMS
    Benefits of database approach
    DBMS functions
    Database System
    Data Model
    Database Architecture
    An Example of the Three Levels
    Schema
    Data Independence
    Types Of Database Models
    Database Design Phases
  • Modern day Computer-based Information Systems (IS) are capable of serving a variety of complex tasks in a coordinated manner. Such systems handle large volumes of data, multiple users and several applications for activities occurring in a central and/ or distributed environment.
    The heart of an IS is database management, because most IS have to handle massive amounts of data. This core module of an IS is called a Database Management System (DBMS). A DBMS provides for the storage, retrieval and updating of data in an organized manner.
    An Example: Consider the situation in a library. Here, we have data corresponding to books, authors, suppliers, borrowers, etc. The total volume of data stored and handled in a library may be quite large. The Library DBMS must support several operations, such as the issue, return or purchase of books, and must handle queries relating to book information, borrowing information, etc. Moreover, there are different types of users who operate at various stages or activities. For example, a borrower may merely view certain information, whereas an issuer may be allowed to update the status of a book during issue or return. The library staff, on the other hand, may add new books, together with their supplier, price and other information, to the database. Each user category has different access rights on both the data and the processing capabilities. Multiple users may operate the Library DBMS concurrently, performing several tasks at the same time. They may even try to access the same data simultaneously. It is the job of a DBMS to handle the data and its processing in an integrated, coordinated and consistent manner. Finally, the Library DBMS must have mechanisms to handle system failure (e.g., power failure, disk crash, etc.) so that the database can be recovered to a consistent state.
  • A database management system (DBMS) is a collection of programs that facilitates the process of defining, constructing and manipulating databases.
    Defining a database involves specifying the types of data to be stored in the database.
    Constructing the database is the process of storing the data.
    Manipulating a database includes querying the database, updating the database and generating reports from the data.
    A DBMS supports the following operations:
    Adding new, empty files to the database
    Inserting new data into existing files
    Retrieving data from existing files
    Updating data in existing files
    Deleting data from existing files
    Removing existing files, empty or otherwise, from the database
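    The six operations listed above correspond closely to SQL statements. The following is only a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the table name book and its columns are hypothetical, chosen to echo the library example rather than taken from the slides.

```python
import sqlite3


def run_basic_operations():
    """Walk through the six basic operations on a hypothetical 'book' table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # 1. Add a new, empty file (table) to the database.
    cur.execute(
        "CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, status TEXT)"
    )

    # 2. Insert new data into the existing table.
    cur.executemany(
        "INSERT INTO book (book_id, title, status) VALUES (?, ?, ?)",
        [(1, "Database Systems", "available"), (2, "Operating Systems", "available")],
    )

    # 3. Retrieve data from the table.
    titles = [row[0] for row in cur.execute("SELECT title FROM book ORDER BY book_id")]

    # 4. Update data in the table (book 1 is issued to a borrower).
    cur.execute("UPDATE book SET status = 'issued' WHERE book_id = ?", (1,))
    status = cur.execute("SELECT status FROM book WHERE book_id = 1").fetchone()[0]

    # 5. Delete data from the table.
    cur.execute("DELETE FROM book WHERE book_id = ?", (2,))
    remaining = cur.execute("SELECT COUNT(*) FROM book").fetchone()[0]

    # 6. Remove the table itself from the database.
    cur.execute("DROP TABLE book")
    tables = cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'book'"
    ).fetchall()

    conn.close()
    return titles, status, remaining, tables
```

    Calling run_basic_operations() returns the retrieved titles, the updated status, the row count after the delete, and an empty list confirming that the table is gone.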
  • DBMS Functions :
    Data Definition
    Data Manipulation
    Data Security and Integrity
    Data Recovery and Concurrency
    Data Dictionary
    Performance
  • A database management system is a complex piece of software that usually consists of a number of modules. The DBMS may be considered as an agent that allows communication between the various types of users on one side and the physical database and the operating system on the other, without the users being aware of every detail of how it is done. To fulfil its tasks, the DBMS must maintain information about the data that is stored in the system. This information would normally include what data is stored, how it is stored, who has access to what parts of it and so on.
    The information (data) about the data in a database is called the metadata. In addition to information listed above, some information regarding the use of a database is often collected to monitor the system's performance. This metadata helps management in maintaining an effective and efficient database system.
    Three broad classes of users
    Application programmers: Responsible for writing application programs that use the database
    End users: Interact with the system from workstations or terminals. A given end user can access the database via one of the applications, or can use an interface provided as an integral part of the database system software (such interfaces are also supported by means of applications, of course, but those applications are built-in, not user-written, e.g., query language processor)
    Database Administrator (DBA): Creates the actual database and implements technical controls needed to enforce various policy decisions. The DBA is also responsible for ensuring that the system operates with adequate performance and for providing a variety of other related technical services
  • One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users. A data model is the main tool for providing this abstraction. A data model is a set of concepts that can be used to describe the structure of a database. It is a collection of high-level data description constructs that hide many low-level storage details.
    Categories of Data Models :
    Many data models have been proposed. We can categorize data models based on the types of concepts they provide to describe the data structure.
    High-level or conceptual data models: provide concepts that are close to the way many users perceive data. They use concepts such as entities, attributes, and relationships, where an entity represents a real-world object (e.g., student, employee) or concept (e.g., course, company), an attribute represents a property that describes an object (e.g., color, name), and a relationship represents an interaction or link among entities (e.g., works-on, is-a, has, etc.).
    Low-level or physical data models: provide concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for computer specialists, not for typical end users. They represent information such as record formats, record orderings, and access paths (structures that make the search for particular database records efficient, e.g., indexes).
    Representational or implementation: Between above two extremes is a class of representational (or implementation) data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Representational data models hide some details of data storage but can be implemented on a computer system in a direct way.
  • Three important characteristics of the database approach are
    (a) Insulation of programs and data (program-data and program-operation independence).
    (b) Support of multiple user views.
    (c) Use of a catalog to store database description.
    The three-schema architecture was proposed to achieve these characteristics.
    The Three levels of architecture :
    The goal of the three schema architecture is to separate the user applications and the physical database.
    The internal level is the one closest to the physical storage, i.e., it is the one concerned with the way data is physically stored
    The external level is the one closest to the user, i.e., it is the one concerned with the way data is viewed by individual users
    The conceptual level is a level of indirection between the other two
    There will be many distinct external views, each consisting of a more or less abstract representation of some portion of the total database, and there will be one conceptual view, consisting of a similarly abstract representation of the database in its entirety. Likewise there will be precisely one internal view, representing the total Database as physically stored.
  • Mappings
    The conceptual/internal mapping : defines the correspondence between the conceptual view and the stored database; it specifies how conceptual records and fields are represented at the internal level
    The external/conceptual mapping : defines the correspondence between a particular external view and the conceptual view
  • A description of data in terms of a data model is called a schema. The description of a database is called database schema, which is specified during database design and is not expected to change frequently.
    The Internal View/ Schema :
    The internal view (or stored database) is a low-level representation of the entire database. The internal view is defined by the internal schema, which defines the various stored record types and specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in, etc.
    The Conceptual View / Schema :
    The conceptual view is a representation of the entire content of the database, in a form that is somewhat abstract in comparison with the way in which the data is physically stored.
    The conceptual view is defined by means of the conceptual schema, which includes definitions of each of the various conceptual record types.
    The External View / Schema :
    Each external view is defined by means of an external schema.
    External schema consists of definitions of each of the various external record types in that external view.
    There must be a definition of the mapping between the external schema and the underlying conceptual schema.
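    In a relational system, one common way to realise an external schema and its mapping to the conceptual schema is an SQL view. The sketch below assumes Python's built-in sqlite3 module; the names employee and phone_directory and the sample row are hypothetical and do not come from the slides.

```python
import sqlite3


def build_external_view():
    """Define a base table (conceptual level) and a view over it (external level)."""
    conn = sqlite3.connect(":memory:")

    # Conceptual level: the full record type for employees.
    conn.execute(
        "CREATE TABLE employee "
        "(emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL, phone TEXT)"
    )
    conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 52000.0, '555-0101')")

    # External level: one user group sees only names and phone numbers.
    # The SELECT inside the view plays the role of the external/conceptual mapping.
    conn.execute("CREATE VIEW phone_directory AS SELECT name, phone FROM employee")

    cursor = conn.execute("SELECT * FROM phone_directory")
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return columns, rows
```

    Users of phone_directory never see salary, and the base table could gain further columns without changing what this view returns.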
  • The three level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is data independence.
    Physical data independence: The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence.
    Logical data independence: Separating the external views from the conceptual view enables us to change the conceptual view without affecting the external views. This separation is sometimes called logical data independence.
    Functions of the DBA (Database Administrator):
    Defining the conceptual schema -- conceptual database design
    Defining the internal schema -- physical database design, and defining the associated mapping between the internal and conceptual schemas
    Liaison with users
    Defining security and integrity rules
    Defining backup and recovery procedures
    Monitoring performance and responding to changing requirements
  • The most well-known record-based models are the relational model, the network model and the hierarchical model.
    Relational model: In this model, each database item is viewed as a record with attributes. A set of records with similar attributes is called a table. Most of the popular commercial DBMS products like Oracle, Sybase, MySQL, etc. are based on relational model.
    Network model: represents data as record types. However, unlike the relational model, here we have explicit linkages (expressed in the form of pointers) which relate various records. Each record has a link field corresponding to every relationship in which it participates. IDS (Integrated Data Store) is one of the DBMS products based on the network model.
    Hierarchical Model: represents data as a hierarchical tree. This is a special kind of network model in which the relationship is essentially a tree-like structure, where one parent may have many children but one child cannot have more than one parent. The relationship of borrower to books in a library system satisfies this condition. One of the popular DBMS products based on the hierarchical model is Information Management System (IMS) from IBM.
    Object Oriented model: represents DB in terms of objects, their attributes, and their behaviors.
  • The four phases of designing any database system are:
    1. Formulation of Information Requirements and Analysis Phase: This phase is also called the feasibility phase. In this phase, through interviews and a review of all related documents and policies in the organization, the following items are identified:
    a. Clear and concise definition of the problem
    b. Local dependency lists
    c. Local dependency diagrams
    d. Local schema
    2. Logical Schema Design Phase:
    In this phase the following items are performed:
    a. Consolidation of dependency lists.
    b. Consolidation of logical schema.
    The output of this phase is a logical schema that is independent of all computer hardware and software systems.
    3. Implementation Design Phase:
    In this phase the logical schema designed in the Logical Schema Design Phase is modified to fit the specific data model, hardware and software system that the designer wants to use. This new schema is called the implementation schema.
    4. Physical Design Phase:
    In this phase the implementation schema designed in the Implementation Design Phase is programmed using the DDL (Data Definition Language) or any other language available to the programmer.
  • Topics Covered :
    Definition: RDBMS
    Features of an RDBMS
    Some Important Terms
    Properties of Relations
    Keys
    Referential Integrity
    Summary
  • Domain :
    An attribute of an entity set has a particular value. The set of possible values that a given attribute can have is called its domain.
    For example, the domain of the attribute EMPLOYEE.id is the set of positive integers of 5 digits.
    Primary Key :
    A unique identifier for the table (a column or a column combination with the property that at any given time no two rows of the table contain the same value in that column or column combination)
  • Key: An attribute or set of attributes whose values uniquely identify each entity in an entity set is called a key for that entity set.
    Super Key: If we add additional attributes to a key, the resulting combination would still uniquely identify an instance of the entity set. Such augmented keys are called super keys.
    Primary key: A minimal super key, i.e., one from which no attribute can be removed without losing uniqueness, that is chosen to identify the entities.
    Candidate Keys: There may be two or more attributes or combinations of attributes that uniquely identify an instance of an entity set. These attributes or combinations of attributes are called candidate keys.
    In such a case we must decide which of the candidate keys will be used as the primary key. The remaining candidate keys would be considered alternate keys.
    Secondary Key: A secondary key is an attribute or combination of attributes that may not be a candidate key but that classifies the entity set on a particular characteristic.
    A case in point is the entity set EMPLOYEE having the attribute department, which identifies by its value all instances of EMPLOYEE that belong to a given department.
    Any key consisting of a single attribute is called a simple key while that consisting of a combination of attributes is called a composite key.
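    The difference between a super key and a candidate (minimal) key can be checked mechanically on sample data. The sketch below is illustrative only: the helper names and the EMPLOYEE rows are hypothetical, and a test on one instance can only refute a key, since a real key must hold for every legal instance of the entity set.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """True if attrs is a super key here and no proper, non-empty subset is one."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True


# Hypothetical EMPLOYEE instance, not taken from the slides.
employees = [
    {"id": 10001, "name": "Asha", "department": "Sales"},
    {"id": 10002, "name": "Ravi", "department": "Sales"},
    {"id": 10003, "name": "Asha", "department": "Accounts"},
]
```

    Here ["id"] is a simple candidate key, ["id", "name"] is a super key but not minimal, and ["department"] is at most a secondary key because its values repeat.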
  • A set of fields is a key for a relation if :
    1. No two distinct tuples can have same values in all key fields, and
    2. This is not true for any subset of the key.
    If there is more than one key for a relation, one of the keys is chosen (by the DBA) to be the primary key. E.g., sid is a key for Students. (What about name?) The set {sid, gpa} is a superkey.
    There may be many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
    Foreign key: A set of fields in one relation that is used to ‘refer’ to a tuple in another relation. (It must correspond to the primary key of the second relation.) It acts like a ‘logical pointer’.
    E.g., sid is a foreign key referring to Students:
    – Enrolled (sid: string, cid: string, grade: string)
    – If all foreign key constraints are enforced, referential integrity is achieved, i.e., there are no dangling references.
    Enforcing Referential Integrity
    Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students. What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)
    What should be done if a Students tuple is deleted?
    – Also delete all Enrolled tuples that refer to it.
    – Disallow deletion of a Students tuple that is referred to.
    – Set sid in Enrolled tuples that refer to it to a default sid.
    – (In SQL, also: set sid in Enrolled tuples that refer to it to a special value null, denoting ‘unknown’ or ‘inapplicable’.)
    The options are similar if the primary key of a Students tuple is updated.
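    The enforcement options above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module, where foreign keys are only enforced after PRAGMA foreign_keys = ON. The sample rows and helper names are hypothetical, and Enrolled is declared here without a primary key of its own so that the SET NULL option remains legal.

```python
import sqlite3


def make_db(on_delete):
    """Create Students and Enrolled, using the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT, "
        "FOREIGN KEY (sid) REFERENCES Students (sid) ON DELETE " + on_delete + ")"
    )
    conn.execute("INSERT INTO Students VALUES ('s1', 'Ravi')")
    conn.execute("INSERT INTO Enrolled VALUES ('s1', 'DBMS', 'A')")
    return conn


def insert_dangling(conn):
    """Try to enrol a student id that does not exist in Students."""
    try:
        conn.execute("INSERT INTO Enrolled VALUES ('s9', 'DBMS', 'B')")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


def delete_student(conn):
    """Delete student s1; return the Enrolled rows left, or 'rejected'."""
    try:
        conn.execute("DELETE FROM Students WHERE sid = 's1'")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT sid, cid FROM Enrolled").fetchall()
```

    With make_db("CASCADE") the enrolment disappears together with the student, with make_db("NO ACTION") the deletion is refused, and with make_db("SET NULL") the enrolment survives with a null sid.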
  • Summary :
    A tabular representation of data.
    Simple and intuitive, currently the most widely used.
    Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations.
    – Two important Integrity Constraints: primary and foreign keys
    – In addition, we always have domain constraints.
    Powerful and natural query languages exist.
  • Topics Covered :
    Database Design
    E-R Modeling
    Example E-R Diagrams
    Summary
    Case Studies
  • The database design can be divided into following steps:
    Requirement Analysis: First of all, we should be clear about what the users want from the database, what data is to be stored, and what operations are to be performed.
    Conceptual Design: The information gathered in the requirements analysis step is used to develop a high level description of the data to be stored in the database. In this step we have to address the following:
    -What are the entities and relationships in the enterprise?
    -What information about these entities and relationships should we store in the database?
    -What are the integrity constraints or business rules that hold?
    This step is often carried out using the ER model, or a similar high-level model. A database ‘schema’ in the ER model can be represented pictorially (ER diagrams).
    Logical Database Design: We must choose a DBMS to implement our database design, and convert the conceptual database design into a database schema in the data model of the chosen DBMS. For example, we can map an ER diagram into a relational database schema.
    Schema Refinement (Normalization): Check relational schema for redundancies and related anomalies.
    Physical Database Design and Tuning : Consider typical workloads and further refine the database design.
  • The basic design process is divided into the following phases:
    1. Requirement Collection & Analysis:
    - The database designers interview prospective database users to understand and document their data requirements. The result of this step is a concisely written set of user requirements. The functional requirements are specified alongside them: these are the user-defined operations that will be applied to the database, and they include both retrievals and updates.
    2. Conceptual Design:
    It is a concise description of the data requirements of the users. It includes detailed descriptions of the entity types, relationships and constraints, expressed using the concepts provided by the high-level data model.
    3. Logical Design:
    The data model mapping is identified here: the conceptual design is mapped to the data model of the chosen system (RDBMS / DBMS / object model).
    4. Physical Design:
    The internal storage structures, access paths and file organizations for the database files are specified. In parallel with these activities, application programs are designed and implemented as database transactions corresponding to the high-level specifications.
  • Entity :
    An Entity is a thing that exists and is distinguishable.
    For example, each chair is an entity. So is each person and each automobile.
    Entities can have concrete existence or constitute ideas or concepts.
    Concepts like love and hate are entities.
    Entity Set :
    A group of similar entities forms an entity set.
    Examples of entity sets are:
    1. All Persons
    2. All Automobiles
    3. All Emotions
    Attributes :
    Attributes are the properties that characterize an entity set.
    For Example, employees of an organization are modeled by the entity set EMPLOYEE. We must include in the model the properties of the employees that may be useful to the organization. Some of these properties are name, address, skill etc.
    Relationship: It is an association between two or more entities.
    For example, we may have the relationship that an employee works in a department.
  • There are two types of entities: regular and weak.
    A regular (independent) entity does not depend on any other entity for its existence. For example, Employee is a regular entity. A regular entity is depicted using a rectangle.
    An entity whose existence depends on the existence of another entity is called a weak (or dependent) entity. For example, the dependent of an employee is a weak entity, whose existence depends on the entity Employee. A dependent entity is depicted in a double-lined box, or a darkened rectangle.
    Similarly, relationships can also be regular or weak.
  • Entity: A real-world object distinguishable from other objects. It could be an object, place, person, concept or activity about which an enterprise records data. To qualify as an entity, it should
    – Have an independent existence.
    – Be of interest to us.
    An entity is described (in the DB) using a set of attributes.
    Entity Set: A collection of similar entities. E.g., all employees.
    – All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)
    – Each entity set has a key.
    – Each attribute has a domain.
    – An entity set can be mapped to a relation easily.
  • A relationship is defined as an association among entities. For example, there is a relationship between students and course, which can be named as ‘enrols in’.
    A relationship set is an association of entity sets (e.g., student-course) while a relationship instance is an association of entity instances (e.g., Ravi-DBMS).
    An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
    The same entity set could participate in different relationship sets, or in different “roles” in the same set.
    A relationship is depicted by a diamond, with the name of the relationship type.
    There are three types of relationships:
    - One-to-one: One student is issued only one card (and vice-versa).
    - One-to-many (or many-to-one): One Student can enrol for only one course, but one course can be offered to many students.
    - Many-to-many: One Student can take many tests, and one test can be taken by many Students.
  • In the figure above, we show the relationship set Works_in, in which each relationship indicates a department in which an employee works.
    The entities are described by a set of attributes and identified by primary keys (PK).
    Employee:
    Attributes: ssn, name, lot
    PK: ssn
    Department:
    Attributes: did, dname, budget
    PK: did
    The entity sets that participate in a relationship set need not be distinct; sometimes a relationship might involve two entities in the same entity set. For example, in Reports_To relationship set, every relationship is of the form (emp1, emp2).
    An instance of a relationship set is a set of relationships. Intuitively, an instance can be thought of as a ‘snapshot’ of the relationship set at some instant in time.
  • Relationship sets can also have descriptive attributes (e.g., the since attribute of Works_In).
    A relationship must be uniquely identified by the participating entities, without reference to the descriptive attributes. In the Works_in relationship set, for example, each Works_in relationship must be uniquely identified by the combination of employee ssn and department did. Thus, for a given employee-department pair, we cannot have more than one associated since value.
    Thus, in translating a relationship set to a relation, attributes of the relation must include:
    Keys for each participating entity set (as foreign keys). This set of attributes forms a superkey for the relation.
    All descriptive attributes.
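    The translation rule above can be sketched for Works_In. This is a minimal illustration, assuming Python's built-in sqlite3 module; the table layouts follow the Employee and Department attribute lists given earlier, while the sample rows and the helper names are hypothetical.

```python
import sqlite3


def build_works_in():
    """Create Employee, Department and the Works_In relationship table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Employee (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
        CREATE TABLE Department (did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL);
        CREATE TABLE Works_In (
            ssn CHAR(11),            -- key of Employee, as a foreign key
            did INTEGER,             -- key of Department, as a foreign key
            since DATE,              -- descriptive attribute of the relationship
            PRIMARY KEY (ssn, did),  -- one row per employee-department pair
            FOREIGN KEY (ssn) REFERENCES Employee (ssn),
            FOREIGN KEY (did) REFERENCES Department (did)
        );
        INSERT INTO Employee VALUES ('111-11-1111', 'Asha', 12);
        INSERT INTO Department VALUES (51, 'Sales', 100000.0), (56, 'Support', 80000.0);
        INSERT INTO Works_In VALUES ('111-11-1111', 51, '2001-01-01');
        INSERT INTO Works_In VALUES ('111-11-1111', 56, '2003-03-03');
    """)
    return conn


def add_second_since(conn):
    """Try to record a second since value for an existing employee-department pair."""
    try:
        conn.execute("INSERT INTO Works_In VALUES ('111-11-1111', 51, '2010-10-10')")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
```

    The same employee may appear with several departments, but the composite primary key refuses a second since value for one employee-department pair.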
  • A key constraint between an entity set S and a relationship set restricts instances of the relationship set by requiring that each entity of S participate in at most one relationship.
    Consider Manages: Each dept has at most one manager, according to the key constraint on ‘Manages’ relationship (In contrast, Works_In relationship of earlier slide shows that an employee can work in many departments and a dept can have many employees). The arrow from Department to Manages indicates that each Department entity appears in at most one Manages relationship in any allowable instance of Manages. Thus given a Department entity, we can uniquely determine the Manages relationship in which it appears.
    Translating ER Diagrams with Key Constraints:
    Map relationship to a table: Note that did is the key now!
    – Separate tables for Employees and Departments.
    Since each department has a unique manager, we could instead combine Manages and Departments.
    Manages table with the key constraint (did alone is the primary key):
    CREATE TABLE Manages(
    ssn CHAR(11),
    did INTEGER,
    since DATE,
    PRIMARY KEY (did), -- each department appears in at most one Manages row
    FOREIGN KEY (ssn)
    REFERENCES Employees,
    FOREIGN KEY (did)
    REFERENCES Departments)
  • Ternary Relationship: A relationship set involving three entity sets is known as a ternary Relationship.
    E.g., the Works_in relationship involving the Employee, Department and Location entity sets.
    In the slide above, we show a ternary relationship with a key constraint. The key constraint indicates that each employee works in at most one department, and at a single location. Notice that each department can be associated with several employees and locations, and each location can be associated with several departments and employees; however, each employee is associated with a single department and location.
  • The key constraint on Manages tells us that a Department has at most one Manager (indicated by the arrow). Let us now ask: does every department have a manager? If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial). Total participation is indicated by a dark line between entity and relationship. A participation that is not total is said to be partial. E.g., the participation of Employee in Manages is partial.
    The participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type.
    A participation constraint between an entity set S and a relationship set restricts instances of the relationship set by requiring that each entity of S participate in at least one relationship.
    Every did value in the Department table must appear in a row of the Manages table (with a non-null ssn value!).
    Similarly, every ssn value in Employee table must appear in a row of the Works_in table.
    Participation Constraints in SQL: We can capture participation constraints involving one entity set in a binary relationship, but little else (without resorting to CHECK constraints).
    CREATE TABLE Dept_Mgr(
    did INTEGER,
    dname CHAR(20),
    budget REAL,
    ssn CHAR(11) NOT NULL, -- every department must have a manager
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (ssn) REFERENCES Employees
    ON DELETE NO ACTION)
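    How the combined Dept_Mgr table captures both constraints can be checked with a small sketch. It assumes Python's built-in sqlite3 module with foreign keys switched on; the Employees layout, the sample rows and the helper try_sql are hypothetical.

```python
import sqlite3


def build_dept_mgr():
    """Create Employees and the combined Dept_Mgr table with one department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
        CREATE TABLE Dept_Mgr (
            did INTEGER,
            dname CHAR(20),
            budget REAL,
            ssn CHAR(11) NOT NULL,  -- total participation: a manager is required
            since DATE,
            PRIMARY KEY (did),      -- key constraint: one row per department
            FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE NO ACTION
        );
        INSERT INTO Employees VALUES ('111-11-1111', 'Asha', 12);
        INSERT INTO Dept_Mgr VALUES (51, 'Sales', 100000.0, '111-11-1111', '2001-01-01');
    """)
    return conn


def try_sql(conn, statement):
    """Run one statement and report whether the constraints accepted it."""
    try:
        conn.execute(statement)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
```

    A department without a manager, a second row for department 51, and the deletion of an employee who still manages a department are all refused by this schema.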
  • A weak entity’s existence is dependent on another (owner) entity. Hence a weak entity will not have its own key. It can be identified uniquely only by considering the primary key of its owner entity.
    – Owner entity set and weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities).
    – Weak entity set must have total participation in this identifying relationship set.
    Translating Weak Entity Sets:
    Weak entity set and identifying relationship set are translated into a single table.
    – When the owner entity is deleted, all owned weak entities must also be deleted.
    Eg. If the employee quits, any policy owned by the employee is terminated. All the relevant policy and dependent information is also deleted from the database.
    To indicate that Dependent is a weak entity and policy is its identifying relationship, we draw both with dark lines.
    CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn), -- dependent name plus the owner's key
    FOREIGN KEY (ssn) REFERENCES Employees
    ON DELETE CASCADE)
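    The effect of ON DELETE CASCADE on a weak entity set can be seen in a short sketch, again assuming Python's built-in sqlite3 module with foreign keys enabled. The Employees layout, the sample rows and the helper name are hypothetical.

```python
import sqlite3


def build_dep_policy():
    """Create Employees and Dep_Policy with two employees and their dependents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
        CREATE TABLE Dep_Policy (
            pname CHAR(20),
            age INTEGER,
            cost REAL,
            ssn CHAR(11) NOT NULL,
            PRIMARY KEY (pname, ssn),
            FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE CASCADE
        );
        INSERT INTO Employees VALUES ('111-11-1111', 'Asha', 12), ('222-22-2222', 'Ravi', 7);
        -- The same dependent name may occur under two different owners.
        INSERT INTO Dep_Policy VALUES ('Joe', 8, 150.0, '111-11-1111');
        INSERT INTO Dep_Policy VALUES ('Joe', 5, 120.0, '222-22-2222');
    """)
    return conn


def dependents_after_quit(conn, ssn):
    """Delete the employee with this ssn and return the dependents that remain."""
    conn.execute("DELETE FROM Employees WHERE ssn = ?", (ssn,))
    return conn.execute("SELECT pname, ssn FROM Dep_Policy ORDER BY ssn").fetchall()
```

    Deleting one owner removes only that owner's dependents; the dependent with the same pname under the other employee is untouched, which is why pname alone cannot serve as the key.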
  • As in C++ or other programming languages, attributes are inherited.
    If we declare A ISA B, every A entity is also considered to be a B entity. (Query answers should reflect this, unlike C++!)
    Overlap constraints: Can Joe be an Hourly_Emp as well as a Contract_Emp entity? (Allowed/disallowed)
    Covering constraints: Does every Employee entity also have to be an Hourly_Emp or a Contract_Emp entity? (Yes/no)
    Reasons for using ISA :
    – To add descriptive attributes specific to a subclass .
    – To identify entities that participate in a relationship
    Translating ISA Hierarchies to Relations:
    General approach:
    – 3 relations: Employee, Hourly_Emp and Contract_Emp.
    – Every employee is recorded in Employee.
    – For hourly employees, extra info is recorded in Hourly_Emp (hourly_wages, hours_worked, ssn); the Hourly_Emp tuple must be deleted if the referenced Employee tuple is deleted.
    – Queries involving all employees are easy; those involving just Hourly_Emp require a join to get some attributes.
    Alternative: Just Hourly_Emp and Contract_Emp.
    – Hourly_Emp: ssn, name, lot, hourly_wages, hours_worked.
    – Contract_Emp: ssn, name, lot, contractid.
    – Each employee must be in one of these two subclasses.
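The general three-relation approach can be sketched as follows with Python's sqlite3 module. The column types, the sample wage, hours, and contract id are assumptions made for illustration; the notes give only the attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Hourly_Emp (
    ssn CHAR(11) PRIMARY KEY,
    hourly_wages REAL,
    hours_worked INTEGER,
    FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
);
CREATE TABLE Contract_Emp (
    ssn CHAR(11) PRIMARY KEY,
    contractid INTEGER,
    FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
);
-- Hypothetical sample data: one hourly and one contract employee.
INSERT INTO Employee VALUES ('123-22-3666', 'Attishoo', 48);
INSERT INTO Employee VALUES ('231-31-5368', 'Smiley', 22);
INSERT INTO Hourly_Emp VALUES ('123-22-3666', 10.0, 40);
INSERT INTO Contract_Emp VALUES ('231-31-5368', 7);
""")

# Queries over all employees touch a single table.
all_names = [row[0] for row in conn.execute("SELECT name FROM Employee ORDER BY name")]

# Queries over hourly employees need a join to reach the inherited attributes.
hourly = conn.execute("""
    SELECT e.name, e.lot, h.hourly_wages
    FROM Employee e JOIN Hourly_Emp h ON e.ssn = h.ssn
""").fetchall()

# Deleting the Employee tuple also deletes the subclass tuple.
conn.execute("DELETE FROM Employee WHERE ssn = '123-22-3666'")
conn.commit()
hourly_left = conn.execute("SELECT COUNT(*) FROM Hourly_Emp").fetchone()[0]
print(all_names, hourly, hourly_left)
```

The alternative two-relation design avoids the join but cannot store an employee who is in neither subclass.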
  • Aggregation
    Aggregation is meant to represent a relationship between a whole object and its component parts.
    Used when we have to model a relationship involving (entity sets and) a relationship set.
    – Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
    – E.g., a Project is sponsored by a Department. This is a simple relationship.
    An Employee monitors this Sponsorship (and not Project or Department). This is aggregation.
    – Monitors mapped to table like any other relationship set.
    Aggregation vs. ternary relationship:
    Can we express relationships involving other relationships without using aggregation?
    – The use of aggregation vs. ternary relationship may be guided by certain integrity constraints.
    – E.g., we can impose a constraint that each sponsorship is monitored by at most one employee (not possible without aggregation).
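One way to map the aggregation to tables, sketched with Python's sqlite3 module. The table layouts are an assumption (the notes say only that Monitors is mapped like any other relationship set), the Projects and Departments tables are left out for brevity, and the rows are hypothetical. Making (pid, did) the key of Monitors is what enforces "at most one monitor per sponsorship".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Sponsors (
    pid INTEGER,
    did INTEGER,
    since DATE,
    PRIMARY KEY (pid, did)
);
CREATE TABLE Monitors (
    pid INTEGER,
    did INTEGER,
    ssn CHAR(11) NOT NULL,
    until DATE,
    PRIMARY KEY (pid, did),                      -- at most one monitor per sponsorship
    FOREIGN KEY (pid, did) REFERENCES Sponsors,  -- refers to the relationship, not to Projects
    FOREIGN KEY (ssn) REFERENCES Employees
);
-- Hypothetical sample data.
INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48);
INSERT INTO Employees VALUES ('231-31-5368', 'Smiley', 22);
INSERT INTO Sponsors VALUES (1, 51, '1991-01-01');
INSERT INTO Monitors VALUES (1, 51, '123-22-3666', '1993-03-03');
""")

def try_sql(sql):
    """Run one statement; return None on success, else the constraint error text."""
    try:
        with conn:
            conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

second_monitor = try_sql("INSERT INTO Monitors VALUES (1, 51, '231-31-5368', NULL)")
no_such_sponsorship = try_sql("INSERT INTO Monitors VALUES (2, 51, '231-31-5368', NULL)")
print(second_monitor)
print(no_such_sponsorship)
```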
  • Conceptual Design Using the ER Model
    Design choices:
    – Should a concept be modelled as an entity or an attribute?
    – Should a concept be modelled as an entity or a relationship?
    – Identifying relationships: Binary or ternary? Aggregation?
    Entity vs. Attribute
    Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
    Depends upon the use we want to make of address information, and the semantics of the data:
    If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
    If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modelled as an entity (since attribute values are atomic).
    Otherwise, address can be used as an attribute of Employee.
  • Similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship.
    Consider that an employee works in a given department over more than one period. This possibility is ruled out by the ER diagram’s semantics of previous slide. The problem is that we want to record several values for descriptive attributes for each instance of Works_in relationship. We can address this problem by introducing an entity set called Duration, with attributes from and to.
  • ER diagram OK if a manager gets a separate discretionary budget for each dept.
    What if a manager gets a discretionary budget that covers all managed depts?
    – Redundancy of dbudget, which is stored for each dept managed by the manager.
    – Misleading: suggests dbudget tied to managed dept.
  • One of the possible designs to resolve the two issues of the previous ER diagram:
    We model the appointment as an entity set, say Mgr_appt, and use a ternary relationship, say manages, to relate a manager, an appointment, and a department. The dbudget is now associated with the appointment of the employee as manager of a group of departments. The details of an appointment (such as the discretionary budget) are not repeated for each department that is included in the appointment now, although there is still one Manages relationship instance per such Department.
  • Above figure models a situation in which an employee can own several policies, each policy can be owned by several employees, and each dependent can be covered by several policies.
    Suppose we have following constraint:
    Each policy is owned by just 1 employee
    – Key constraint on Policy would mean policy can only cover 1 dependent!
  • The key constraints allow us to combine Purchaser with Policy and Beneficiary with Dependent.
    Participation constraints lead to NOT NULL constraints.
    CREATE TABLE Policy (
        policyid INTEGER,
        cost REAL,
        ssn CHAR(11) NOT NULL,
        PRIMARY KEY (policyid),
        FOREIGN KEY (ssn) REFERENCES Employee
            ON DELETE CASCADE )
    CREATE TABLE Dependent (
        pname CHAR(20),
        age INTEGER,
        policyid INTEGER,
        PRIMARY KEY (pname, policyid),
        FOREIGN KEY (policyid) REFERENCES Policy
            ON DELETE CASCADE )
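Because Dependent refers to Policy and Policy refers to Employee, a delete can cascade through both tables. The sketch below, using Python's sqlite3 module with hypothetical rows, shows the chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Policy (
    policyid INTEGER,
    cost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (policyid),
    FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
);
CREATE TABLE Dependent (
    pname CHAR(20),
    age INTEGER,
    policyid INTEGER,
    PRIMARY KEY (pname, policyid),
    FOREIGN KEY (policyid) REFERENCES Policy ON DELETE CASCADE
);
-- Hypothetical sample data: one policy covering two dependents.
INSERT INTO Employee VALUES ('123-22-3666', 'Attishoo', 48);
INSERT INTO Policy VALUES (1, 250.0, '123-22-3666');
INSERT INTO Dependent VALUES ('Joe', 7, 1);
INSERT INTO Dependent VALUES ('Ann', 4, 1);
""")

dependents_before = conn.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
conn.execute("DELETE FROM Employee WHERE ssn = '123-22-3666'")
conn.commit()
policies_after = conn.execute("SELECT COUNT(*) FROM Policy").fetchone()[0]
dependents_after = conn.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
print(dependents_before, policies_after, dependents_after)
```

Since ssn is a single NOT NULL column of Policy, each policy has exactly one owner, while one policy can still cover several dependents.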
  • Constraints in the ER Model:
    – A lot of data semantics can (and should) be captured.
    – But some constraints cannot be captured in ER diagrams.
    Need for further refining the schema:
    – Relational schema obtained from ER diagram is a good first step. But ER design subjective & can’t express certain constraints; so this relational schema may need refinement.
    Functional dependencies:
    – e.g., A dept can’t order two distinct parts from the same supplier.
    Can’t express this wrt the ternary Contracts relationship.
    – Normalization refines ER design by considering FDs.
    Inclusion dependencies:
    – Special case: Foreign keys (ER model can express these).
    – e.g., At least 1 person must report to each manager. (Set of ssn values in Manages must be subset of supervisor_ssn values in Reports_To.) Foreign key? Expressible in ER model?
    General constraints:
    – e.g., Manager’s discretionary budget less than 10% of the combined budget of all departments he or she manages.
  • Regular Entities : Each regular entity type maps into a base relation
    The database will thus contain 5 base relations : DEPT, EMP, Supplier, Part and Project; the primary keys for these relations being : DEPT#, EMP#, S#, P# and J#
    Weak Entities :
    The relationship from a weak entity type to the entity type on which it depends is of course a many-to-one relationship.
    However, the foreign key rules for that relationship must be as follows:
    DELETE CASCADES
    UPDATE CASCADES
  • An entity type Department has attributes Name, Number, Location, Manager and Manager Start Date. Location is a multivalued attribute. Name and Number are key attributes since each was specified to be unique.
    An entity type Project has attributes Name, Number, Location and Controlling Department. Both Name and Number are key attributes.
    An entity type Employee has attributes Name, SSN (Social Security Number), Gender, Birth Date, Salary and Supervisor. Both name and address are composite in nature.
    Dependent is a weak entity type with attributes SSN, Name of Dependent, Gender, Date of Birth and Relationship (to the Employee).
    NOTE: The design is called Chen design, used for identifying entities before implementing the ER diagram.
  • The number of people who work at a location can be a derived attribute.
  • Summary of Conceptual Design
    Conceptual design follows requirements analysis,
    – Yields a high-level description of data to be stored
    ER model popular for conceptual design
    – Constructs are expressive, close to the way people think about their applications.
    Basic constructs: entities, relationships, and attributes (of entities and relationships).
    Some additional constructs: weak entities, ISA hierarchies, and aggregation .
    Note: There are many variations on ER model.
    Summary of ER
    Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set.
    – Some of these constraints can be expressed in SQL only if we use general CHECK constraints or assertions.
    – Some constraints (notably, functional dependencies ) cannot be expressed in the ER model.
    – Constraints play an important role in determining the best database design for an enterprise.
    ER design is subjective . There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
    Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation.
    Ensuring good database design: resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.
  • Case Studies:
    1. Prescriptions-R-X chain
    The Prescriptions-R-X chain of pharmacies has offered to give you a free lifetime supply of medicines if you design its database. Given the rising cost of health care, you agree. Here's the information that you gather:
    Patients are identified by an SSN, and their names, addresses, and ages must be recorded.
    Doctors are identified by an SSN. For each doctor, the name, specialty, and years of experience must be recorded.
    Each pharmaceutical company is identified by name and has a phone number.
    For each drug, the trade name and formula must be recorded. Each drug is sold by a given pharmaceutical company, and the trade name identifies a drug uniquely from among the products of that company. If a pharmaceutical company is deleted, you need not keep track of its products any longer.
    Each pharmacy has a name, address, and phone number.
    Every patient has a primary physician. Every doctor has at least one patient.
    Each pharmacy sells several drugs and has a price for each. A drug could be sold at several pharmacies, and the price could vary from one pharmacy to another.
    Doctors prescribe drugs for patients. A doctor could prescribe one or more drugs for several patients, and a patient could obtain prescriptions from several doctors. Each prescription has a date and a quantity associated with it. You can assume that if a doctor prescribes the same drug for the same patient more than once, only the last such prescription needs to be stored.
    Pharmaceutical companies have long-term contracts with pharmacies. A pharmaceutical company can contract with several pharmacies, and a pharmacy can contract with several pharmaceutical companies. For each contract, you have to store a start date, an end date, and the text of the contract.
    Pharmacies appoint a supervisor for each contract. There must always be a supervisor for each contract, but the contract supervisor can change over the lifetime of the contract.
    1. Draw an ER diagram that captures the above information. Identify any constraints that are not captured by the ER diagram.
    2. How would your design change if each drug must be sold at a fixed price by all pharmacies?
    3. How would your design change if the design requirements change as follows: If a doctor prescribes the same drug for the same patient more than once, several such prescriptions may have to be stored.
  • 2. Dane County Airport
    Computer Sciences Department frequent fliers have been complaining to Dane County Airport officials about the poor organization at the airport. As a result, the officials have decided that all information related to the airport should be organized using a DBMS, and you've been hired to design the database. Your first task is to organize the information about all the airplanes that are stationed and maintained at the airport.
    The relevant information is as follows:
    Every airplane has a registration number, and each airplane is of a specific model.
    The airport accommodates a number of airplane models, and each model is identified by a model number (e.g., DC-10) and has a capacity and a weight.
    A number of technicians work at the airport. You need to store the name, SSN, address, phone number, and salary of each technician.
    Each technician is an expert on one or more plane model(s), and his or her expertise may overlap with that of other technicians. This information about technicians must also be recorded.
    Traffic controllers must have an annual medical examination. For each Traffic controller, you must store the date of the most recent exam.
    All airport employees (including technicians) belong to a union. You must store the union membership number of each employee. You can assume that each employee is uniquely identified by the social security number.
    The airport has a number of tests that are used periodically to ensure that airplanes are still airworthy. Each test has a Federal Aviation Administration (FAA) test number, a name, and a maximum possible score.
    The FAA requires the airport to keep track of each time that a given airplane is tested by a given technician using a given test. For each testing event, the information needed is the date, the number of hours the technician spent doing the test, and the score that the airplane received on the test.
    1. Draw an ER diagram for the airport database. Be sure to indicate the various attributes of each entity and relationship set; also specify the key and participation constraints for each relationship set. Specify any necessary overlap and covering constraints as well (in English).
    2. The FAA passes a regulation that tests on a plane must be conducted by a technician who is an expert on that model. How would you express this constraint in the ER diagram? If you cannot express it, explain briefly.
  • 3. University Database:
    Consider the following information about a university database:
    Professors have an SSN, a name, an age, a rank, and a research specialty.
    Projects have a project number, a sponsor name (e.g., NSF), a starting date, an ending date, and a budget.
    Graduate students have an SSN, a name, an age, and a degree program (e.g., M.S. or Ph.D.).
    Each project is managed by one professor (known as the project's principal investigator).
    Each project is worked on by one or more professors (known as the project's co-investigators).
    Professors can manage and/or work on multiple projects.
    Each project is worked on by one or more graduate students (known as the project's research assistants).
    When graduate students work on a project, a professor must supervise their work on the project. Graduate students can work on multiple projects, in which case they will have a (potentially different) supervisor for each one.
    Departments have a department number, a department name, and a main office. Departments have a professor (known as the chairman) who runs the department.
    Professors work in one or more departments, and for each department that they work in, a time percentage is associated with their job.
    Graduate students have one major department in which they are working on their degree.
    Each graduate student has another, more senior graduate student (known as a student advisor) who advises him or her on what courses to take.
    Design and draw an ER diagram that captures the information about the university.
    Use only the basic ER model here, that is, entities, relationships, and attributes. Be sure to indicate any key and participation constraints.
  • Topics Covered :
    Normalization and Normal Forms
    Why Normal Forms
    The Evils Of Redundancy
    Refining an ER Diagram
    First Normal Form
    Functional Dependencies
    Example: Constraints On Entity Set
    Second Normal Form
    Transitive Dependency
    Third Normal Form
    Boyce Codd Normal Form (BCNF)
    Decomposition of a Relation Scheme
    Lossless Join Decompositions
    Summary and Examples
  • Normalization is a step-by-step decomposition of complex records into simple records. It results in the formation of tables that satisfy certain specified constraints, and represent certain normal forms. Normalization reduces redundancy using the principle of non-loss decomposition. A fully normalized record consists of
    - A primary key that identifies an entity
    -A set of attributes that describe the entity
    Several normal forms have been identified, the most important and widely used of which are
    first normal form
    second normal form
    third normal form and
    Boyce-Codd normal form.
  • In order to produce good database design, we should ask questions like:
    1) Does this design ensure that all database operations will be efficiently performed and that the design does not make the DBMS perform expensive consistency checks which could be avoided?
    2) Is the information unnecessarily replicated?
    Unless these issues are properly handled several difficulties like redundancy and loss of information may arise. There are several methods to avoid the above mentioned problems. One such method is database decomposition through normalization, which tries to minimize redundancy and the efforts of checking of constraints and dependencies.
  • Redundancy problems associated with relational schemas:
    – redundant storage, insert/delete/update anomalies
    Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
    Decomposition should be used judiciously:
    – Is there reason to decompose a relation?
    – What problems (if any) does the decomposition cause?
  • Consider the above ER diagram, with the Works_in relation having a Key constraint indicating that an employee can work in at most one department.
    ER diagram can be translated into two relations:
    Worker (ssn, name, lot, since, did)
    Department (did, dname, budget)
    – Lots associated with workers.
    Suppose all workers in a dept are assigned the same lot: D → L, i.e., did functionally determines lot. This leads to redundancy.
  • The redundancy in earlier slide can be fixed by breaking the relation Worker as:
    Workers (ssn, name, since, did)
    Dept_Lots (did, lot)
    Can fine-tune this:
    Workers (ssn, name, since, did)
    Department (did, dname, budget, lot)
  • EMP_PROJ = {eno, ename, {pnumber, hours}}, where the nested set {pnumber, hours} is multivalued
    eno is the primary key
    The above relation is not in 1NF
    pnumber is the partial key of each nested relation.
    Within each tuple, the nested relation must have unique values of pnumber
    Break EMP_PROJ as:
    EMP_PROJ1(eno, ename)
    EMP_PROJ2(eno, pnumber, hours)
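The split into EMP_PROJ1 and EMP_PROJ2 can be sketched in a few lines of Python. The employee names, project numbers, and hours are hypothetical, and the function name `to_1nf` is made up for this illustration.

```python
# Hypothetical data in the nested shape EMP_PROJ = {eno, ename, {pnumber, hours}}.
emp_proj = [
    {"eno": 1, "ename": "Smith",
     "projects": [{"pnumber": 10, "hours": 32.5}, {"pnumber": 20, "hours": 7.5}]},
    {"eno": 2, "ename": "Wong",
     "projects": [{"pnumber": 10, "hours": 40.0}]},
]

def to_1nf(nested):
    """Split the nested relation into two flat relations with atomic values only."""
    # EMP_PROJ1(eno, ename): one row per employee.
    emp_proj1 = [(e["eno"], e["ename"]) for e in nested]
    # EMP_PROJ2(eno, pnumber, hours): the owner's key eno is copied into each nested row.
    emp_proj2 = [(e["eno"], p["pnumber"], p["hours"])
                 for e in nested for p in e["projects"]]
    return emp_proj1, emp_proj2

emp_proj1, emp_proj2 = to_1nf(emp_proj)
print(emp_proj1)
print(emp_proj2)
```

In EMP_PROJ2 the key becomes {eno, pnumber}: the partial key pnumber combined with the key of the enclosing tuple.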
  • Given a relation R, attribute A is functionally dependent on B if each value of B in R is associated with precisely one value of A.
    We say B functionally determines A and represent it as B → A.
    This means that there can be no two tuples which have the same value of attribute B and different values of attribute A.
    An FD is a statement about all allowable relations.
    – Must be identified based on semantics of application.
    – Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R!
    K is a candidate key for R means that K → R.
    – However, K → R does not require K to be minimal! (K → R on its own only makes K a superkey.)
    Role of FDs in detecting redundancy:
    – Consider a relation R with 3 attributes, ABC.
    No FDs hold: There is no redundancy here.
    Given A → B: Several tuples could have the same A value, and if so, they’ll all have the same B value!
    Reasoning About FDs
    Given some FDs, we can usually infer additional FDs:
    ssn → did, did → lot implies ssn → lot
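Inferring additional FDs is usually done by computing an attribute closure. The following is a minimal sketch of that standard technique in Python; the function name `closure` and the representation of an FD as a (left side, right side) pair of sets are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine the whole left side, we also determine the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The FDs from the example: ssn -> did and did -> lot.
fds = [({"ssn"}, {"did"}), ({"did"}, {"lot"})]

ssn_closure = closure({"ssn"}, fds)
did_closure = closure({"did"}, fds)
print(sorted(ssn_closure))
print(sorted(did_closure))
```

Since lot appears in the closure of ssn, the FD ssn → lot follows; since ssn does not appear in the closure of did, did → ssn does not.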
  • Full Dependency:
    An attribute B of a relation R is fully functionally dependent on attribute A of R if it is functionally dependent on A and not functionally dependent on any proper subset of A.
    {Eno, Pnumber} → Hours
    Full functional dependency:
    neither Eno → Hours nor Pnumber → Hours holds.
  • {Eno, Pnumber} → Ename
    Partial dependency:
    Eno → Ename holds.
  • Consider relation obtained from Hourly_Emps:
    – Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
    Notation: We will denote this relation schema by listing the attributes: SNLRWH
    – This is really the set of attributes {S, N, L, R, W, H}.
    – Sometimes, we will refer to all attributes of a relation by using the relation name (e.g., Hourly_Emps for SNLRWH).
    Some FDs on Hourly_Emps:
    – ssn is the key: S → SNLRWH
    – rating determines hrly_wages: R → W
    Problems due to R → W:
    – Update anomaly : Can we change W in just the 1st tuple of SNLRWH?
    – Insertion anomaly : What if we want to insert an employee and don’t know the hourly wage for his rating?
    – Deletion anomaly : If we delete all employees with rating 5, we lose the information about the wage for rating 5!
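The update anomaly can be made concrete with a small check that tests whether an instance violates an FD. This is a sketch under assumptions: the rows (ratings, wages, hours) are hypothetical and the function name `violates_fd` is made up. As noted for FDs in general, passing the check on one instance does not prove the FD holds over the schema.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on the lhs attributes but differ on the rhs attributes."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-side value seen for this left-side value.
        if seen.setdefault(left, right) != right:
            return True
    return False

# Hypothetical instance of Hourly_Emps using the SNLRWH abbreviations.
hourly_emps = [
    {"S": "123-22-3666", "N": "Attishoo", "L": 48, "R": 8, "W": 10, "H": 40},
    {"S": "231-31-5368", "N": "Smiley", "L": 22, "R": 8, "W": 10, "H": 30},
    {"S": "131-24-3650", "N": "Smethurst", "L": 35, "R": 5, "W": 7, "H": 30},
]

consistent_before = not violates_fd(hourly_emps, ["R"], ["W"])

# Update anomaly: change W in just the first tuple, leaving the other rating-8 tuple alone.
hourly_emps[0]["W"] = 12
broken_after = violates_fd(hourly_emps, ["R"], ["W"])
print(consistent_before, broken_after)
```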
  • General Definition of 2NF :
    A table is said to be in 2NF when it is in 1NF and every non-prime attribute in the record is functionally dependent upon the whole key, and not just part of the key.
    The steps for converting a database to 2NF are:
    Find and remove attributes that are related to only a part of the key
    Group the removed items in another table
    Assign the new table a key that consists of that part of the old composite key
    If a relation is not in 2NF, it can be further normalized into a number of 2NF relations.
    EP1 (Eno, Pnumber, Hours)
    EP2 (Eno, Ename)
    EP3 (Pnumber, Pname, Plocation)
    EP1, EP2 and EP3 satisfy 2NF.
  • The data stored in the table
    Emp{Eno, Dept, ProjCode, Hours}
    is in 1NF. The Primary key here is composite: {Eno, ProjCode}
    Some attributes of this table depend upon only part of the primary key:
    Eno + ProjCode functionally determines Hours.
    Eno functionally determines Dept. Attribute Dept has no dependency on ProjCode.
    The situation could lead to the following problems:
    Insertion: The record of employee cannot be entered until the employee is assigned a project.
    Updation: For a given employee, the employee code and department is repeated several times. Hence, if an employee is transferred to another department, this change will have to be recorded in every instance or record of the employee. Any omissions will lead to inconsistencies.
    Deletion: If an employee completes work on a project, the employee’s record will be deleted. The information regarding the department the employee belongs to will also be lost.
    This table should therefore be decomposed without any loss of information as:
    Emp {Eno, Dept}
    Proj {Eno, ProjCode, Hours}
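Partial dependencies can be found mechanically by closing every proper subset of the key. The sketch below assumes the relation has a single candidate key, which is true for this example but not in general; the function names are made up for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """List (subset, attribute) pairs where a non-key attribute depends on part of the key.

    Assumes `key` is the only candidate key of the relation.
    """
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found

fds = [({"Eno", "ProjCode"}, {"Hours"}), ({"Eno"}, {"Dept"})]

before = partial_dependencies({"Eno", "ProjCode"}, {"Eno", "Dept", "ProjCode", "Hours"}, fds)
after_emp = partial_dependencies({"Eno"}, {"Eno", "Dept"}, fds)
after_proj = partial_dependencies({"Eno", "ProjCode"}, {"Eno", "ProjCode", "Hours"}, fds)
print(before, after_emp, after_proj)
```

The original table reports the partial dependency of Dept on Eno; both decomposed tables report none, so they are in 2NF.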
  • EMP_DEPT
    Ename, Eno, Bdate, Addr, Dnumber, Dname, DMgrNo
    Eno → DMgrNo is a transitive dependency.
    The dependency of DMgrNo on the key attribute Eno is transitive via Dnumber because both Eno → Dnumber and Dnumber → DMgrNo hold.
    Dnumber is not a subset of the key of EMP_DEPT.
  • General Definition of 3NF :
    A relation schema R is in 3NF if, whenever a functional dependency X → A holds in R, either (a) X is a superkey of R or (b) A is a prime attribute of R.
    Equivalently, R is in 3NF if every nonprime attribute of R is
    (a) fully functionally dependent on every key of R and
    (b) non-transitively dependent on every key of R.
    If 3NF is violated by X → A, one of the following holds:
    X is a proper subset of some key K:
    we store (X, A) pairs redundantly.
    X is not a proper subset of any key:
    there is a chain of FDs K → X → A, which means that we cannot associate an X value with a K value unless we also associate an A value with an X value.
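The general definition translates into a short brute-force check. This is a sketch, not an efficient algorithm: it enumerates candidate keys by trying every attribute subset, and it inspects only the FDs that are passed in. The example uses just the three attributes of EMP_DEPT involved in the transitive dependency (Eno, Dnumber, DMgrNo); the function names are made up.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):  # smallest sets first, so keys stay minimal
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if closure(combo, fds) >= attributes and not any(k <= combo for k in keys):
                keys.append(combo)
    return keys

def violations_3nf(attributes, fds):
    """Return (X, A) pairs where X -> A has X not a superkey and A not prime."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(attributes)
        for attr in sorted(set(rhs) - set(lhs)):
            if not lhs_is_superkey and attr not in prime:
                bad.append((tuple(sorted(lhs)), attr))
    return bad

attrs = {"Eno", "Dnumber", "DMgrNo"}
fds = [({"Eno"}, {"Dnumber"}), ({"Dnumber"}, {"DMgrNo"})]
keys = candidate_keys(attrs, fds)
bad = violations_3nf(attrs, fds)
print(keys, bad)
```

The only violation reported is Dnumber → DMgrNo, the second link of the chain Eno → Dnumber → DMgrNo.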
  • Consider the table Emp:
    Emp{Eno, Dept, Dept_Head}
    The primary key here is Eno. The attribute dept is dependent on Eno. The attribute Dept_Head is dependent on Dept.
    Notice that there is an indirect dependence on the primary key.
    Emp is in 2NF but not in 3NF because of the transitive dependency of Dept_Head on Eno via Dept.
    The problems with dependency of this kind are:
    Insertion: The department head of a new department that does not have any employees as yet cannot be entered.
    Updation: For a given department, the particular head’s code is repeated several times. Hence, if a department head moves to another department, the changes will have to be made consistently across the table.
    Deletion: If the record of the only employee in a department is deleted, the information regarding the head of that department is lost.
    The relation is therefore decomposed to the following two relations:
    Emp{Eno, Dept}
    Dept{Dept, Dept_Head}
    Emp and Dept are in 3NF. Natural join of Emp and Dept will recover original EMP table.
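The claim that the natural join recovers the original table can be checked on a small instance. The rows below are hypothetical, and relations are modelled as Python sets of tuples. For contrast, the sketch also joins a different split on Dept_Head, which produces spurious tuples when one person heads two departments.

```python
# Hypothetical instance of Emp{Eno, Dept, Dept_Head}; Alice heads two departments.
emp_full = {
    (1, "Sales", "Alice"),
    (2, "Sales", "Alice"),
    (3, "R&D", "Bob"),
    (4, "Ops", "Alice"),
}

# Decomposition from the slide: Emp{Eno, Dept} and Dept{Dept, Dept_Head}.
emp = {(eno, dept) for eno, dept, _ in emp_full}
dept = {(d, head) for _, d, head in emp_full}

# Natural join on the shared attribute Dept.
rejoined = {(eno, d, head) for eno, d in emp for d2, head in dept if d == d2}

# A different split that shares only Dept_Head, which is not a key of either part.
emp_head = {(eno, head) for eno, _, head in emp_full}
dept_head = {(d, head) for _, d, head in emp_full}
lossy_join = {(eno, d, head)
              for eno, head in emp_head for d, head2 in dept_head if head == head2}

print(rejoined == emp_full)
print(len(lossy_join) - len(emp_full), "spurious tuples")
```

The first join is lossless because the shared attribute Dept is the key of the Dept relation; the second is not, so extra tuples appear.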

Transcript

  • 1. raghu@theoracletrainer.com www.theoracletrainer.com
  • 2. Introduction to Database Management Systems (DBMS)
  • 3. Database Management System (DBMS) Definitions: Data: known facts that can be recorded and that have implicit meaning. Database: a collection of related data, e.g., the names, telephone numbers and addresses of all the people you know. Database Management System: a computerized record-keeping system.
  • 4. DBMS (Contd.) Goals of a Database Management System: to provide an efficient as well as convenient environment for accessing data in a database; to enforce information security (database security, concurrency control, crash recovery). It is a general-purpose facility for defining, constructing, and manipulating a database.
  • 5. Benefits of database approach: redundancy can be reduced; inconsistency can be avoided; data can be shared; standards can be enforced; security restrictions can be applied; integrity can be maintained; data independence can be provided.
  • 6. DBMS Functions: Data Definition; Data Manipulation; Data Security and Integrity; Data Recovery and Concurrency; Data Dictionary; Performance.
  • 7. Database System [Diagram: users submit application programs/queries to the database system; the DBMS software consists of software to process queries/programs and software to access stored data; it works on the stored data definition (meta-data) and the stored database.]
  • 8. Data Model: a set of concepts used to describe the structure of a database. By structure, we mean the data types, relationships, and constraints that should hold for the data. Categories of data models: Conceptual, Physical, Representational.
  • 9. Database Architecture: External level (individual user views); Conceptual level (community user view); Internal level (storage view); Database.
  • 10. An example of the three levels
    External View 1: SNo, FName, LName, Age, Salary
    External View 2: SNo, LName, BranchNo
    Conceptual View: SNo, FName, LName, Age, Salary, BranchNo
    Internal View:
    struct STAFF {
        int staffNo;
        int branchNo;
        char fName[15];
        char lName[15];
        struct date dateOfBirth;
        float salary;
        struct STAFF *next; /* pointer to next Staff record */
    };
    index staffNo; index branchNo; /* define indexes for staff */
  • 11. Schema: description of data in terms of a data model. The three-level DB architecture defines the following schemas: External Schema (or sub-schema), written using external DDL; Conceptual Schema (or schema), written using conceptual DDL; Internal Schema, written using internal DDL or storage structure definition.
  • 12. Data Independence: the ability to change the schema at one level of a database system without a need to change the schema at the next higher level. Logical data independence refers to the immunity of the external schemas to changes in the conceptual schema, e.g., adding a new record or field. Physical data independence refers to the immunity of the conceptual schema to changes in the internal schema, e.g., adding a new index should not void existing ones.
  • 13. Types of Database Models: Hierarchical, Network, Relational [figure labels: table, row, column, value].
  • 14. Database Design Phases: Data Analysis (entities, attributes, relationships, integrity rules); Logical Design (tables, columns, primary keys, foreign keys); Physical Design (DDL for tablespaces, tables, indexes).
  • 15. Introduction to Relational Databases: RDBMS
  • 16. Some Important Terms: Relation: a table. Tuple: a row in a table. Attribute: a column in a table. Degree: number of attributes. Cardinality: number of tuples. Primary Key: a unique identifier for the table. Domain: a pool of values from which specific attributes of specific relations draw their values.
  • 17. Keys: Key; Super Key; Candidate Keys (Primary Key, Alternate Key); Secondary Keys.
  • 18. Keys and Referential Integrity
    Enrolled (sid is a foreign key referring to sid of the Student relation):
    sid    cid          grade
    53666  carnatic101  C
    53688  reggae203    B
    53650  topology112  A
    53666  history105   B
    Student (sid is the primary key):
    sid    name   login       age  gpa
    53666  Jones  Jones@cs    18   3.4
    53688  Smith  Smith@eecs  18   3.2
    53650  Smith  Smith@math  19   3.8
  • 20. Conceptual Design Using the Entity- Relationship Model
  • 21. Overview of Database Design: Conceptual design (ER model is used at this stage); Schema refinement (normalization); Physical database design and tuning.
  • 22. Design Phases: Requirements Collection & Analysis (data requirements; functional requirements: user-defined operations, data flow diagrams, sequence diagrams, scenarios). Conceptual Design (entity types, constraints, relationships; no implementation details). Logical Design (ensures the design meets the requirements; data model mapping, where the type of database is identified). Physical Design (internal storage structures, access paths, file organizations).
  • 23. E-R Modeling: Entity: anything that exists and is distinguishable. Entity Set: a group of similar entities. Attribute: properties that describe an entity. Relationship: an association between entities.
  • 24. Notations [legend of diagram symbols: entity type (regular), weak entity type, relationship type, weak relationship type]
  • 25. Entity [ER diagram: entity set Employee with attributes ssn, name, lot]
    SSN          NAME       LOT
    123-22-3666  Attishoo   48
    231-31-5368  Smiley     22
    131-24-3650  Smethurst  35
    CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn))
  • 26. Types of Relationships: 1:1, a student is issued an ID card; 1:M, students enrol in a course; M:M, students take tests.
  • 27. ER Model ssn lot name Employee supervisor since Works_in did dname budget Department Subordinate Reports_To raghu@theoracletrainer.com www.theoracletrainer.com
  • 28. ER Model (Contd.) Works_ In SSN 123-22-3666 123-22-3666 231-31-5368 DID 51 56 51 raghu@theoracletrainer.com SINCE 1/1/91 3/3/93 2/2/92 CREATE TABLE Works_ In( ssn CHAR (11), did INTEGER, since DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments) www.theoracletrainer.com
  • 29. Key Constraints [figure: Employee (ssn, name, lot), Manages (since), Department (did, dname, budget)]
  • 30. Key Constraints for Ternary Relationships [figure: Works_in (since) connecting Employee (ssn, name, lot), Department (did, dname, budget), and Location (address, capacity)]
  • 31. Participation Constraints [figure: Employee (ssn, name, lot) and Department (did, dname, budget) linked by Manages (since) and Works_in (since)]
  • 32. Weak Entities [figure: Employee (ssn, name, lot), Policy (cost), Dependent (pname, age)]
  • 33. ISA (‘is a’) Hierarchies [figure: Employee (ssn, name, lot) with ISA subclasses Hourly_Emp (Hrly_wages, Hrs_worked) and Contract_Emp (contractid)]
  • 34. Aggregation [figure: Employee (ssn, name, lot), Monitors, Project (pid, pbudget), Sponsors, Department (did, dname, budget); remaining attribute labels: started on, until]
  • 35. Entity vs. Attribute: Works_In does not allow an employee to work in a department for two or more periods (why?) [figure: Employee (ssn, name, lot), Works_in (from, to), Department (did, dname, budget)]
  • 36. Entity vs. Attribute (Contd.) [figure: Works_in connecting Employee (ssn, name, lot), Department (did, dname, budget), and Duration (from, to)]
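One way to see the answer to the "why?" on slide 35: a relationship instance is identified by the entities it connects, so a table for Works_In has the key (ssn, did) and can hold only one row per employee and department. The sketch below assumes a straightforward table mapping; the column names from_date and to_date and the table name Works_In_Duration are illustrative, not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Slide 35: from and to are plain attributes of Works_In, so the key is (ssn, did).
conn.execute("""CREATE TABLE Works_In (
    ssn CHAR(11), did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did))""")
conn.execute("INSERT INTO Works_In VALUES ('123-22-3666', 51, '1/1/91', '1/1/92')")
try:
    # A second period for the same employee and department repeats the key.
    conn.execute("INSERT INTO Works_In VALUES ('123-22-3666', 51, '3/3/93', '3/3/94')")
    second_period_stored = True
except sqlite3.IntegrityError:
    second_period_stored = False

# Slide 36: Duration is an entity in the relationship, so it becomes part of the key.
conn.execute("""CREATE TABLE Works_In_Duration (
    ssn CHAR(11), did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did, from_date, to_date))""")
conn.executemany("INSERT INTO Works_In_Duration VALUES (?, ?, ?, ?)", [
    ("123-22-3666", 51, "1/1/91", "1/1/92"),
    ("123-22-3666", 51, "3/3/93", "3/3/94"),
])
periods = conn.execute("SELECT COUNT(*) FROM Works_In_Duration").fetchone()[0]
```

The first design rejects the second period, while the design with Duration as an entity stores both.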
  • 37. Entity vs. Relationship [figure: Employee (ssn, name, lot), Manages (since, DB), Department (did, dname, budget); DB = Dbudget]
  • 38. Entity vs. Relationship [figure: Manages connecting Employee (ssn, name, lot), Department (did, dname, budget), and Mgr_appt (apptnum, since, DBudget)]
  • 39. Binary vs. Ternary Relationships [figure: Covers connecting Employee (ssn, name, lot), Dependent (pname, age), and Policy (policyid, cost)]
  • 40. Binary vs. Ternary Relationships, Better Design [figure: Employee (ssn, name, lot), Purchaser, Policy (policyid, cost), Beneficiary, Dependent (pname, age)]
  • 41. Constraints Beyond the ER Model: Some constraints cannot be captured in ER diagrams: functional dependencies, inclusion dependencies, general constraints.
  • 42. E-R Diagram [figure: entities DEPARTMENT, EMPLOYEE, DEPENDENT, PROJECT, SUPPLIER, PART; relationships DEPT_EMP, PROJ_WORK, PROJ_MGR, EMP_DEP, SUPP_PART, SUPP_PART_PROJ, PART_STRUCTURE, with 1 and M cardinality labels]
  • 43. Example to Start with: An example database application called COMPANY serves to illustrate the ER model concepts and their use in schema design. The following requirements were collected from the client.
  • 44. Analysis: Company: organized into departments. Each department has a name, a number, and a manager who manages the department. The company keeps track of the date on which that employee started managing the department. A department may have several locations.
  • 45. Analysis: Department: a department controls a number of projects, each of which has a unique name, a number, and a single location. Employee: name, age, gender, birth date, SSN, address, salary. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. The number of hours per week that an employee works on each project is also tracked.
  • 46. Analysis: Keep track of the dependents of each employee for insurance policies: for each dependent we keep the first name, gender, date of birth, and relationship to the employee.
  • 47. DEPARTMENT (Name, Number, {Locations}, Manager, Start Date); PROJECT (Name, Number, Location, Controlling Department); EMPLOYEE (Name (Fname, Lname), SSN, Gender, Address, Salary, Birthdate, Department, Supervisor, (Workson (Project, Hrs))); DEPENDENT (Employee, Name, Gender, Birthdate, Relationship)
  • 48. Example: Manage: between Department and Employee; partial participation; relationship attribute: StartDate. Works For: between Department and Employee; total participation.
  • 49. Example: Control: between Department and Project; partial participation from Department; total participation from Project; Control Department is a RKA. Supervisor: between Employee and Employee; partial and recursive.
  • 50. Example: Works-On: between Project and Employee; total participation; Hours Worked is a RKA. Dependents of: between Employee and Dependent; Dependent is a weak entity; Dependent participation is total, Employee participation is partial.
  • 51. One Possible Mapping of the Problem Statement [figure: ER diagram with entities Department (Name, No, Loc), Employee (Name (Fname, Lname), SSN, Sex, Sal, Address, Bdate), Project (Name, No, Loc), and Dependent (Name, Sex, Bdate, Relationship); relationships Works For, Manages (Sdate), Controls, Works On (Hours), Supervises, Depend On]
  • 56. Schema Refinement and Normalization
  • 57. Normalization and Normal Forms: Normalization: decomposing a larger, complex table into several smaller, simpler ones; moving from a lower normal form to a higher normal form. Normal forms: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and higher normal forms (BCNF, 4NF, 5NF, ...). In practice, 3NF is often good enough.
  • 58. Why Normal Forms: The first question to ask is whether any refinement is needed. If a relation is in a certain normal form (BCNF, 3NF, etc.), it is known that certain kinds of problems are avoided or minimized. This can be used to help us decide whether decomposing the relation will help.
  • 59. The Evils of Redundancy: Redundancy is at the root of several problems associated with relational schemas: it wastes storage and, more seriously, causes insert, update, and delete anomalies. Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
  • 60. Refining an ER Diagram - Before [figure: Employee (ssn, name, lot), Works_in (since), Department (did, dname, budget)]
  • 61. Refining an ER Diagram - After [figure: Employee (ssn, name), Works_in (since), Department (did, dname, budget, lot)]
  • 62. First Normal Form: A table is in 1NF if every row contains exactly one value for each attribute. 1NF disallows multivalued attributes, composite attributes, and their combinations. 1NF states that the domains of attributes must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. By definition, any relational table must be in 1NF.
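As a small illustration of removing a multivalued attribute, the sketch below splits a record like DEPARTMENT (Name, Number, {Locations}) from slide 47 into two flat relations. The function name split_multivalued and the sample department data are hypothetical, chosen only for the example.

```python
def split_multivalued(rows, key, multivalued):
    """Split rows holding a list-valued attribute into two 1NF relations.

    Returns (base, values): base keeps every attribute except the multivalued
    one; values holds one (key, single value) row per element of the list.
    """
    base = [{k: v for k, v in row.items() if k != multivalued} for row in rows]
    values = [
        {key: row[key], multivalued: value}
        for row in rows
        for value in row[multivalued]
    ]
    return base, values


# Made-up sample data: "locations" holds several values, which 1NF disallows.
departments = [
    {"number": 5, "name": "Research", "locations": ["Pune", "Delhi"]},
    {"number": 4, "name": "Administration", "locations": ["Mumbai"]},
]

dept_rows, dept_locations = split_multivalued(departments, "number", "locations")
```

After the split, every attribute value in dept_rows and dept_locations is atomic, and the department number links the two relations.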
  • 63. Functional Dependencies (FDs): FDs provide a formal mechanism to express constraints between attributes. Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R.
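The definition on slide 63 translates directly into a check over the rows of a table. The following is a minimal sketch; the function name fd_holds and the sample rows are assumptions made for illustration. Note that a table instance can only refute an FD: passing the check on some sample data does not prove that the FD holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each lhs-value in rows is paired with exactly one rhs-value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs-value seen for x; any later mismatch breaks the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Made-up sample rows.
emp = [
    {"Eno": 1, "Dept": "Sales"},
    {"Eno": 2, "Dept": "Sales"},
    {"Eno": 3, "Dept": "HR"},
]

eno_determines_dept = fd_holds(emp, ["Eno"], ["Dept"])  # True
dept_determines_eno = fd_holds(emp, ["Dept"], ["Eno"])  # False: Sales maps to 1 and 2
```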
  • 64. Full Dependency: An FD X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
  • 65. Partial Dependency: An FD X → Y is a partial dependency if there is some attribute A ∈ X that can be removed from X and the dependency will still hold.
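Given a set of FDs, full and partial dependencies can be told apart with the standard attribute-closure computation. The sketch below uses the FDs that appear in the 2NF example on slide 68; the helper names fd, closure, and is_full_dependency are illustrative.

```python
def fd(lhs, rhs):
    """Build an FD as a pair of frozensets (lhs -> rhs)."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_dependency(lhs, rhs, fds):
    """True if lhs -> rhs stops holding whenever any one attribute is dropped from lhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return not any(rhs <= closure(lhs - {a}, fds) for a in lhs)


# FDs from the Emp {Eno, Dept, ProjCode, Hours} example on slide 68.
fds = [fd(["Eno"], ["Dept"]), fd(["Eno", "ProjCode"], ["Hours"])]

hours_is_full = is_full_dependency(["Eno", "ProjCode"], ["Hours"], fds)  # True
dept_is_full = is_full_dependency(["Eno", "ProjCode"], ["Dept"], fds)    # False
```

Dropping one attribute at a time is enough: if any proper subset of X determines Y, then so does some subset that is missing just one attribute of X.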
  • 66. Example: Constraints on Entity Set. Original relation (S, N, L, R, W, H): (123-22-3666, Attishoo, 48, 8, 10, 40), (231-31-5368, Smiley, 22, 8, 10, 30), (131-24-3650, Smethurst, 35, 5, 7, 30), (434-26-3751, Guldu, 35, 5, 7, 32), (612-67-4134, Madayan, 35, 8, 10, 40). Decomposed into (S, N, L, R, H): (123-22-3666, Attishoo, 48, 8, 40), (231-31-5368, Smiley, 22, 8, 30), (131-24-3650, Smethurst, 35, 5, 30), (434-26-3751, Guldu, 35, 5, 32), (612-67-4134, Madayan, 35, 8, 40); and (R, W): (5, 7), (8, 10).
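The decomposition on slide 66 can be reproduced with plain Python sets. This is a sketch using the slide's data and its single-letter column names; the helper name project is illustrative. It projects the original relation onto (S, N, L, R, H) and (R, W), then joins the two projections on R to confirm that no information is lost for this instance.

```python
COLUMNS = ("S", "N", "L", "R", "W", "H")

# Rows from slide 66.
rows = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]


def project(rows, columns, keep):
    """Project rows onto the columns in keep, dropping duplicate tuples."""
    indexes = [columns.index(c) for c in keep]
    return {tuple(row[i] for i in indexes) for row in rows}


emps = project(rows, COLUMNS, ("S", "N", "L", "R", "H"))
wages = project(rows, COLUMNS, ("R", "W"))

# Natural join of the two projections on the shared column R.
rejoined = {
    (s, n, l, r, w, h)
    for (s, n, l, r, h) in emps
    for (r2, w) in wages
    if r == r2
}
```

The (R, W) pairs are stored five times in the original relation but only twice in wages, and rejoined equals the original set of rows.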
  • 67. Second Normal Form (2NF): A relation schema R is in 2NF if it is in 1NF and every non-prime attribute A in R is fully functionally dependent on the primary key of R. 2NF prohibits partial dependencies.
  • 68. 2NF: An Example: Emp {Eno, Dept, ProjCode, Hours}. Primary key: {Eno, ProjCode}. FDs: {Eno} -> {Dept}, {Eno, ProjCode} -> {Hours}. Test of 2NF: {Eno} -> {Dept} is a partial dependency, so Emp is in 1NF but not in 2NF. Decomposition: Emp {Eno, Dept}; Proj {Eno, ProjCode, Hours}.
  • 69. Transitive Dependency: An FD X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
  • 70. Third Normal Form: A relation schema R is in 3NF if it is in 2NF and no non-prime attribute of R is transitively dependent on the primary key. 3NF means that each non-key attribute value in any tuple is truly dependent on the primary key and not even partially on other attributes. 3NF prohibits transitive dependencies.
  • 71. 3NF: An Example: Emp {Eno, Dept, Dept_Head}. Primary key: {Eno}. FDs: {Eno} -> {Dept}, {Dept} -> {Dept_Head}. Test of 3NF: {Eno} -> {Dept} -> {Dept_Head} is a transitive dependency, so Emp is in 2NF but not in 3NF. Decomposition: Emp {Eno, Dept}; Dept {Dept, Dept_Head}.
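The practical benefit of the decomposition on slide 71 can be seen with a small sqlite3 experiment. This is a sketch under assumptions: the table name Emp_Unnormalized, the column types, and the employee and department-head values are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Emp {Eno, Dept, Dept_Head} before decomposition (made-up sample rows).
CREATE TABLE Emp_Unnormalized (Eno INTEGER PRIMARY KEY, Dept TEXT, Dept_Head TEXT);
INSERT INTO Emp_Unnormalized VALUES
    (1, 'Sales', 'Asha'), (2, 'Sales', 'Asha'), (3, 'HR', 'Ravi');

-- 3NF decomposition from the slide: Emp {Eno, Dept} and Dept {Dept, Dept_Head}.
CREATE TABLE Dept (Dept TEXT PRIMARY KEY, Dept_Head TEXT);
CREATE TABLE Emp (Eno INTEGER PRIMARY KEY, Dept TEXT REFERENCES Dept);
INSERT INTO Dept SELECT DISTINCT Dept, Dept_Head FROM Emp_Unnormalized;
INSERT INTO Emp SELECT Eno, Dept FROM Emp_Unnormalized;
""")

# Changing the head of Sales touches one row per Sales employee before the
# decomposition, but exactly one row afterwards.
rows_before = conn.execute(
    "UPDATE Emp_Unnormalized SET Dept_Head = 'Meera' WHERE Dept = 'Sales'").rowcount
rows_after = conn.execute(
    "UPDATE Dept SET Dept_Head = 'Meera' WHERE Dept = 'Sales'").rowcount

# Joining the two 3NF tables gives back the same rows as the original table.
joined = conn.execute("""
    SELECT e.Eno, e.Dept, d.Dept_Head
    FROM Emp AS e JOIN Dept AS d ON e.Dept = d.Dept
    ORDER BY e.Eno""").fetchall()
original = conn.execute(
    "SELECT Eno, Dept, Dept_Head FROM Emp_Unnormalized ORDER BY Eno").fetchall()
```

With the unnormalized table, missing one of the Sales rows during the update would leave two different heads recorded for the same department, which is the update anomaly that 3NF removes.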
  • 72. Boyce-Codd Normal Form: BCNF is motivated by the fact that 3NF does not satisfactorily handle the case of a relation possessing two or more composite or overlapping candidate keys.
  • 73. BCNF (Boyce-Codd Normal Form): A relation is said to be in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.
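The BCNF condition can be tested mechanically from a list of FDs. The sketch below uses the common operational form of the rule (every non-trivial FD must have a superkey as its determinant) and reuses the Emp example from slide 71; the helper names fd, closure, and bcnf_violations are illustrative, and the FDs passed in are assumed to be exactly those that apply to the relation being checked.

```python
def fd(lhs, rhs):
    """Build an FD as a pair of frozensets (lhs -> rhs)."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose determinant is not a superkey of relation."""
    relation = frozenset(relation)
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = closure(lhs, fds) >= relation
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# Emp {Eno, Dept, Dept_Head} from slide 71, before decomposition.
emp_fds = [fd(["Eno"], ["Dept"]), fd(["Dept"], ["Dept_Head"])]
before = bcnf_violations(["Eno", "Dept", "Dept_Head"], emp_fds)

# The two relations produced by the decomposition, each with its own FD.
after_emp = bcnf_violations(["Eno", "Dept"], [fd(["Eno"], ["Dept"])])
after_dept = bcnf_violations(["Dept", "Dept_Head"], [fd(["Dept"], ["Dept_Head"])])
```

For the original Emp, the determinant {Dept} is reported because it does not determine Eno; both decomposed relations come back with no violations.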