Chapter 9 Database Management Systems
Objectives for Chapter 9
- The operational problems inherent in the flat-file approach to data management that gave rise to the database concept
- The relationships among the defining elements of the database environment
- The anomalies caused by unnormalized databases and the need for data normalization
- The stages in database design, including entity identification, data modeling, constructing the physical database, and preparing user views
- The operational features of distributed databases and the issues that need to be considered in deciding on a particular database configuration
Flat-File Versus Database Environments
Computer processing involves two components: data and instructions (programs). Conceptually, there are two methods for designing the interface between program instructions and data:
- File-oriented processing: a specific data file is created for each application.
- Data-oriented processing: a single data repository is created to support numerous applications.
Disadvantages of file-oriented processing include redundant data and programs and varying formats for storing the redundant data; the format for similar fields may vary because programmers used inconsistent field formats.
Flat-File Environment (diagram): Program 1, Program 2, and Program 3 each process their own users' transactions against separate data files (A,B,C; X,B,Y; L,B,M), so data element B is stored redundantly in all three files.
Data Redundancy & Flat-File Problems
- Data storage - creates excessive storage costs in the form of paper documents and/or magnetic storage
- Data updating - any changes or additions must be performed multiple times
- Currency of information - the potential problem of failing to update all affected files
- Task-data dependency - the user's inability to obtain additional information as his or her needs change
Database Approach (diagram): Program 1, Program 2, and Program 3 process their users' transactions through the DBMS, which manages a single shared database containing data elements A, B, C, X, Y, L, and M.
Advantages of the Database Approach
Data sharing through a centralized database resolves the flat-file problems:
- No data redundancy - data is stored only once, eliminating data redundancy and reducing storage costs.
- Single update - because data is in only one place, it requires only a single update procedure, reducing the time and cost of keeping the database current.
- Current values - a change to the database made by any user yields current data values for all other users.
- Task-data independence - as users' information needs expand beyond their immediate domain, the new needs can be more easily satisfied than under the flat-file approach.
Disadvantages of the Database Approach
- Can be costly to implement - additional hardware, software, storage, and network resources are required.
- Can run only in certain operating environments, which may make it unsuitable for some system configurations.
- Because it is so different from the file-oriented approach, the database approach requires training users, and there may be inertia or resistance.
Elements of the Database Approach (diagram): users and their transactions, user programs and user queries, applications and system requests, the system development process and the database administrator, the DBMS (with its data definition language, data manipulation language, and query language), the host operating system, and the physical database.
DBMS Features
- User programs - makes the presence of the DBMS transparent to the user
- Direct query - allows authorized users to access data without programming
- Application development - user-created applications
- Backup and recovery - copies the database
- Database usage reporting - captures statistics on database usage (who, when, etc.)
- Database access - authorizes access to sections of the database
Internal Controls and the DBMS
The purpose of the DBMS is to provide controlled access to the database. The DBMS is a special software system programmed to know which data elements each user is authorized to access and to deny unauthorized requests for data.
Data Definition Language (DDL)
DDL is a programming language used to define the database to the DBMS. The DDL identifies the names and the relationships of all data elements, records, and files that constitute the database.
Viewing levels:
- internal view - the physical arrangement of records (one per database)
- conceptual view - the logical representation of the database (one per database)
- user view - the portion of the database each user views (many per database)
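In SQL-based systems, DDL statements such as CREATE TABLE play this role. A minimal sketch, assuming a hypothetical Customer table (names and types are illustrative only):

    -- The DDL names each data element, gives it a type, and declares the primary key
    CREATE TABLE Customer (
        Cust_Number   INTEGER       NOT NULL,
        Cust_Name     VARCHAR(40)   NOT NULL,
        Credit_Limit  DECIMAL(10,2),
        PRIMARY KEY (Cust_Number)
    );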
Data Manipulation Language (DML)
DML is the proprietary programming language that a particular DBMS uses to retrieve, process, and store data. Entire user programs may be written in the DML, or selected DML commands can be inserted into programs written in universal languages such as COBOL and FORTRAN.
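In most relational DBMSs, SQL also serves as the DML. A hedged sketch using the hypothetical Customer table defined above:

    -- Store a new record
    INSERT INTO Customer (Cust_Number, Cust_Name, Credit_Limit)
    VALUES (1001, 'Acme Supply', 5000.00);

    -- Process (modify) an existing record
    UPDATE Customer
    SET    Credit_Limit = 7500.00
    WHERE  Cust_Number = 1001;

    -- Remove a record
    DELETE FROM Customer
    WHERE  Cust_Number = 1001;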
Query Language The query capability permits end users and professional programmers to access data in the database without the need for conventional programs. ANSI’s Structured Query Language (SQL)   is a fourth-generation language that has emerged as the standard query language.
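A hedged example of a direct query against the hypothetical Customer table, requiring no conventional program:

    -- Ad hoc query: customers with a credit limit above 5,000, sorted by name
    SELECT Cust_Name, Credit_Limit
    FROM   Customer
    WHERE  Credit_Limit > 5000.00
    ORDER BY Cust_Name;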
Functions of the DBA
Logical Data Structures
A particular method used to organize records in a database is called the database's structure. The objective is to develop this structure efficiently so that data can be accessed quickly and easily. Four types of structures are:
- hierarchical (a.k.a. the tree structure)
- network
- relational
- object-oriented
The Relational Model
The relational model portrays data in the form of two-dimensional tables:
- relation - the database table
- attributes (data elements) - form columns
- tuples (records) - form rows
- data - the intersection of rows and columns
Relational Algebra Operations (diagram)
- RESTRICT - filtering out rows, such as the highlighted row X1
- PROJECT - filtering out columns, such as the highlighted column Z3
- JOIN - combining two tables on a common attribute to form a new table
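These operations map roughly onto SQL as follows (Sales and Customer are hypothetical tables):

    -- RESTRICT: filter out rows
    SELECT * FROM Sales WHERE Region = 'X1';

    -- PROJECT: filter out columns
    SELECT Invoice_Number, Amount FROM Sales;

    -- JOIN: combine two tables on a common attribute
    SELECT s.Invoice_Number, s.Amount, c.Cust_Name
    FROM   Sales s
    JOIN   Customer c ON c.Cust_Number = s.Cust_Number;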
Properly Designed Relational Tables
- No repeating values - all occurrences at the intersection of a row and column are a single value.
- The attribute values in any column must all be of the same class.
- Each column in a given table must be uniquely named.
- Each row in the table must be unique in at least one attribute, which is the primary key.
Crow’s Feet Cardinalities (diagram): (1:0,1), (1:1), (1:0,M), (1:M), (M:M)
Relational Model Data Linkages (more than one table)
No explicit pointers are present. The data are viewed as a collection of independent tables. Relations are formed by an attribute that is common to both tables in the relation.
Assignment of foreign keys:
- In a 1:1 association, either table's primary key may be embedded as the foreign key in the other table.
- In a 1:M association, the primary key of the 1 side is embedded as the foreign key on the M side.
- In an M:M association, either embed foreign keys or create a separate linking table that carries both primary keys.
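A minimal sketch of the 1:M and M:M rules, assuming the hypothetical Customer table above plus hypothetical Sales_Invoice and Inventory tables:

    -- 1:M - the primary key of the "1" side (Customer) is embedded
    -- as a foreign key on the "M" side (Sales_Invoice)
    CREATE TABLE Sales_Invoice (
        Invoice_Number INTEGER PRIMARY KEY,
        Invoice_Date   DATE,
        Cust_Number    INTEGER REFERENCES Customer (Cust_Number)
    );

    -- M:M - a separate linking table carries both primary keys as foreign keys
    -- (assumes an Inventory table keyed on Item_Number already exists)
    CREATE TABLE Invoice_Line_Item (
        Invoice_Number INTEGER REFERENCES Sales_Invoice (Invoice_Number),
        Item_Number    INTEGER REFERENCES Inventory (Item_Number),
        Quantity       INTEGER,
        PRIMARY KEY (Invoice_Number, Item_Number)
    );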
Three Types of Anomalies
- Insertion anomaly: a new item cannot be added to the table until at least one entity uses a particular attribute item.
- Deletion anomaly: if an attribute item used by only one entity is deleted, all information about that attribute item is lost.
- Update anomaly: a modification to an attribute must be made in each of the rows in which the attribute appears.
Anomalies can be corrected by creating normalized relational tables.
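A hedged illustration of where the anomalies come from, using a hypothetical unnormalized inventory table that repeats supplier data on every row:

    CREATE TABLE Inventory_Unnormalized (
        Item_Number   INTEGER PRIMARY KEY,
        Description   VARCHAR(40),
        Supplier_Num  INTEGER,
        Supplier_Name VARCHAR(40),  -- update anomaly: a name change must be
                                    -- repeated in every row for that supplier
        Supplier_Addr VARCHAR(60)   -- insertion/deletion anomalies: supplier
                                    -- facts can exist only while some item
                                    -- row references that supplier
    );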
Advantages of Relational Tables
- Removes all three anomalies.
- Various items of interest (customers, inventory, sales) are stored in separate tables.
- Space is used efficiently.
- Very flexible - users can form ad hoc relationships.
The Normalization Process
A process which systematically splits unnormalized complex tables into smaller tables that meet two conditions:
- all nonkey (secondary) attributes in the table are dependent on the primary key
- all nonkey attributes are independent of the other nonkey attributes
When unnormalized tables are split and reduced to third normal form, they must then be linked together by foreign keys.
Steps in Normalization (diagram): table with repeating groups → remove repeating groups → first normal form (1NF) → remove partial dependencies → second normal form (2NF) → remove transitive dependencies → third normal form (3NF) → remove remaining anomalies → higher normal forms
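A minimal sketch of the end point for the unnormalized inventory table shown earlier: in third normal form, supplier facts and item facts sit in separate tables linked by a foreign key (names hypothetical):

    -- 3NF: supplier attributes depend only on the supplier's primary key
    CREATE TABLE Supplier (
        Supplier_Num  INTEGER PRIMARY KEY,
        Supplier_Name VARCHAR(40),
        Supplier_Addr VARCHAR(60)
    );

    -- 3NF: item attributes depend only on the item's primary key;
    -- the foreign key re-links the two tables
    CREATE TABLE Inventory (
        Item_Number  INTEGER PRIMARY KEY,
        Description  VARCHAR(40),
        Supplier_Num INTEGER REFERENCES Supplier (Supplier_Num)
    );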
Accountants and Data Normalization The update anomaly can generate  conflicting and obsolete database values . The insertion anomaly can result in  unrecorded transactions and incomplete audit trails . The deletion anomaly can cause the loss of accounting records and the  destruction of audit trails. Accountants should have an understanding of the data normalization process and be able to determine whether a database is properly normalized.
Six Phases in Designing Relational Databases
1. Identify entities
   - identify the primary entities of the organization
   - construct a data model of their relationships
2. Construct a data model showing entity associations
   - determine the associations between entities
   - model the associations into an ER diagram
Six Phases in Designing Relational Databases (continued)
3. Add primary keys and attributes to the model
   - assign primary keys to all entities in the model to uniquely identify records
   - every attribute should appear in one or more user views
4. Normalize the data model and add foreign keys
   - remove repeating groups, partial dependencies, and transitive dependencies
   - assign foreign keys to be able to link tables
Six Phases in Designing Relational Databases (continued)
5. Construct the physical database
   - create the physical tables
   - populate the tables with data
6. Prepare the user views
   - the normalized tables should support all required views of system users
   - user views restrict users from accessing unauthorized data
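One common way to prepare a restricted user view in SQL is sketched below (view, column, and role names are hypothetical):

    -- Expose only the columns this class of user is authorized to see
    CREATE VIEW Customer_Contact_View AS
    SELECT Cust_Number, Cust_Name
    FROM   Customer;              -- Credit_Limit is excluded from the view

    -- Grant access to the view rather than to the underlying table
    GRANT SELECT ON Customer_Contact_View TO clerk_role;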
Distributed Data Processing with a Centralized Database (diagram): Sites A, B, and C are connected to a central site that holds the centralized database.
Distributed Data Processing
DDP is organized around several information processing units (IPUs) distributed throughout the organization and placed under the control of end users. DDP does not mean decentralization: the IPUs are connected to one another and coordinated.
Potential Advantages of DDP
- Cost reductions in hardware and data entry tasks
- Improved cost control responsibility
- Improved user satisfaction, since control is closer to the user level
- Backup of data can be improved through the use of multiple data storage sites
Potential Disadvantages of DDP
- Loss of control
- Mismanagement of organization-wide resources
- Hardware and software incompatibility
- Redundant tasks and data
- Consolidating incompatible tasks
- Difficulty attracting qualified personnel
- Lack of standards
Centralized Databases in a DDP Environment
The data is retained in a central location. Remote IPUs send requests for data, and the central site services the needs of the remote IPUs. The actual processing of the data is performed at the remote IPU.
Data Currency
The data currency problem occurs in DDP with a centralized database: during transaction processing, the data will temporarily be inconsistent as a record is being read and updated. Database lockout procedures are necessary to keep IPUs from reading inconsistent data and from writing over a transaction being written by another IPU.
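A hedged sketch of record lockout as many SQL DBMSs express it (exact syntax varies by product; the Inventory table and its columns are hypothetical):

    START TRANSACTION;

    -- Read and lock the record so other IPUs cannot read an inconsistent
    -- value or overwrite this update before it completes
    SELECT Quantity_On_Hand
    FROM   Inventory
    WHERE  Item_Number = 1234
    FOR UPDATE;

    UPDATE Inventory
    SET    Quantity_On_Hand = Quantity_On_Hand - 10
    WHERE  Item_Number = 1234;

    COMMIT;   -- the lock is released when the transaction completes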
Distributed Databases: Partitioning
Partitioning splits the central database into segments that are distributed to their primary users.
Advantages:
- users' control is increased by having data stored at local sites
- transaction processing response time is improved
- the volume of data transmitted between IPUs is reduced
- the potential for data loss from a disaster is reduced
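A purely conceptual sketch of partitioning by primary user, assuming a central Inventory table with a hypothetical Warehouse_Site column (real distributed DBMSs provide their own partitioning mechanisms):

    -- Segment held at Site A: the records Site A processes most often
    CREATE TABLE Inventory_Site_A AS
        SELECT * FROM Inventory WHERE Warehouse_Site = 'A';

    -- Segment held at Site B
    CREATE TABLE Inventory_Site_B AS
        SELECT * FROM Inventory WHERE Warehouse_Site = 'B';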
The Deadlock Phenomenon
A deadlock is especially a problem with partitioned databases. It occurs when multiple sites lock each other out of data that they are currently using: one site needs data locked by another site. Special software is needed to analyze and resolve conflicts; transactions may be terminated and have to be restarted.
The Deadlock Phenomenon (diagram): three sites hold data segments (A,B), (C,D), and (E,F); one site has locked A and waits for C, another has locked C and waits for E, and a third has locked E and waits for A, forming a circular wait that none of them can resolve.
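A hedged two-site sketch of how such a circular wait arises; Accounts and Orders are hypothetical tables, and each site issues its statements inside an open transaction:

    -- Step 1 (Site 1): locks the Accounts record
    UPDATE Accounts SET Balance = Balance - 100 WHERE Acct_Num = 1;

    -- Step 2 (Site 2): locks the Orders record
    UPDATE Orders SET Status = 'SHIPPED' WHERE Order_Num = 7;

    -- Step 3 (Site 1): requests the Orders record and waits on Site 2
    UPDATE Orders SET Status = 'BILLED' WHERE Order_Num = 7;

    -- Step 4 (Site 2): requests the Accounts record and waits on Site 1
    UPDATE Accounts SET Balance = Balance + 100 WHERE Acct_Num = 1;

    -- Neither transaction can proceed; the deadlock-resolution software
    -- must terminate (and later restart) one of them.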
Distributed Databases: Replication
Replication is the duplication of the entire database for multiple IPUs. This method is effective in situations with a high degree of data sharing but no primary user, and it works well for supporting read-only queries. Because each site holds its own copy of the common data, the data traffic between sites is reduced considerably.
Concurrency Problems and Control Issues Database concurrency is the presence of complete and accurate data at all IPU sites.  With replicated databases, maintaining current data at all locations is a difficult task. Time stamping may be used to serialize transactions and to prevent and resolve any potential conflicts created by updating data at various IPUs.
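One hedged way to apply the time-stamping idea in SQL: each replicated record carries the time stamp of its last update, and an update succeeds only if the record has not been changed elsewhere since it was read (table and column names hypothetical):

    UPDATE Customer_Replica
    SET    Credit_Limit = 7500.00,
           Last_Update  = CURRENT_TIMESTAMP
    WHERE  Cust_Number  = 1001
      AND  Last_Update  = TIMESTAMP '2024-03-01 09:15:00';  -- value read earlier

    -- If another IPU updated the record first, the time stamps no longer
    -- match, zero rows are changed, and the conflict must be detected
    -- and resolved.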

Editor's Notes

  • #5 Users access data via computer programs that process the data and present information to the users. Users own their data files. Data redundancy results as multiple applications maintain the same data elements: files and data elements used in more than one application must be duplicated. As a result of redundancy, the characteristics of data elements and their values are likely to be inconsistent. Outputs usually consist of preprogrammed reports instead of ad hoc queries provided upon request, which results in inaccessibility of data. Changes to current file-oriented applications cannot be made easily, nor can new developments be quickly realized, which results in inflexibility.
  • #7 The database approach solves the flat-file problems: no data redundancy (except for primary keys, data is stored only once), single update, current values, and task-data independence (users have access to the full domain of data available to the firm). A database is a set of computer files that minimizes data redundancy and is accessed by one or more application programs for data processing. The database approach to data storage applies whenever a database is established to serve two or more applications, organizational units, or types of users. A database management system (DBMS) is a computer program that enables users to create, modify, and utilize database information efficiently.
  • #32 Decentralization does not attempt to integrate the parts into a logical whole unit.