Modern Database Management - Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden
Presentation Transcript

  • 1. Grading System
    • Lecture Grade
      • 1st Exam - 10% Ch 1 – 2
      • 2nd Exam - 10% Ch 3 – 5
      • 3rd Exam - 10% Ch 7 – 8 (SQL)
      • 4th Exam - 15% Overall
      • Project - 15%
      • Q/A/Etc - 40%
        • TOTAL - 100% * .75
  • 2.
    • Laboratory Grade
      • Laboratory Exercises - 10%
      • Hands-on Exam - 15%
        • TOTAL - 25%
    • GRADE = LEC + LAB = 75% + 25% = 100%
  • 3. Chapter 1: The Database Environment Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 4. Objectives
    • Definition of terms
    • Explain growth and importance of databases
    • Name limitations of conventional file processing
    • Identify five categories of databases
    • Explain advantages of databases
    • Identify costs and risks of databases
    • List components of database environment
    • Describe evolution of database systems
  • 5. Definitions
    • Database: organized collection of logically related data
    • Data: stored representations of meaningful objects and events
      • Structured: numbers, text, dates
      • Unstructured: images, video, documents
    • Information: data processed to increase knowledge in the person using the data
    • Metadata: data that describes the properties and context of user data
  • 6. Figure 1-1a Data in context Context helps users understand data
  • 7. Graphical displays turn data into useful information that managers can use for decision making and interpretation Figure 1-1b Summarized data
  • 8. Descriptions of the properties or characteristics of the data, including data types, field sizes, allowable values, and data context
  • 9. Disadvantages of File Processing
    • Program-Data Dependence
      • All programs maintain metadata for each file they use
    • Duplication of Data
      • Different systems/programs have separate copies of the same data
    • Limited Data Sharing
      • No centralized control of data
    • Lengthy Development Times
      • Programmers must design their own file formats
    • Excessive Program Maintenance
      • 80% of information systems budget
  • 10. Problems with Data Dependency
    • Each application programmer must maintain his/her own data
    • Each application program needs to include code for the metadata of each file
    • Each application program must have its own processing routines for reading, inserting, updating, and deleting data
    • Lack of coordination and central control
    • Non-standard file formats
  • 11. Figure 1-3 Old file processing systems at Pine Valley Furniture Company Duplicate Data
  • 12. Problems with Data Redundancy
    • Waste of space to have duplicate data
    • Causes more maintenance headaches
    • The biggest problem:
      • Data changes in one file could cause inconsistencies
      • Compromises in data integrity
  • 13. SOLUTION: The DATABASE Approach
    • Central repository of shared data
    • Data is managed by a controlling agent
    • Stored in a standardized, convenient form
    Requires a Database Management System (DBMS)
  • 14. Database Management System DBMS manages data resources like an operating system manages hardware resources
    • A software system that is used to create, maintain, and provide controlled access to user databases
    Order Filing System Invoicing System Payroll System DBMS Central database Contains employee, order, inventory, pricing, and customer data
  • 15. Advantages of the Database Approach
    • Program-data independence
    • Planned data redundancy
    • Improved data consistency
    • Improved data sharing
    • Increased application development productivity
    • Enforcement of standards
    • Improved data quality
    • Improved data accessibility and responsiveness
    • Reduced program maintenance
    • Improved decision support
  • 16. Costs and Risks of the Database Approach
    • New, specialized personnel
    • Installation and management cost and complexity
    • Conversion costs
    • Need for explicit backup and recovery
    • Organizational conflict
  • 17. Elements of the Database Approach
    • Data models
      • Graphical system capturing nature and relationship of data
      • Enterprise Data Model–high-level entities and relationships for the organization
      • Project Data Model–more detailed view, matching data structure in database or data warehouse
    • Relational Databases
      • Database technology involving tables (relations) representing entities and primary/foreign keys representing relationships
    • Use of Internet Technology
      • Networks and telecommunications, distributed databases, client-server, and 3-tier architectures
    • Database Applications
      • Application programs used to perform database activities (create, read, update, and delete) for database users
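    A minimal SQL sketch of the relational idea above (table and column names are illustrative, not taken from the book's figures): each entity type becomes a table with a primary key, and a foreign key column implements the relationship between them.
      CREATE TABLE customer (
        customer_id   INTEGER     PRIMARY KEY,   -- primary key: one row per customer entity instance
        customer_name VARCHAR(50) NOT NULL
      );
      CREATE TABLE customer_order (              -- "ORDER" is a reserved word, so the table is renamed
        order_id    INTEGER PRIMARY KEY,
        order_date  DATE,
        customer_id INTEGER NOT NULL
                    REFERENCES customer (customer_id)  -- foreign key: implements the Customer-to-Order relationship
      );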
  • 18. Segment of an Enterprise Data Model Segment of a Project-Level Data Model
  • 19. One customer may place many orders, but each order is placed by a single customer → One-to-many relationship
  • 20. One order has many order lines; each order line is associated with a single order → One-to-many relationship
  • 21. One product can be in many order lines; each order line refers to a single product → One-to-many relationship
  • 22. Therefore, one order involves many products and one product is involved in many orders → Many-to-many relationship
  • 23. Figure 1-4 Enterprise data model for Figure 1-3 segments
  • 24. Figure 1-5 Components of the Database Environment
  • 25. Components of the Database Environment
    • CASE Tools – computer-aided software engineering
    • Repository – centralized storehouse of metadata
    • Database Management System (DBMS) – software for managing the database
    • Database – storehouse of the data
    • Application Programs – software using the data
    • User Interface – text and graphical displays to users
    • Data/Database Administrators – personnel responsible for maintaining the database
    • System Developers – personnel responsible for designing databases and software
    • End Users – people who use the applications and databases
  • 26. The Range of Database Applications
    • Personal databases
    • Workgroup databases
    • Departmental/divisional databases
    • Enterprise database
  • 27.
  • 28. Figure 1-6 Typical data from a personal database
  • 29. Figure 1-7 Workgroup database with wireless local area network
  • 30. Enterprise Database Applications
    • Enterprise Resource Planning (ERP)
      • Integrate all enterprise functions (manufacturing, finance, sales, marketing, inventory, accounting, human resources)
    • Data Warehouse
      • Integrated decision support system derived from various operational databases
  • 31. Figure 1-8 An enterprise data warehouse
  • 32. Evolution of DB Systems
  • 33. Chapter 2: The Database Development Process Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 34. Objectives
    • Definition of terms
    • Describe system development life cycle
    • Explain prototyping approach
    • Explain roles of individuals
    • Explain three-schema approach
    • Explain role of packaged data models
    • Explain three-tiered architectures
    • Explain scope of database design projects
    • Draw simple data models
  • 35. Enterprise Data Model
    • First step in database development
    • Specifies scope and general content
    • Overall picture of organizational data at high level of abstraction
    • Entity-relationship diagram
    • Descriptions of entity types
    • Relationships between entities
    • Business rules
  • 36. Figure 2-1 Segment from enterprise data model Enterprise data model describes the high-level entities in an organization and the relationship between these entities
  • 37. Information Systems Architecture (ISA)
    • Conceptual blueprint for organization’s desired information systems structure
    • Consists of:
      • Data (e.g. Enterprise Data Model – simplified ER Diagram)
      • Processes – data flow diagrams, process decomposition, etc.
      • Data Network – topology diagram (like Fig 1-9)
      • People – people management using project management tools (Gantt charts, etc.)
      • Events and points in time (when processes are performed)
      • Reasons for events and rules (e.g., decision tables)
  • 38. Information Engineering
    • A data-oriented methodology to create and maintain information systems
    • Top-down planning–a generic IS planning methodology for obtaining a broad understanding of the IS needed by the entire organization
    • Four steps to Top-Down planning:
      • Planning
      • Analysis
      • Design
      • Implementation
  • 39. Information Systems Planning (Table 2-1)
    • Purpose – align information technology with organization’s business strategies
    • Three steps:
        • Identify strategic planning factors
        • Identify corporate planning objects
        • Develop enterprise model
  • 40. Identify Strategic Planning Factors (Table 2-2)
    • Organization goals–what we hope to accomplish
    • Critical success factors–what MUST work in order for us to survive
    • Problem areas–weaknesses we now have
  • 41. Identify Corporate Planning Objects (Table 2-3)
    • Organizational units–departments
    • Organizational locations
    • Business functions–groups of business processes
    • Entity types–the things we are trying to model for the database
    • Information systems–application programs
  • 42. Develop Enterprise Model
    • Functional decomposition
      • Iterative process breaking system description into finer and finer detail
    • Enterprise data model
    • Planning matrixes
      • Describe interrelationships between planning objects
  • 43. Figure 2-2 Example of process decomposition of an order fulfillment function (Pine Valley Furniture) Decomposition = breaking large tasks into smaller tasks in a hierarchical structure chart
  • 44. Planning Matrixes
    • Describe relationships between planning objects in the organization
    • Types of matrixes:
      • Function-to-data entity
      • Location-to-function
      • Unit-to-function
      • IS-to-data entity
      • Supporting function-to-data entity
      • IS-to-business objective
  • 45. Example business function-to-data entity matrix (Fig. 2-3)
  • 46. Two Approaches to Database and IS Development
    • SDLC
      • System Development Life Cycle
      • Detailed, well-planned development process
      • Time-consuming, but comprehensive
      • Long development cycle
    • Prototyping
      • Rapid application development (RAD)
      • Cursory attempt at conceptual data modeling
      • Define database during development of initial prototype
      • Repeat implementation and maintenance activities with new prototype versions
  • 47. Systems Development Life Cycle (see also Figures 2.4, 2.5) Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 48. Systems Development Life Cycle (see also Figures 2.4, 2.5) (cont.) Planning Purpose–preliminary understanding Deliverable–request for study Database activity–enterprise modeling and early conceptual data modeling
  • 49. Systems Development Life Cycle (see also Figures 2.4, 2.5) (cont.) Analysis Purpose–thorough requirements analysis and structuring Deliverable–functional system specifications Database activity–thorough and integrated conceptual data modeling
  • 50. Systems Development Life Cycle (see also Figures 2.4, 2.5) (cont.) Logical Design Purpose–information requirements elicitation and structure Deliverable–detailed design specifications Database activity–logical database design (transactions, forms, displays, views, data integrity and security)
  • 51. Systems Development Life Cycle (see also Figures 2.4, 2.5) (cont.) Physical Design Purpose–develop technology and organizational specifications Deliverable–program/data structures, technology purchases, organization redesigns Database activity–physical database design (define database to DBMS, physical data organization, database processing programs)
  • 52. Systems Development Life Cycle (see also Figures 2.4, 2.5) (cont.) Implementation Purpose–programming, testing, training, installation, documenting Deliverable–operational programs, documentation, training materials Database activity–database implementation, including coded programs, documentation, installation and conversion
  • 53. Systems Development Life Cycle (see also Figures 2.4, 2.5) (cont.) Maintenance Purpose–monitor, repair, enhance Deliverable–periodic audits Database activity–database maintenance, performance analysis and tuning, error corrections
  • 54. Prototyping Database Methodology (Figure 2.6)
  • 55. Prototyping Database Methodology (Figure 2.6) (cont.)
  • 56. Prototyping Database Methodology (Figure 2.6) (cont.)
  • 57. Prototyping Database Methodology (Figure 2.6) (cont.)
  • 58. Prototyping Database Methodology (Figure 2.6) (cont.)
  • 59. CASE
    • Computer-Aided Software Engineering (CASE)–software tools providing automated support for systems development
    • Three database features:
      • Data modeling–drawing entity-relationship diagrams
      • Code generation–SQL code for table creation
      • Repositories–knowledge base of enterprise information
  • 60. Packaged Data Models
    • Model components that can be purchased, customized, and assembled into full-scale data models
    • Advantages
      • Reduced development time
      • Higher model quality and reliability
    • Two types:
      • Universal data models
      • Industry-specific data models
  • 61. Managing Projects
    • Project–a planned undertaking of related activities to reach an objective that has a beginning and an end
    • Involves use of review points for:
      • Validation of satisfactory progress
      • Step back from detail to overall view
      • Renew commitment of stakeholders
    • Incremental commitment–review of systems development project after each development phase with rejustification after each phase
  • 62. Managing Projects: People Involved
    • Business analysts
    • Systems analysts
    • Database analysts and data modelers
    • Users
    • Programmers
    • Database architects
    • Data administrators
    • Project managers
    • Other technical experts
  • 63. Database Schema
    • Physical Schema
      • Physical structures–covered in Chapters 5 and 6
    • Conceptual Schema
      • E-R models–covered in Chapters 3 and 4
    • External Schema
      • User Views
      • Subsets of Conceptual Schema
      • Can be determined from business-function/data entity matrices
      • DBA determines schema for different users
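    As a sketch of how an external schema can be realized (assuming the base tables from the earlier customer/order sketch), a user view is simply a stored query over the conceptual schema:
      -- One user's external view: a subset and recombination of the base tables
      CREATE VIEW customer_order_summary AS
        SELECT c.customer_id,
               c.customer_name,
               o.order_id,
               o.order_date
        FROM   customer c
               JOIN customer_order o ON o.customer_id = c.customer_id;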
  • 64. Different people have different views of the database…these are the external schemas. The internal schema is the underlying design and implementation. Figure 2-7 Three-schema architecture
  • 65. Figure 2-8 Developing the three-tiered architecture
  • 66. Figure 2-9 Three-tiered client/server database architecture
  • 67. Pine Valley Furniture Segment of project data model (Figure 2-11)
  • 68. Figure 2-12 Four relations (Pine Valley Furniture)
  • 69. Figure 2-12 Four relations (Pine Valley Furniture) (cont.)
  • 70. Chapter 3: Modeling Data in the Organization Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 71. Objectives
    • Definition of terms
    • Importance of data modeling
    • Write good names and definitions for entities, relationships, and attributes
    • Distinguish unary, binary, and ternary relationships
    • Model different types of attributes, entities, relationships, and cardinalities
    • Draw E-R diagrams for common business situations
    • Convert many-to-many relationships to associative entities
    • Model time-dependent data using time stamps
  • 72. Business Rules
    • Statements that define or constrain some aspect of the business
    • Assert business structure
    • Control/influence business behavior
    • Expressed in terms familiar to end users
    • Automated through DBMS software
  • 73. A Good Business Rule is:
    • Declarative–what, not how
    • Precise–clear, agreed-upon meaning
    • Atomic–one statement
    • Consistent–internally and externally
    • Expressible–structured, natural language
    • Distinct–non-redundant
    • Business-oriented–understood by business people
  • 74. A Good Data Name is:
    • Related to business, not technical, characteristics
    • Meaningful and self-documenting
    • Unique
    • Readable
    • Composed of words from an approved list
    • Repeatable
  • 75. Data Definitions
    • Explanation of a term or fact
      • Term–word or phrase with specific meaning
      • Fact–association between two or more terms
    • Guidelines for good data definition
      • Gathered in conjunction with systems requirements
      • Accompanied by diagrams
      • Iteratively created and refined
      • Achieved by consensus
  • 76. E-R Model Constructs
    • Entities:
      • Entity instance–person, place, object, event, concept (often corresponds to a row in a table)
      • Entity Type–collection of entities (often corresponds to a table)
    • Relationships:
      • Relationship instance–link between entities (corresponds to primary key-foreign key equivalencies in related tables)
      • Relationship type–category of relationship…link between entity types
    • Attribute– property or characteristic of an entity or relationship type (often corresponds to a field in a table)
  • 77. Sample E-R Diagram (Figure 3-1)
  • 78. Relationship degrees specify number of entity types involved Relationship cardinalities specify how many of each entity type is allowed Basic E-R notation (Figure 3-2) Entity symbols A special entity that is also a relationship Relationship symbols Attribute symbols
  • 79. What Should an Entity Be?
    • SHOULD BE:
      • An object that will have many instances in the database
      • An object that will be composed of multiple attributes
      • An object that we are trying to model
    • SHOULD NOT BE:
      • A user of the database system
      • An output of the database system (e.g., a report)
  • 80. Inappropriate entities Figure 3-4 Example of inappropriate entities System user System output Appropriate entities
  • 81. Attributes
    • Attribute–property or characteristic of an entity or relationship type
    • Classifications of attributes:
      • Required versus Optional Attributes
      • Simple versus Composite Attribute
      • Single-Valued versus Multivalued Attribute
      • Stored versus Derived Attributes
      • Identifier Attributes
  • 82. Identifiers (Keys)
    • Identifier (Key)–An attribute (or combination of attributes) that uniquely identifies individual instances of an entity type
    • Simple versus Composite Identifier
    • Candidate Identifier–an attribute that could be a key…satisfies the requirements for being an identifier
  • 83. Characteristics of Identifiers
    • Will not change in value
    • Will not be null
    • No intelligent identifiers (e.g., containing locations or people that might change)
    • Substitute new, simple keys for long, composite keys
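    A hedged sketch of the last guideline (identity-column syntax varies slightly across DBMSs; all names are illustrative): replace a long composite key with a simple surrogate key, keeping the natural combination unique so no information is lost.
      CREATE TABLE shipment (
        shipment_id  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- simple surrogate key
        vendor_id    INTEGER NOT NULL,
        warehouse_id INTEGER NOT NULL,
        ship_date    DATE    NOT NULL,
        UNIQUE (vendor_id, warehouse_id, ship_date)   -- the former long, composite identifier
      );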
  • 84. Figure 3-7 A composite attribute: an attribute broken into component parts. Figure 3-8 Entity with multivalued attribute (Skill) and derived attribute (Years_Employed): multivalued because an employee can have more than one skill; derived from date employed and current date
  • 85. Figure 3-9 Simple and composite identifier attributes The identifier is boldfaced and underlined
  • 86. Figure 3-19 Simple example of time-stamping This attribute is both multivalued and composite
  • 87. More on Relationships
    • Relationship Types vs. Relationship Instances
      • The relationship type is modeled as lines between entity types…the instance is between specific entity instances
    • Relationships can have attributes
      • These describe features pertaining to the association between the entities in the relationship
    • Two entities can have more than one type of relationship between them (multiple relationships)
    • Associative Entity–combination of relationship and entity
  • 88. Figure 3-10 Relationship types and instances a) Relationship type b) Relationship instances
  • 89. Degree of Relationships
    • Degree of a relationship is the number of entity types that participate in it
      • Unary Relationship
      • Binary Relationship
      • Ternary Relationship
  • 90. Degree of relationships – from Figure 3-2 Entities of two different types related to each other Entities of three different types related to each other One entity related to another of the same entity type
  • 91. Cardinality of Relationships
    • One-to-One
      • Each entity in the relationship will have exactly one related entity
    • One-to-Many
      • An entity on one side of the relationship can have many related entities, but an entity on the other side will have a maximum of one related entity
    • Many-to-Many
      • Entities on both sides of the relationship can have many related entities on the other side
  • 92. Cardinality Constraints
    • Cardinality Constraints - the number of instances of one entity that can or must be associated with each instance of another entity
    • Minimum Cardinality
      • If zero, then optional
      • If one or more, then mandatory
    • Maximum Cardinality
      • The maximum number
  • 93. Figure 3-12 Examples of relationships of different degrees a) Unary relationships
  • 94. Figure 3-12 Examples of relationships of different degrees (cont.) b) Binary relationships
  • 95. Figure 3-12 Examples of relationships of different degrees (cont.) c) Ternary relationship Note: a relationship can have attributes of its own
  • 96. Figure 3-17 Examples of cardinality constraints a) Mandatory cardinalities A patient must have recorded at least one history, and can have many A patient history is recorded for one and only one patient
  • 97. Figure 3-17 Examples of cardinality constraints (cont.) b) One optional, one mandatory An employee can be assigned to any number of projects, or may not be assigned to any at all A project must be assigned to at least one employee, and may be assigned to many
  • 98. Figure 3-17 Examples of cardinality constraints (cont.) c) Optional cardinalities A person is married to at most one other person, or may not be married at all
  • 99. Entities can be related to one another in more than one way Figure 3-21 Examples of multiple relationships a) Employees and departments
  • 100. Figure 3-21 Examples of multiple relationships (cont.) b) Professors and courses (fixed lower limit constraint) Here, min cardinality constraint is 2
  • 101. Figure 3-15a and 3-15b Multivalued attributes can be represented as relationships simple composite
  • 102. Strong vs. Weak Entities, and Identifying Relationships
    • Strong entities
      • exist independently of other types of entities
      • has its own unique identifier
      • identifier underlined with single-line
    • Weak entity
      • dependent on a strong entity (identifying owner)…cannot exist on its own
      • does not have a unique identifier (only a partial identifier)
      • Partial identifier underlined with double-line
      • Entity box has double line
    • Identifying relationship
      • links strong entities to weak entities
  • 103. Strong entity Weak entity Identifying relationship
  • 104. Associative Entities
    • An entity–has attributes
    • A relationship–links entities together
    • When should a relationship with attributes instead be an associative entity?
      • All relationships for the associative entity should be many
      • The associative entity could have meaning independent of the other entities
      • The associative entity preferably has a unique identifier, and should also have other attributes
      • The associative entity may participate in other relationships beyond the ones with the entities it associates
      • Ternary relationships should be converted to associative entities
  • 105. Figure 3-11a A binary relationship with an attribute Here, the date completed attribute pertains specifically to the employee’s completion of a course…it is an attribute of the relationship
  • 106. Figure 3-11b An associative entity (CERTIFICATE) Associative entity is like a relationship with an attribute, but it is also considered to be an entity in its own right. Note that the many-to-many cardinality between entities in Figure 3-11a has been replaced by two one-to-many relationships with the associative entity.
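    In relational terms the associative entity of Figure 3-11b becomes a table of its own; a sketch with assumed names (the exact attribute names in the figure may differ):
      CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name VARCHAR(50)
      );
      CREATE TABLE course (
        course_id    INTEGER PRIMARY KEY,
        course_title VARCHAR(50)
      );
      -- CERTIFICATE: associative entity linking EMPLOYEE and COURSE,
      -- carrying the attribute of the association (date completed)
      CREATE TABLE certificate (
        certificate_nbr INTEGER PRIMARY KEY,      -- its own identifier
        employee_id     INTEGER NOT NULL REFERENCES employee (employee_id),
        course_id       INTEGER NOT NULL REFERENCES course (course_id),
        date_completed  DATE
      );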
  • 107. Figure 3-13c An associative entity – bill of materials structure This could just be a relationship with attributes…it’s a judgment call
  • 108. Figure 3-18 Ternary relationship as an associative entity
  • 109. Microsoft Visio Example for E-R diagram Different modeling software tools may have different notation for the same constructs
  • 110. Chapter 4: The Enhanced ER Model and Business Rules Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 111. Objectives
    • Definition of terms
    • Use of supertype/subtype relationships
    • Use of generalization and specialization techniques
    • Specification of completeness and disjointness constraints
    • Develop supertype/subtype hierarchies for realistic business situations
    • Develop entity clusters
    • Explain universal data model
    • Name categories of business rules
    • Define operational constraints graphically and in English
  • 112. Supertypes and Subtypes
    • Subtype: A subgrouping of the entities in an entity type that has attributes distinct from those in other subgroupings
    • Supertype: A generic entity type that has a relationship with one or more subtypes
    • Attribute Inheritance:
      • Subtype entities inherit values of all attributes of the supertype
      • An instance of a subtype is also an instance of the supertype
  • 113. Figure 4-1 Basic notation for supertype/subtype notation a) EER notation
  • 114. Different modeling tools may have different notation for the same modeling constructs b) Microsoft Visio Notation Figure 4-1 Basic notation for supertype/subtype notation (cont.)
  • 115. Figure 4-2 Employee supertype with three subtypes All employee subtypes will have emp nbr, name, address, and date-hired Each employee subtype will also have its own attributes
  • 116. Relationships and Subtypes
    • Relationships at the supertype level indicate that all subtypes will participate in the relationship
    • The instances of a subtype may participate in a relationship unique to that subtype. In this situation, the relationship is shown at the subtype level
  • 117. Figure 4-3 Supertype/subtype relationships in a hospital Both outpatients and resident patients are cared for by a responsible physician Only resident patients are assigned to a bed
  • 118. Generalization and Specialization
    • Generalization: The process of defining a more general entity type from a set of more specialized entity types. BOTTOM-UP
    • Specialization: The process of defining one or more subtypes of the supertype and forming supertype/subtype relationships. TOP-DOWN
  • 119. Figure 4-4 Example of generalization a) Three entity types: CAR, TRUCK, and MOTORCYCLE All these types of vehicles have common attributes
  • 120. Figure 4-4 Example of generalization (cont.) So we put the shared attributes in a supertype Note: no subtype for motorcycle, since it has no unique attributes b) Generalization to VEHICLE supertype
  • 121. Figure 4-5 Example of specialization a) Entity type PART Only applies to manufactured parts Applies only to purchased parts
  • 122. b) Specialization to MANUFACTURED PART and PURCHASED PART Created 2 subtypes Figure 4-5 Example of specialization (cont.) Note: multivalued attribute was replaced by an associative entity relationship to another entity
  • 123. Constraints in Supertype/Subtype Relationships: Completeness Constraint
    • Completeness Constraints : Whether an instance of a supertype must also be a member of at least one subtype
      • Total Specialization Rule: Yes (double line)
      • Partial Specialization Rule: No (single line)
  • 124. Figure 4-6 Examples of completeness constraints a) Total specialization rule A patient must be either an outpatient or a resident patient
  • 125. b) Partial specialization rule Figure 4-6 Examples of completeness constraints (cont.) A vehicle could be a car, a truck, or neither
  • 126. Constraints in Supertype/Subtype Relationships: Disjointness Constraint
    • Disjointness Constraints : Whether an instance of a supertype may simultaneously be a member of two (or more) subtypes
      • Disjoint Rule: An instance of the supertype can be only ONE of the subtypes
      • Overlap Rule: An instance of the supertype could be more than one of the subtypes
  • 127. a) Disjoint rule Figure 4-7 Examples of disjointness constraints A patient can either be outpatient or resident, but not both
  • 128. b) Overlap rule Figure 4-7 Examples of disjointness constraints (cont.) A part may be both purchased and manufactured
  • 129. Constraints in Supertype/Subtype Relationships: Subtype Discriminators
    • Subtype Discriminator : An attribute of the supertype whose values determine the target subtype(s)
      • Disjoint – a simple attribute with alternative values to indicate the possible subtypes
      • Overlapping – a composite attribute whose subparts pertain to different subtypes. Each subpart contains a boolean value to indicate whether or not the instance belongs to the associated subtype
  • 130. Figure 4-8 Introducing a subtype discriminator ( disjoint rule) A simple attribute with different possible values indicating the subtype
  • 131. Figure 4-9 Subtype discriminator ( overlap rule) A composite attribute with sub-attributes indicating “yes” or “no” to determine whether it is of each subtype
  • 132. Figure 4-10 Example of supertype/subtype hierarchy
  • 133. Entity Clusters
    • EER diagrams are difficult to read when there are too many entities and relationships
    • Solution: Group entities and relationships into entity clusters
    • Entity cluster : Set of one or more entity types and associated relationships grouped into a single abstract entity type
  • 134. Figure 4-13a Possible entity clusters for Pine Valley Furniture in Microsoft Visio Related groups of entities could become clusters
  • 135. Figure 4-13b EER diagram of PVF entity clusters More readable, isn’t it?
  • 136. Figure 4-14 Manufacturing entity cluster Detail for a single cluster
  • 164. Packaged data models provide generic models that can be customized for a particular organization’s business rules
  • 165. Business rules
    • Statements that define or constrain some aspect of the business
    • Classification of business rules:
      • Derivation–rule derived from other knowledge, often in the form of a formula using attribute values
      • Structural assertion–rule expressing static structure. Includes attributes, relationships, and definitions
      • Action assertion–rule expressing constraints/control of organizational actions
  • 166. Figure 4-18 EER diagram to describe business rules
  • 167. Types of Action Assertions
    • Result
      • Condition–IF/THEN rule
      • Integrity constraint–must always be true
      • Authorization–privilege statement
    • Form
      • Enabler–leads to creation of new object
      • Timer–allows or disallows an action
      • Executive–executes one or more actions
    • Rigor
      • Controlling–something must or must not happen
      • Influencing–guideline for which a notification must occur
  • 168. Stating an Action Assertion
    • Anchor Object–an object on which actions are limited
    • Action–creation, deletion, update, or read
    • Corresponding Objects–an object influencing the ability to perform an action on another business rule
    Action assertions identify corresponding objects that constrain the ability to perform actions on anchor objects
  • 169. Figure 4-19 Data model segment for class scheduling
  • 170. Figure 4-20 Business Rule 1: For a faculty member to be assigned to teach a section of a course, the faculty member must be qualified to teach the course for which that section is scheduled Action assertion Anchor object Corresponding object Corresponding object In this case, the action assertion is a Restriction
  • 171. Figure 4-21 Business Rule 2: For a faculty member to be assigned to teach a section of a course, the faculty member must not be assigned to teach a total of more than three course sections Action assertion Anchor object Corresponding object In this case, the action assertion is an Upper Limit
  • 172. Chapter 5: Logical Database Design and the Relational Model Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 173. Objectives
    • Definition of terms
    • List five properties of relations
    • State two properties of candidate keys
    • Define first, second, and third normal form
    • Describe problems from merging relations
    • Transform E-R and EER diagrams to relations
    • Create tables with entity and relational integrity constraints
    • Use normalization to convert anomalous tables to well-structured relations
  • 174. Relation
    • Definition: A relation is a named, two-dimensional table of data
    • Table consists of rows (records) and columns (attribute or field)
    • Requirements for a table to qualify as a relation:
      • It must have a unique name
      • Every attribute value must be atomic (not multivalued, not composite)
      • Every row must be unique (can’t have two rows with exactly the same values for all their fields)
      • Attributes (columns) in tables must have unique names
      • The order of the columns must be irrelevant
      • The order of the rows must be irrelevant
    • NOTE: all relations are in 1st Normal Form
  • 175. Correspondence with E-R Model
    • Relations (tables) correspond with entity types and with many-to-many relationship types
    • Rows correspond with entity instances and with many-to-many relationship instances
    • Columns correspond with attributes
    • NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model)
  • 176. Key Fields
    • Keys are special fields that serve two main purposes:
      • Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique
      • Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)
    • Keys can be simple (a single field) or composite (more than one field)
    • Keys usually are used as indexes to speed up the response to user queries (More on this in Ch. 6)
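    For the last point, a one-line sketch (index name and column are assumptions): a secondary index on a frequently joined foreign key speeds up lookups on that key.
      CREATE INDEX idx_order_customer ON customer_order (customer_id);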
  • 177. Figure 5-3 Schema for four relations (Pine Valley Furniture Company) Primary Key Foreign Key (implements 1:N relationship between customer and order) Combined, these are a composite primary key (uniquely identifies the order line)…individually they are foreign keys (implement M:N relationship between order and product)
  • 178. Integrity Constraints
    • Domain Constraints
      • Allowable values for an attribute. See Table 5-1
    • Entity Integrity
      • No primary key attribute may be null. All primary key fields MUST have data
    • Action Assertions
      • Business rules. Recall from Ch. 4
  • 179. Domain definitions enforce domain integrity constraints
  • 180. Integrity Constraints
    • Referential Integrity–rule states that any foreign key value (on the relation of the many side) MUST match a primary key value in the relation of the one side. (Or the foreign key can be null)
      • For example: Delete Rules
        • Restrict–don’t allow delete of “parent” side if related rows exist in “dependent” side
        • Cascade–automatically delete “dependent” side rows that correspond with the “parent” side row to be deleted
        • Set-to-Null–set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities); see the SQL sketch below
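    The delete rules above map directly onto SQL foreign-key clauses; a sketch (RESTRICT is spelled NO ACTION in some DBMSs, and all table/column names are assumed):
      -- Parent tables assumed for the sketch
      CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
      CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY);

      CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        -- Restrict: a customer who still has orders cannot be deleted
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id) ON DELETE RESTRICT
      );

      CREATE TABLE order_line (
        order_id INTEGER,
        line_nbr INTEGER,
        PRIMARY KEY (order_id, line_nbr),
        -- Cascade: deleting an order automatically deletes its order lines
        FOREIGN KEY (order_id) REFERENCES customer_order (order_id) ON DELETE CASCADE
      );

      CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        supplier_id INTEGER,   -- nullable, so Set-to-Null is possible (not allowed for weak entities)
        FOREIGN KEY (supplier_id) REFERENCES supplier (supplier_id) ON DELETE SET NULL
      );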
  • 181. Figure 5-5 Referential integrity constraints (Pine Valley Furniture) Referential integrity constraints are drawn via arrows from dependent to parent table
  • 182. Figure 5-6 SQL table definitions Referential integrity constraints are implemented with foreign key to primary key references
  • 183. Transforming EER Diagrams into Relations
    • Mapping Regular Entities to Relations
      • Simple attributes: E-R attributes map directly onto the relation
      • Composite attributes: Use only their simple, component attributes
      • Multivalued Attribute–Becomes a separate relation with a foreign key taken from the superior entity
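    A sketch of the multivalued-attribute rule using the Skill example from Figure 3-8 (column names assumed):
      CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name VARCHAR(50)
      );
      -- The multivalued attribute Skill becomes its own relation; the primary key
      -- combines the foreign key from EMPLOYEE with the attribute value itself
      CREATE TABLE employee_skill (
        employee_id INTEGER REFERENCES employee (employee_id),
        skill       VARCHAR(30),
        PRIMARY KEY (employee_id, skill)
      );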
  • 184. (a) CUSTOMER entity type with simple attributes Figure 5-8 Mapping a regular entity (b) CUSTOMER relation
  • 185. (a) CUSTOMER entity type with composite attribute Figure 5-9 Mapping a composite attribute (b) CUSTOMER relation with address detail
  • 186. Figure 5-10 Mapping an entity with a multivalued attribute One–to–many relationship between original entity and new relation (a) Multivalued attribute becomes a separate relation with foreign key (b)
  • 187. Transforming EER Diagrams into Relations (cont.)
    • Mapping Weak Entities
      • Becomes a separate relation with a foreign key taken from the superior entity
      • Primary key composed of:
        • Partial identifier of weak entity
        • Primary key of identifying relation (strong entity)
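    A sketch for the DEPENDENT example of Figure 5-11 (attribute names simplified):
      CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY
      );
      -- Weak entity: primary key = partial identifier + owner's primary key,
      -- and the foreign key must not be null
      CREATE TABLE dependent (
        employee_id    INTEGER NOT NULL REFERENCES employee (employee_id),
        dependent_name VARCHAR(50) NOT NULL,   -- partial identifier
        date_of_birth  DATE,
        PRIMARY KEY (employee_id, dependent_name)
      );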
  • 188. Figure 5-11 Example of mapping a weak entity a) Weak entity DEPENDENT
  • 189. NOTE: the domain constraint for the foreign key should NOT allow null value if DEPENDENT is a weak entity Foreign key Figure 5-11 Example of mapping a weak entity (cont.) b) Relations resulting from weak entity Composite primary key
  • 190. Transforming EER Diagrams into Relations (cont.)
    • Mapping Binary Relationships
      • One-to-Many–Primary key on the one side becomes a foreign key on the many side
      • Many-to-Many–Create a new relation with the primary keys of the two entities as its primary key
      • One-to-One–Primary key on the mandatory side becomes a foreign key on the optional side
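    The 1:M case is the customer/order foreign key shown earlier; the other two cases as a sketch with assumed names (the 1:1 example follows the nurse/care-center idea of Figure 5-14):
      CREATE TABLE customer_order (order_id   INTEGER PRIMARY KEY);
      CREATE TABLE product        (product_id INTEGER PRIMARY KEY);
      CREATE TABLE nurse          (nurse_id   INTEGER PRIMARY KEY);

      -- M:N: a new relation whose primary key combines the two borrowed primary keys
      CREATE TABLE order_line (
        order_id         INTEGER REFERENCES customer_order (order_id),
        product_id       INTEGER REFERENCES product (product_id),
        ordered_quantity INTEGER,
        PRIMARY KEY (order_id, product_id)
      );

      -- 1:1: foreign key on the optional side; UNIQUE keeps the maximum cardinality at one
      CREATE TABLE care_center (
        center_id       INTEGER PRIMARY KEY,
        nurse_in_charge INTEGER UNIQUE REFERENCES nurse (nurse_id)
      );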
  • 191. Figure 5-12 Example of mapping a 1:M relationship a) Relationship between customers and orders (note the mandatory one) b) Mapping the relationship: again, no null value in the foreign key…this is because of the mandatory minimum cardinality
  • 192. Figure 5-13 Example of mapping an M:N relationship a) Completes relationship (M:N) The Completes relationship will need to become a separate relation
  • 193. New intersection relation Figure 5-13 Example of mapping an M:N relationship (cont.) b) Three resulting relations Foreign key Foreign key Composite primary key
  • 194. Figure 5-14 Example of mapping a binary 1:1 relationship a) In_charge relationship (1:1) Often in 1:1 relationships, one direction is optional.
  • 195. b) Resulting relations Figure 5-14 Example of mapping a binary 1:1 relationship (cont.) Foreign key goes in the relation on the optional side, matching the primary key on the mandatory side
  • 196. Transforming EER Diagrams into Relations (cont.)
    • Mapping Associative Entities
      • Identifier Not Assigned
        • Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship)
      • Identifier Assigned
        • It is natural and familiar to end-users
        • Default identifier may not be unique
  • 197. Figure 5-15 Example of mapping an associative entity a) An associative entity
  • 198. Figure 5-15 Example of mapping an associative entity (cont.) b) Three resulting relations Composite primary key formed from the two foreign keys
  • 199. Figure 5-16 Example of mapping an associative entity with an identifier a) SHIPMENT associative entity
  • 200. Figure 5-16 Example of mapping an associative entity with an identifier (cont.) b) Three resulting relations Primary key differs from foreign keys
  • 201. Transforming EER Diagrams into Relations (cont.)
    • Mapping Unary Relationships
      • One-to-Many–Recursive foreign key in the same relation
      • Many-to-Many–Two relations:
        • One for the entity type
        • One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
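    A sketch of both unary cases (names assumed; the M:N case follows the bill-of-materials idea of Figure 5-18):
      -- Unary 1:N: recursive foreign key pointing back to the same relation
      CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        manager_id  INTEGER REFERENCES employee (employee_id)
      );

      -- Unary M:N: an associative relation whose two key attributes
      -- are both drawn from the entity's primary key
      CREATE TABLE item (
        item_id INTEGER PRIMARY KEY
      );
      CREATE TABLE component (
        item_id           INTEGER REFERENCES item (item_id),
        component_item_id INTEGER REFERENCES item (item_id),
        quantity          INTEGER,
        PRIMARY KEY (item_id, component_item_id)
      );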
  • 202. Figure 5-17 Mapping a unary 1:N relationship (a) EMPLOYEE entity with unary relationship (b) EMPLOYEE relation with recursive foreign key
  • 203. Figure 5-18 Mapping a unary M:N relationship (a) Bill-of-materials relationships (M:N) (b) ITEM and COMPONENT relations
  • 204. Transforming EER Diagrams into Relations (cont.)
    • Mapping Ternary (and n-ary) Relationships
      • One relation for each entity and one for the associative entity
      • Associative entity has foreign keys to each entity in the relationship
  • 205. Figure 5-19 Mapping a ternary relationship a) PATIENT TREATMENT Ternary relationship with associative entity
  • 206. b) Mapping the ternary relationship PATIENT TREATMENT Remember that the primary key MUST be unique Figure 5-19 Mapping a ternary relationship (cont.) This is why treatment date and time are included in the composite primary key. But this makes a very cumbersome key…it would be better to create a surrogate key like Treatment#
  • 207. Transforming EER Diagrams into Relations (cont.)
    • Mapping Supertype/Subtype Relationships
      • One relation for supertype and for each subtype
      • Supertype attributes (including identifier and subtype discriminator) go into supertype relation
      • Subtype attributes go into each subtype; primary key of supertype relation also becomes primary key of subtype relation
      • 1:1 relationship established between supertype and each subtype, with supertype as primary table
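    A sketch following these rules for the EMPLOYEE hierarchy of Figure 4-2 (data types and discriminator values are assumptions):
      -- Supertype: shared attributes plus the subtype discriminator
      CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name VARCHAR(50),
        employee_type CHAR(1) CHECK (employee_type IN ('H', 'S', 'C'))  -- discriminator
      );
      -- Each subtype reuses the supertype's primary key, giving the 1:1 link
      CREATE TABLE hourly_employee (
        employee_id INTEGER PRIMARY KEY REFERENCES employee (employee_id),
        hourly_rate DECIMAL(7,2)
      );
      CREATE TABLE salaried_employee (
        employee_id   INTEGER PRIMARY KEY REFERENCES employee (employee_id),
        annual_salary DECIMAL(9,2)
      );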
  • 208. Figure 5-20 Supertype/subtype relationships
  • 209. Figure 5-21 Mapping Supertype/subtype relationships to relations These are implemented as one-to-one relationships
  • 210. Data Normalization
    • Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
    • The process of decomposing relations with anomalies to produce smaller, well-structured relations
  • 211. Well-Structured Relations
    • A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
    • Goal is to avoid anomalies
      • Insertion Anomaly –adding new rows forces user to create duplicate data
      • Deletion Anomaly –deleting rows may cause a loss of data that would be needed for other future rows
      • Modification Anomaly –changing data in a row forces changes to other rows because of duplication
    General rule of thumb: A table should not pertain to more than one entity type
  • 212. Example–Figure 5-2b Question–Is this a relation? Answer–Yes: Unique rows and no multivalued attributes Question–What’s the primary key? Answer–Composite: Emp_ID, Course_Title
  • 213. Anomalies in this Table
    • Insertion –can’t enter a new employee without having the employee take a class
    • Deletion –if we remove employee 140, we lose information about the existence of a Tax Acc class
    • Modification –giving a salary increase to employee 100 forces us to update multiple records
    • Why do these anomalies exist?
      • Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities
  • 214. Functional Dependencies and Keys
    • Functional Dependency: The value of one attribute (the determinant ) determines the value of another attribute
    • Candidate Key:
      • A unique identifier. One of the candidate keys will become the primary key
        • E.g. perhaps there is both credit card number and SS# in a table…in this case both are candidate keys
      • Each non-key field is functionally dependent on every candidate key
  • 215. Figure 5.22 Steps in normalization
  • 216. First Normal Form
    • No multivalued attributes
    • Every attribute value is atomic
    • Fig. 5-25 is not in 1st Normal Form (multivalued attributes) → it is not a relation
    • Fig. 5-26 is in 1st Normal Form
    • All relations are in 1st Normal Form
  • 217. Table with multivalued attributes, not in 1st normal form Note: this is NOT a relation
  • 218. Table with no multivalued attributes and unique rows, in 1st normal form Note: this is a relation, but not a well-structured one
  • 219. Anomalies in this Table
    • Insertion –if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication
    • Deletion –if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price
    • Update –changing the price of product ID 4 requires update in several records
    • Why do these anomalies exist?
      • Because there are multiple themes (entity types) in one relation. This results in duplication and an unnecessary dependency between the entities
  • 220. Second Normal Form
    • 1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key
      • Every non-key attribute must be defined by the entire key, not by only part of the key
      • No partial functional dependencies
  • 221. Figure 5-27 Functional dependency diagram for INVOICE: Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address (a partial dependency, therefore NOT in 2nd Normal Form); Customer_ID → Customer_Name, Customer_Address; Product_ID → Product_Description, Product_Finish, Unit_Price; Order_ID, Product_ID → Order_Quantity
  • 222. Partial dependencies are removed, but there are still transitive dependencies Getting it into Second Normal Form Figure 5-28 Removing partial dependencies
  • 223. Third Normal Form
    • 2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes)
    • Note: This is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third
    • Solution: Non-key determinant with transitive dependencies go into a new table; non-key determinant becomes primary key in the new table and stays as foreign key in the old table
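    Carrying the INVOICE example through to 3NF gives relations along these lines (a sketch; data types assumed):
      CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        customer_name    VARCHAR(50),
        customer_address VARCHAR(100)
      );
      -- 3NF: the transitive dependency Customer_ID -> Name, Address moved out of ORDER
      CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        order_date  DATE,
        customer_id INTEGER REFERENCES customer (customer_id)
      );
      CREATE TABLE product (
        product_id          INTEGER PRIMARY KEY,
        product_description VARCHAR(50),
        product_finish      VARCHAR(20),
        unit_price          DECIMAL(8,2)
      );
      -- 2NF: Order_Quantity depends on the full composite key, nothing else does
      CREATE TABLE order_line (
        order_id         INTEGER REFERENCES customer_order (order_id),
        product_id       INTEGER REFERENCES product (product_id),
        ordered_quantity INTEGER,
        PRIMARY KEY (order_id, product_id)
      );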
  • 224. Getting it into Third Normal Form: transitive dependencies are removed (Figure 5-29 Removing transitive dependencies)
  • 225. Merging Relations
    • View Integration–Combining entities from multiple ER models into common relations
    • Issues to watch out for when merging entities from different ER models:
      • Synonyms–two or more attributes with different names but same meaning
      • Homonyms–attributes with same name but different meanings
      • Transitive dependencies–even if relations are in 3NF prior to merging, they may not be after merging
      • Supertype/subtype relationships–may be hidden prior to merging
  • 226. Enterprise Keys
    • Primary keys that are unique in the whole database, not just within a single relation
    • Corresponds with the concept of an object ID in object-oriented systems
  • 227. Figure 5-31 Enterprise keys a) Relations with enterprise key b) Sample data with enterprise key
  • 228. Chapter 6: Physical Database Design and Performance Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 229. Objectives
    • Definition of terms
    • Describe the physical database design process
    • Choose storage formats for attributes
    • Select appropriate file organizations
    • Describe three types of file organization
    • Describe indexes and their appropriate use
    • Translate a database model into efficient structures
    • Know when and how to use denormalization
  • 230. Physical Database Design
    • Purpose–translate the logical description of data into the technical specifications for storing and retrieving data
    • Goal–create a design for storing data that will provide adequate performance and ensure database integrity , security , and recoverability
  • 231. Physical Design Process
    • Normalized relations
    • Volume estimates
    • Attribute definitions
    • Response time expectations
    • Data security needs
    • Backup/recovery needs
    • Integrity expectations
    • DBMS technology used
    Inputs
    • Attribute data types
    • Physical record descriptions (doesn’t always match logical design)
    • File organizations
    • Indexes and database architectures
    • Query optimization
    Leads to Decisions
  • 232. Figure 6-1 Composite usage map (Pine Valley Furniture Company)
  • 233. Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Data volumes
  • 234. Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Access Frequencies (per hour)
  • 235. Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 140 purchased parts accessed per hour  80 quotations accessed from these 140 purchased part accesses  70 suppliers accessed from these 80 quotation accesses
  • 236. Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 75 suppliers accessed per hour → 40 quotations accessed from these 75 supplier accesses → 40 purchased parts accessed from these 40 quotation accesses
  • 237. Designing Fields
    • Field: smallest unit of data in database
    • Field design
      • Choosing data type
      • Coding, compression, encryption
      • Controlling data integrity
  • 238. Choosing Data Types
    • CHAR–fixed-length character
    • VARCHAR2–variable-length character (memo)
    • LONG–large variable-length character field (up to 2 GB in Oracle)
    • NUMBER–positive/negative number
    • INTEGER–positive/negative whole number
    • DATE–actual date
    • BLOB–binary large object (good for graphics, sound clips, etc.)
  • 239. Figure 6-2 Example code look-up table (Pine Valley Furniture Company) Code saves space, but costs an additional lookup to obtain actual value
  • 240. Field Data Integrity
    • Default value–assumed value if no explicit value
    • Range control–allowable value limitations (constraints or validation rules)
    • Null value control–allowing or prohibiting empty fields
    • Referential integrity–range control (and null value allowances) for foreign-key to primary-key match-ups
    Sarbanes-Oxley Act (SOX) legislates importance of financial data integrity
  • 241. Handling Missing Data
    • Substitute an estimate of the missing value (e.g., using a formula)
    • Construct a report listing missing values
    • In programs, ignore missing data unless the value is significant (sensitivity testing)
    Triggers can be used to perform these operations
  • 242. Physical Records
    • Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit
    • Page: The amount of data read or written in one I/O operation
    • Blocking Factor: The number of physical records per page
  • 243. Denormalization
    • Transforming normalized relations into unnormalized physical record specifications
    • Benefits:
      • Can improve performance (speed) by reducing number of table lookups (i.e. reduce number of necessary join queries )
    • Costs (due to data duplication)
      • Wasted storage space
      • Data integrity/consistency threats
    • Common denormalization opportunities
      • One-to-one relationship (Fig. 6-3)
      • Many-to-many relationship with attributes (Fig. 6-4)
      • Reference data (1:N relationship where 1-side has data not used in any other relationship) (Fig. 6-5)
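As a hedged illustration of the first opportunity (a one-to-one relationship), the two hypothetical relations below could be collapsed into a single record type; all names are illustrative:

    -- Normalized: STUDENT_T(STUDENT_ID, CAMPUS_ADDRESS) and
    --             APPLICATION_T(APPLICATION_ID, APPLICATION_DATE, QUALIFICATIONS, STUDENT_ID)
    -- Denormalized: application attributes are folded into STUDENT_T
    CREATE TABLE STUDENT_T (
        STUDENT_ID        INTEGER PRIMARY KEY,
        CAMPUS_ADDRESS    VARCHAR(60),
        APPLICATION_DATE  DATE,            -- NULL until the student applies
        QUALIFICATIONS    VARCHAR(100));   -- NULL until the student applies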
  • 244. Figure 6-3 A possible denormalization situation: two entities with one-to-one relationship
  • 245. Figure 6-4 A possible denormalization situation: a many-to-many relationship with nonkey attributes Extra table access required Null description possible
  • 246. Figure 6-5 A possible denormalization situation: reference data Extra table access required Data duplication
  • 247. Partitioning
    • Horizontal Partitioning: Distributing the rows of a table into several separate files
      • Useful for situations where different users need access to different rows
      • Three types: Key Range Partitioning, Hash Partitioning, or Composite Partitioning
    • Vertical Partitioning: Distributing the columns of a table into several separate relations
      • Useful for situations where different users need access to different columns
      • The primary key must be repeated in each file
    • Combinations of Horizontal and Vertical
    Partitions often correspond with User Schemas (user views)
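A hedged sketch of key range (horizontal) partitioning, using Oracle-style syntax; the table, columns, and partition boundaries are illustrative only:

    CREATE TABLE ORDER_T (
        ORDER_ID     INTEGER PRIMARY KEY,
        ORDER_DATE   DATE,
        CUSTOMER_ID  INTEGER)
    PARTITION BY RANGE (ORDER_DATE) (
        PARTITION ORDERS_2006 VALUES LESS THAN (TO_DATE('01-JAN-2007','DD-MON-YYYY')),
        PARTITION ORDERS_2007 VALUES LESS THAN (TO_DATE('01-JAN-2008','DD-MON-YYYY')));
    -- Each partition can be stored, backed up, and optimized separately;
    -- queries that filter on ORDER_DATE touch only the relevant partition.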
  • 248. Partitioning (cont.)
    • Advantages of Partitioning:
      • Efficiency: Records used together are grouped together
      • Local optimization: Each partition can be optimized for performance
      • Security, recovery
      • Load balancing: Partitions stored on different disks, reduces contention
      • Take advantage of parallel processing capability
    • Disadvantages of Partitioning:
      • Inconsistent access speed: Slow retrievals across partitions
      • Complexity: Non-transparent partitioning
      • Extra space or update time: Duplicate data; access from multiple partitions
  • 249. Data Replication
    • Purposely storing the same data in multiple locations of the database
    • Improves performance by allowing multiple users to access the same data at the same time with minimum contention
    • Sacrifices data integrity due to data duplication
    • Best for data that is not updated often
  • 250. Designing Physical Files
    • Physical File:
      • A named portion of secondary memory allocated for the purpose of storing physical records
      • Tablespace–named set of disk storage elements in which physical files for database tables can be stored
      • Extent–contiguous section of disk space
    • Constructs to link two pieces of data:
      • Sequential storage
      • Pointers–field of data that can be used to locate related fields or records
  • 251. Figure 6-6 Physical file terminology in an Oracle environment
  • 252. File Organizations
    • Technique for physically arranging records of a file on secondary storage
    • Factors for selecting file organization:
      • Fast data retrieval and throughput
      • Efficient storage space utilization
      • Protection from failure and data loss
      • Minimizing need for reorganization
      • Accommodating growth
      • Security from unauthorized use
    • Types of file organizations
      • Sequential
      • Indexed
      • Hashed
  • 253. Figure 6-7a Sequential file organization Records of the file are stored in sequence by the primary key field values. If not sorted: average time to find a desired record = n/2. If sorted: every insert or delete requires a re-sort
  • 254. Indexed File Organizations
    • Index–a separate table that contains organization of records for quick retrieval
    • Primary keys are automatically indexed
    • Oracle has a CREATE INDEX operation, and MS ACCESS allows indexes to be created for most field types
    • Indexing approaches:
      • B-tree index, Fig. 6-7b
      • Bitmap index, Fig. 6-8
      • Hash Index, Fig. 6-7c
      • Join Index, Fig 6-9
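A hedged sketch of how two of these approaches might be requested in Oracle-style SQL; the index and column names are illustrative:

    CREATE INDEX CUST_NAME_IDX ON CUSTOMER_T (CUSTOMER_NAME);           -- B-tree index (the default)
    CREATE BITMAP INDEX PROD_FINISH_IDX ON PRODUCT_T (PRODUCT_FINISH);  -- bitmap index, suited to a column with few distinct values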
  • 255. Figure 6-7b B-tree index uses a tree search Average time to find desired record = depth of the tree Leaves of the tree are all at same level → consistent access time
  • 256. Figure 6-7c Hashed file or index organization Hash algorithm Usually uses division-remainder to determine record position. Records with same position are grouped in lists
  • 257. Figure 6-8 Bitmap index organization Bitmap saves on space requirements Rows - possible values of the attribute Columns - table rows Bit indicates whether the attribute of a row has that value
  • 258. Figure 6-9 Join Indexes–speeds up join operations
  • 260. Clustering Files
    • In some relational DBMSs, related records from different tables can be stored together in the same disk area
    • Useful for improving performance of join operations
    • Primary key records of the main table are stored adjacent to associated foreign key records of the dependent table
    • e.g. Oracle has a CREATE CLUSTER command
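A hedged Oracle-style sketch of clustering a customer table with its orders; the cluster and column names are illustrative:

    CREATE CLUSTER CUST_ORDER_CLU (CUSTOMER_ID INTEGER);
    CREATE INDEX CUST_ORDER_CLU_IDX ON CLUSTER CUST_ORDER_CLU;   -- cluster index required before rows are inserted
    CREATE TABLE CUSTOMER_T (
        CUSTOMER_ID    INTEGER PRIMARY KEY,
        CUSTOMER_NAME  VARCHAR2(50))
        CLUSTER CUST_ORDER_CLU (CUSTOMER_ID);
    CREATE TABLE ORDER_T (
        ORDER_ID     INTEGER PRIMARY KEY,
        CUSTOMER_ID  INTEGER)
        CLUSTER CUST_ORDER_CLU (CUSTOMER_ID);   -- rows with the same CUSTOMER_ID share disk blocks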
  • 261. Rules for Using Indexes
    • Use on larger tables
    • Index the primary key of each table
    • Index search fields (fields frequently in WHERE clause)
    • Fields in SQL ORDER BY and GROUP BY commands
    • When there are >100 values but not when there are <30 values
  • 262. Rules for Using Indexes (cont.)
    • Avoid use of indexes for fields with long values; perhaps compress values first
    • DBMS may have limit on number of indexes per table and number of bytes per indexed field(s)
    • Null values will not be referenced from an index
    • Use indexes heavily for non-volatile databases; limit the use of indexes for volatile databases
      • Why? Because modifications (e.g. inserts, deletes) require updates to occur in index files
  • 263. RAID
    • Redundant Array of Inexpensive Disks
    • A set of disk drives that appear to the user to be a single disk drive
    • Allows parallel access to data (improves access speed)
    • Pages are arranged in stripes
  • 264.
    • Figure 6-10
      • RAID with four disks and striping
    Here, pages 1-4 can be read/written simultaneously
  • 265. Raid Types (Figure 6-10)
    • Raid 0
      • Maximized parallelism
      • No redundancy
      • No error correction
      • no fault-tolerance
    • Raid 1
      • Redundant data – fault tolerant
      • Most common form
    • Raid 2
      • No redundancy
      • One record spans across data disks
      • Error correction in multiple disks– reconstruct damaged data
    • Raid 3
      • Error correction in one disk
      • Record spans multiple data disks (more than RAID2)
      • Not good for multi-user environments,
    • Raid 4
      • Error correction in one disk
      • Multiple records per stripe
      • Parallelism, but slow updates due to error correction contention
    • Raid 5
      • Rotating parity array
      • Error correction takes place in same disks as data storage
      • Parallelism, better performance than Raid4
  • 266. Database Architectures (Figure 6-11) Legacy Systems Current Technology Data Warehouses
  • 267. Chapter 7: Introduction to SQL Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 268. Objectives
    • Definition of terms
    • Interpret history and role of SQL
    • Define a database using SQL data definition language
    • Write single table queries using SQL
    • Establish referential integrity using SQL
    • Discuss SQL:1999 and SQL:2003 standards
  • 269. SQL Overview
    • Structured Query Language
    • The standard for relational database management systems (RDBMS)
    • RDBMS: A database management system that manages data as a collection of tables in which all relationships are represented by common values in related tables
  • 270. History of SQL
    • 1970–E. Codd develops relational database concept
    • 1974-1979–System R with Sequel (later SQL) created at IBM Research Lab
    • 1979–Oracle markets first relational DB with SQL
    • 1986–ANSI SQL standard released
    • 1989, 1992, 1999, 2003–Major ANSI standard updates
    • Current–SQL is supported by most major database vendors
  • 271. Purpose of SQL Standard
    • Specify syntax/semantics for data definition and manipulation
    • Define data structures
    • Enable portability
    • Specify minimal (level 1) and complete (level 2) standards
    • Allow for later growth/enhancement to standard
  • 272. Benefits of a Standardized Relational Language
    • Reduced training costs
    • Productivity
    • Application portability
    • Application longevity
    • Reduced dependence on a single vendor
    • Cross-system communication
  • 273. SQL Environment
    • Catalog
      • A set of schemas that constitute the description of a database
    • Schema
      • The structure that contains descriptions of objects created by a user (base tables, views, constraints)
    • Data Definition Language (DDL)
      • Commands that define a database, including creating, altering, and dropping tables and establishing constraints
    • Data Manipulation Language (DML)
      • Commands that maintain and query a database
    • Data Control Language (DCL)
      • Commands that control a database, including administering privileges and committing data
  • 274. Figure 7-1 A simplified schematic of a typical SQL environment, as described by the SQL-2003 standard
  • 275. Some SQL Data types
  • 276. Figure 7-4 DDL, DML, DCL, and the database development process
  • 277. SQL Database Definition
    • Data Definition Language (DDL)
    • Major CREATE statements:
      • CREATE SCHEMA–defines a portion of the database owned by a particular user
      • CREATE TABLE–defines a table and its columns
      • CREATE VIEW–defines a logical table from one or more tables or views
    • Other CREATE statements: CHARACTER SET, COLLATION, TRANSLATION, ASSERTION, DOMAIN
  • 278. Table Creation Figure 7-5 General syntax for CREATE TABLE
    • Steps in table creation:
    • Identify data types for attributes
    • Identify columns that can and cannot be null
    • Identify columns that must be unique (candidate keys)
    • Identify primary key – foreign key mates
    • Determine default values
    • Identify constraints on columns (domain specifications)
    • Create the table and associated indexes
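A hedged sketch that walks through these steps; a simplified pair of tables is used here, not the full Pine Valley definitions:

    CREATE TABLE CUSTOMER_T (
        CUSTOMER_ID     INTEGER      NOT NULL,             -- steps 1-2: data type, NOT NULL
        CUSTOMER_NAME   VARCHAR(50)  NOT NULL,
        CUSTOMER_EMAIL  VARCHAR(50)  UNIQUE,               -- step 3: candidate key
        CUSTOMER_STATE  CHAR(2)      DEFAULT 'FL'          -- step 5: default value
            CHECK (CUSTOMER_STATE IN ('FL','TX','CA','HI')),   -- step 6: domain constraint
        CONSTRAINT CUSTOMER_PK PRIMARY KEY (CUSTOMER_ID)); -- step 4: primary key
    CREATE TABLE ORDER_T (
        ORDER_ID     INTEGER NOT NULL,
        CUSTOMER_ID  INTEGER NOT NULL REFERENCES CUSTOMER_T (CUSTOMER_ID),  -- step 4: PK-FK mate
        CONSTRAINT ORDER_PK PRIMARY KEY (ORDER_ID));       -- step 7: any additional indexes are created separately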
  • 279. The following slides create tables for this enterprise data model
  • 280. Figure 7-6 SQL database definition commands for Pine Valley Furniture Overall table definitions
  • 281. Defining attributes and their data types
  • 282. Non-nullable specification Identifying primary key Primary keys can never have NULL values
  • 283. Non-nullable specifications Primary key Some primary keys are composite– composed of multiple attributes
  • 284. Default value Domain constraint Controlling the values in attributes
  • 285. Primary key of parent table Identifying foreign keys and establishing relationships Foreign key of dependent table
  • 286. Data Integrity Controls
    • Referential integrity–constraint that ensures that foreign key values of a table must match primary key values of a related table in 1:M relationships
    • Restricting:
      • Deletes of primary records
      • Updates of primary records
      • Inserts of dependent records
  • 287. Relational integrity is enforced via the primary-key to foreign-key match Figure 7-7 Ensuring data integrity through updates
  • 288. Changing and Removing Tables
    • ALTER TABLE statement allows you to change column specifications:
      • ALTER TABLE CUSTOMER_T ADD (TYPE VARCHAR(2))
    • DROP TABLE statement allows you to remove tables from your schema:
      • DROP TABLE CUSTOMER_T
  • 289. Schema Definition
    • Control processing/storage efficiency:
      • Choice of indexes
      • File organizations for base tables
      • File organizations for indexes
      • Data clustering
      • Statistics maintenance
    • Creating indexes
      • Speed up random/sequential access to base table data
      • Example
        • CREATE INDEX NAME_IDX ON CUSTOMER_T(CUSTOMER_NAME)
        • This makes an index for the CUSTOMER_NAME field of the CUSTOMER_T table
  • 290. Insert Statement
    • Adds data to a table
    • Inserting into a table
      • INSERT INTO CUSTOMER_T VALUES (001, ‘Contemporary Casuals’, ‘1355 S. Himes Blvd.’, ‘Gainesville’, ‘FL’, 32601);
    • Inserting a record that has some null attributes requires identifying the fields that actually get data
      • INSERT INTO PRODUCT_T (PRODUCT_ID, PRODUCT_DESCRIPTION,PRODUCT_FINISH, STANDARD_PRICE, PRODUCT_ON_HAND) VALUES (1, ‘End Table’, ‘Cherry’, 175, 8);
    • Inserting from another table
      • INSERT INTO CA_CUSTOMER_T SELECT * FROM CUSTOMER_T WHERE STATE = ‘CA’;
  • 291. Creating Tables with Identity Columns
    • Inserting into a table does not require explicit customer ID entry or field list
      • INSERT INTO CUSTOMER_T VALUES ( ‘Contemporary Casuals’, ‘1355 S. Himes Blvd.’, ‘Gainesville’, ‘FL’, 32601);
    New with SQL:2003
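A hedged sketch of the SQL:2003 identity-column syntax that makes this possible; exact options vary by DBMS:

    CREATE TABLE CUSTOMER_T (
        CUSTOMER_ID       INTEGER GENERATED ALWAYS AS IDENTITY
                              (START WITH 1 INCREMENT BY 1),   -- the DBMS supplies the key value
        CUSTOMER_NAME     VARCHAR(25) NOT NULL,
        CUSTOMER_ADDRESS  VARCHAR(30),
        CITY              VARCHAR(20),
        STATE             CHAR(2),
        POSTAL_CODE       VARCHAR(10),
        CONSTRAINT CUSTOMER_PK PRIMARY KEY (CUSTOMER_ID));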
  • 292. Delete Statement
    • Removes rows from a table
    • Delete certain rows
      • DELETE FROM CUSTOMER_T WHERE STATE = ‘HI’;
    • Delete all rows
      • DELETE FROM CUSTOMER_T;
  • 293. Update Statement
    • Modifies data in existing rows
    • UPDATE PRODUCT_T SET UNIT_PRICE = 775 WHERE PRODUCT_ID = 7;
  • 294. Merge Statement Makes it easier to update a table…allows combination of Insert and Update in one statement Useful for updating master tables with new data
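A hedged sketch of the MERGE statement; the staging table NEW_PRODUCT_T is a hypothetical source of new and changed product rows:

    MERGE INTO PRODUCT_T PROD
    USING NEW_PRODUCT_T NEWP
        ON (PROD.PRODUCT_ID = NEWP.PRODUCT_ID)
    WHEN MATCHED THEN                          -- existing product: update it
        UPDATE SET PROD.UNIT_PRICE = NEWP.UNIT_PRICE
    WHEN NOT MATCHED THEN                      -- new product: insert it
        INSERT (PRODUCT_ID, PRODUCT_DESCRIPTION, UNIT_PRICE)
        VALUES (NEWP.PRODUCT_ID, NEWP.PRODUCT_DESCRIPTION, NEWP.UNIT_PRICE);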
  • 295. SELECT Statement
    • Used for queries on single or multiple tables
    • Clauses of the SELECT statement:
      • SELECT
        • List the columns (and expressions) that should be returned from the query
      • FROM
        • Indicate the table(s) or view(s) from which data will be obtained
      • WHERE
        • Indicate the conditions under which a row will be included in the result
      • GROUP BY
        • Indicate categorization of results
      • HAVING
        • Indicate the conditions under which a category (group) will be included
      • ORDER BY
        • Sorts the result according to specified criteria
  • 296.
    • Figure 7-10
      • SQL statement processing order (adapted from van der Lans, p.100)
  • 297. SELECT Example
    • Find products with standard price less than $275
      • SELECT PRODUCT_NAME, STANDARD_PRICE
      • FROM PRODUCT_V
      • WHERE STANDARD_PRICE < 275;
    Table 7-3: Comparison Operators in SQL
  • 298. SELECT Example Using Alias
    • Alias is an alternative column or table name
        • SELECT CUST.CUSTOMER_NAME AS NAME, CUST.CUSTOMER_ADDRESS
        • FROM CUSTOMER_V CUST
        • WHERE NAME = ‘Home Furnishings’;
  • 299. SELECT Example Using a Function
    • Using the COUNT aggregate function to find totals
      • SELECT COUNT(*) FROM ORDER_LINE_V
      • WHERE ORDER_ID = 1004;
      • Note: with aggregate functions you can’t have single-valued columns included in the SELECT clause
  • 300. SELECT Example–Boolean Operators
    • AND , OR , and NOT Operators for customizing conditions in WHERE clause
      • SELECT PRODUCT_DESCRIPTION, PRODUCT_FINISH, STANDARD_PRICE
      • FROM PRODUCT_V
      • WHERE (PRODUCT_DESCRIPTION LIKE ‘%Desk’
      • OR PRODUCT_DESCRIPTION LIKE ‘%Table’)
      • AND UNIT_PRICE > 300;
    Note: the LIKE operator allows you to compare strings using wildcards. For example, the % wildcard in ‘%Desk’ indicates that all strings that have any number of characters preceding the word “Desk” will be allowed
  • 301. Venn Diagram from Previous Query
  • 302. SELECT Example – Sorting Results with the ORDER BY Clause
    • Sort the results first by STATE, and within a state by CUSTOMER_NAME
      • SELECT CUSTOMER_NAME, CITY, STATE
      • FROM CUSTOMER_V
      • WHERE STATE IN (‘FL’, ‘TX’, ‘CA’, ‘HI’)
      • ORDER BY STATE, CUSTOMER_NAME;
    Note: the IN operator in this example allows you to include rows whose STATE value is either FL, TX, CA, or HI. It is more efficient than separate OR conditions
  • 303. SELECT Example– Categorizing Results Using the GROUP BY Clause
    • For use with aggregate functions
      • Scalar aggregate : single value returned from SQL query with aggregate function
      • Vector aggregate : multiple values returned from SQL query with aggregate function (via GROUP BY)
      • SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)
      • FROM CUSTOMER_V
      • GROUP BY CUSTOMER_STATE;
      • Note: you can use single-value fields with aggregate functions if they are included in the GROUP BY clause
  • 304. SELECT Example– Qualifying Results by Categories Using the HAVING Clause
    • For use with GROUP BY
      • SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)
      • FROM CUSTOMER_V
      • GROUP BY CUSTOMER_STATE
      • HAVING COUNT(CUSTOMER_STATE) > 1;
      • Like a WHERE clause, but it operates on groups (categories), not on individual rows. Here, only those groups with total numbers greater than 1 will be included in final result
  • 305. Using and Defining Views
    • Views provide users controlled access to tables
    • Base Table–table containing the raw data
    • Dynamic View
      • A “virtual table” created dynamically upon request by a user
      • No data actually stored; instead data from base table made available to user
      • Based on SQL SELECT statement on base tables or other views
    • Materialized View
      • Copy or replication of data
      • Data actually stored
      • Must be refreshed periodically to match the corresponding base tables
  • 306. Sample CREATE VIEW
      • CREATE VIEW EXPENSIVE_STUFF_V AS
      • SELECT PRODUCT_ID, PRODUCT_NAME, UNIT_PRICE
      • FROM PRODUCT_T
      • WHERE UNIT_PRICE >300
      • WITH CHECK OPTION;
    • View has a name
    • View is based on a SELECT statement
    • The CHECK OPTION clause works only for updateable views and prevents updates that would create rows not included in the view
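By contrast, a materialized view physically stores the result and must be refreshed; a hedged Oracle-style sketch (the refresh schedule is illustrative):

    CREATE MATERIALIZED VIEW EXPENSIVE_STUFF_MV
        REFRESH COMPLETE START WITH SYSDATE NEXT SYSDATE + 1   -- rebuilt once a day
        AS SELECT PRODUCT_ID, PRODUCT_NAME, UNIT_PRICE
           FROM PRODUCT_T
           WHERE UNIT_PRICE > 300;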
  • 307. Advantages of Views
    • Simplify query commands
    • Assist with data security (but don't rely on views alone for security; stronger security measures are also needed)
    • Enhance programming productivity
    • Contain most current base table data
    • Use little storage space
    • Provide customized view for user
    • Establish physical data independence
  • 308. Disadvantages of Views
    • Use processing time each time view is referenced
    • May or may not be directly updateable
  • 309. Chapter 8: Advanced SQL Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 310. Objectives
    • Definition of terms
    • Write multiple table SQL queries
    • Define and use three types of joins
    • Write correlated and noncorrelated subqueries
    • Establish referential integrity in SQL
    • Understand triggers and stored procedures
    • Discuss SQL:1999 standard and its extension of SQL-92
  • 311. Processing Multiple Tables–Joins
    • Join – a relational operation that causes two or more tables with a common domain to be combined into a single table or view
    • Equi-join – a join in which the joining condition is based on equality between values in the common columns; common columns appear redundantly in the result table
    • Natural join – an equi-join in which one of the duplicate columns is eliminated in the result table
    • Outer join – a join in which rows that do not have matching values in common columns are nonetheless included in the result table (as opposed to inner join, in which rows must have matching values in order to appear in the result table)
    • Union join – includes all columns from each table in the join, and an instance for each row of each table
    The common columns in joined tables are usually the primary key of the dominant table and the foreign key of the dependent table in 1:M relationships
  • 312. The following slides create tables for this enterprise data model
  • 313. These tables are used in queries that follow Figure 8-1 Pine Valley Furniture Company Customer and Order tables with pointers from customers to their orders
  • 314.
    • For each customer who placed an order, what is the customer’s name and order number?
      • SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
      • FROM CUSTOMER_T NATURAL JOIN ORDER_T ON
      • CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID;
    Natural Join Example Note: from Fig. 1, you see that only 10 customers have links with orders → only 10 rows will be returned from this INNER join. Join involves multiple tables in FROM clause ON clause performs the equality check for common columns of the two tables
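One syntactic caution, offered as a hedged aside: in standard SQL a NATURAL JOIN is written without an ON clause (the common column is matched by name), so many products expect one of these equivalent forms instead:

    -- Natural join: no ON clause; CUSTOMER_ID is matched automatically
    SELECT CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
    FROM CUSTOMER_T NATURAL JOIN ORDER_T;
    -- Inner join with an explicit join condition
    SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
    FROM CUSTOMER_T INNER JOIN ORDER_T
        ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID;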
  • 315.
    • List the customer name, ID number, and order number for all customers. Include customer information even for customers that do not have an order
      • SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
      • FROM CUSTOMER_T LEFT OUTER JOIN ORDER_T
      • ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID;
    Outer Join Example (Microsoft Syntax) Unlike INNER join, this will include customer rows with no matching order rows LEFT OUTER JOIN syntax with ON causes customer data to appear even if there is no corresponding order data
  • 316. Results Unlike INNER join, this will include customer rows with no matching order rows
  • 317.
    • Assemble all information necessary to create an invoice for order number 1006
      • SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_ADDRESS, CITY, STATE, POSTAL_CODE, ORDER_T.ORDER_ID, ORDER_DATE, QUANTITY, PRODUCT_DESCRIPTION, STANDARD_PRICE, (QUANTITY * UNIT_PRICE)
      • FROM CUSTOMER_T, ORDER_T, ORDER_LINE_T, PRODUCT_T
      • WHERE CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID AND ORDER_T.ORDER_ID = ORDER_LINE_T.ORDER_ID
        • AND ORDER_LINE_T.PRODUCT_ID = PRODUCT_T.PRODUCT_ID
        • AND ORDER_T.ORDER_ID = 1006;
    Multiple Table Join Example Four tables involved in this join Each pair of tables requires an equality-check condition in the WHERE clause, matching primary keys against foreign keys
  • 318. Figure 8-2 Results from a four-table join From CUSTOMER_T table From ORDER_T table From PRODUCT_T table
  • 319. Processing Multiple Tables Using Subqueries
    • Subquery–placing an inner query (SELECT statement) inside an outer query
    • Options:
      • In a condition of the WHERE clause
      • As a “table” of the FROM clause
      • Within the HAVING clause
    • Subqueries can be:
      • Noncorrelated–executed once for the entire outer query
      • Correlated–executed once for each row returned by the outer query
  • 320.
    • Show all customers who have placed an order
      • SELECT CUSTOMER_NAME FROM CUSTOMER_T
      • WHERE CUSTOMER_ID IN
      • (SELECT DISTINCT CUSTOMER_ID FROM ORDER_T);
    Subquery Example Subquery is embedded in parentheses. In this case it returns a list that will be used in the WHERE clause of the outer query The IN operator will test to see if the CUSTOMER_ID value of a row is included in the list returned from the subquery
  • 321. Correlated vs. Noncorrelated Subqueries
    • Noncorrelated subqueries:
      • Do not depend on data from the outer query
      • Execute once for the entire outer query
    • Correlated subqueries:
      • Make use of data from the outer query
      • Execute once for each row of the outer query
      • Can use the EXISTS operator
  • 322. Figure 8-3a Processing a noncorrelated subquery No reference to data in outer query, so subquery executes once only These are the only customers that have IDs in the ORDER_T table
    • The subquery executes and returns the customer IDs from the ORDER_T table
    • The outer query then executes using the results of the subquery
  • 323.
    • Show all orders that include furniture finished in natural ash
      • SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T
      • WHERE EXISTS
      • (SELECT * FROM PRODUCT_T
      • WHERE PRODUCT_ID = ORDER_LINE_T.PRODUCT_ID
      • AND PRODUCT_FINISH = ‘Natural ash’);
    Correlated Subquery Example The subquery is testing for a value that comes from the outer query The EXISTS operator will return a TRUE value if the subquery resulted in a non-empty set, otherwise it returns a FALSE
  • 324. Figure 8-3b Processing a correlated subquery Subquery refers to outer-query data, so executes once for each row of outer query Note: only the orders that involve products with Natural Ash will be included in the final results
  • 325.
    • Show all products whose standard price is higher than the average price
      • SELECT PRODUCT_DESCRIPTION, STANDARD_PRICE, AVGPRICE
      • FROM
        • (SELECT AVG(STANDARD_PRICE) AVGPRICE FROM PRODUCT_T),
        • PRODUCT_T
        • WHERE STANDARD_PRICE > AVGPRICE;
    Another Subquery Example The WHERE clause normally cannot include aggregate functions, but because the aggregate is performed in the subquery its result can be used in the outer query’s WHERE clause One column of the subquery is an aggregate function that has an alias name. That alias can then be referred to in the outer query Subquery forms the derived table used in the FROM clause of the outer query
  • 326. Union Queries
    • Combine the output (union of multiple queries) together into a single result table
    First query Second query Combine
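A hedged sketch of a union query; SUPPLIER_T and its columns are illustrative, and each SELECT must return the same number of compatible columns:

    SELECT CUSTOMER_NAME AS NAME, 'Customer' AS PARTY_TYPE FROM CUSTOMER_T
    UNION
    SELECT SUPPLIER_NAME AS NAME, 'Supplier' AS PARTY_TYPE FROM SUPPLIER_T
    ORDER BY NAME;   -- ORDER BY applies to the combined result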
  • 327. Conditional Expressions Using Case Syntax
    • This is available with newer versions of SQL, previously not part of the standard
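A hedged sketch of a CASE expression that buckets products by price; the category labels and breakpoints are illustrative:

    SELECT PRODUCT_DESCRIPTION,
           CASE
               WHEN STANDARD_PRICE < 300 THEN 'Economy'
               WHEN STANDARD_PRICE < 700 THEN 'Mid-range'
               ELSE 'Premium'
           END AS PRICE_CATEGORY
    FROM PRODUCT_T;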
  • 328. Ensuring Transaction Integrity
    • Transaction = A discrete unit of work that must be completely processed or not processed at all
      • May involve multiple updates
      • If any update fails, then all other updates must be cancelled
    • SQL commands for transactions
      • BEGIN TRANSACTION/END TRANSACTION
        • Marks boundaries of a transaction
      • COMMIT
        • Makes all updates permanent
      • ROLLBACK
        • Cancels updates since the last COMMIT
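A hedged sketch of the pattern (the exact transaction keywords and the inserted values vary by DBMS and are illustrative):

    BEGIN TRANSACTION;
    INSERT INTO ORDER_T VALUES (1007, '21-OCT-2006', 1);   -- order header
    INSERT INTO ORDER_LINE_T VALUES (1007, 5, 2);          -- order line
    COMMIT;      -- both inserts succeeded: make them permanent
    -- ROLLBACK; -- if either insert had failed, undo all work since the transaction began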
  • 329. Figure 8-5 An SQL Transaction sequence (in pseudocode)
  • 330. Data Dictionary Facilities
    • System tables that store metadata
    • Users usually can view some of these tables
    • Users are restricted from updating them
    • Some examples in Oracle 10g
      • DBA_TABLES–descriptions of tables
      • DBA_CONSTRAINTS–description of constraints
      • DBA_USERS–information about the users of the system
    • Examples in Microsoft SQL Server 2000
      • SYSCOLUMNS–table and column definitions
      • SYSDEPENDS–object dependencies based on foreign keys
      • SYSPERMISSIONS–access permissions granted to users
  • 331. SQL:1999 and SQL:2003 Enhancements/Extensions
    • User-defined data types (UDT)
      • Subclasses of standard types or an object type
    • Analytical functions (for OLAP)
      • CEILING, FLOOR, SQRT, RANK, DENSE_RANK
      • WINDOW–improved numerical analysis capabilities
    • New Data Types
      • BIGINT, MULTISET (collection), XML
    • CREATE TABLE LIKE–create a new table similar to an existing one
    • MERGE
  • 332.
    • Persistent Stored Modules (SQL/PSM)
      • Capability to create and drop code modules
      • New statements:
        • CASE, IF, LOOP, FOR, WHILE, etc.
        • Makes SQL into a procedural language
    • Oracle has a proprietary version called PL/SQL, and Microsoft SQL Server has Transact-SQL
    SQL:1999 and SQL:2003 Enhancements/Extensions (cont.)
  • 333. Routines and Triggers
    • Routines
      • Program modules that execute on demand
      • Functions –routines that return values and take input parameters
      • Procedures –routines that do not return values and can take input or output parameters
    • Triggers
      • Routines that execute in response to a database event (INSERT, UPDATE, or DELETE)
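Two hedged sketches in SQL:2003/SQL-PSM style; exact syntax (especially the REFERENCING clause) varies by DBMS, and the names are illustrative:

    -- A trigger fired by an event: default a missing price before the row is inserted
    CREATE TRIGGER PRODUCT_PRICE_TRG
        BEFORE INSERT ON PRODUCT_T
        REFERENCING NEW ROW AS NEWROW
        FOR EACH ROW
        SET NEWROW.STANDARD_PRICE = COALESCE(NEWROW.STANDARD_PRICE, 0);
    -- A function called on demand: returns a value computed from its input
    CREATE FUNCTION PRICE_WITH_TAX (PRICE DECIMAL(9,2))
        RETURNS DECIMAL(9,2)
        RETURN PRICE * 1.07;   -- the 7% rate is an assumption for illustration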
  • 334. Figure 8-6 Triggers contrasted with stored procedures Procedures are called explicitly Triggers are event-driven Source : adapted from Mullins, 1995.
  • 335. Figure 8-7 Simplified trigger syntax, SQL:2003 Figure 8-8 Create routine syntax, SQL:2003
  • 336. Embedded and Dynamic SQL
    • Embedded SQL
      • Including hard-coded SQL statements in a program written in another language such as C or Java
    • Dynamic SQL
      • Ability for an application program to generate SQL code on the fly, as the application is running
  • 337. Chapter 9: The Client/Server Database Environment Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 338. Objectives
    • Definition of terms
    • List advantages of client/server architecture
    • Explain three application components: presentation, processing, and storage
    • Suggest partitioning possibilities
    • Distinguish between file server, database server, 3-tier, and n-tier approaches
    • Describe and discuss middleware
    • Explain database linking via ODBC and JDBC
  • 339. Client/Server Systems
    • Networked computing model
    • Processes distributed between clients and servers
    • Client–Workstation (usually a PC) that requests and uses a service
    • Server–Computer (PC/mini/mainframe) that provides a service
    • For DBMS, server is a database server
  • 340. Application Logic in C/S Systems (in the figure: presentation logic = GUI interface; processing logic = procedures, functions, programs; storage logic = DBMS activities)
    • Processing Logic
      • I/O processing
      • Business rules
      • Data management
    • Storage Logic
      • Data storage/retrieval
    • Presentation Logic
      • Input–keyboard/mouse
      • Output–monitor/printer
  • 341. Client/Server Architectures
    • File Server Architecture
    • Database Server Architecture
    • Three-tier Architecture
    Client does extensive processing Client does little processing
  • 342. File Server Architecture
    • All processing is done at the PC that requested the data
    • Entire files are transferred from the server to the client for processing
    • Problems:
      • Huge amount of data transfer on the network
      • Each client must contain full DBMS
        • Heavy resource demand on clients
        • Client DBMSs must recognize shared locks, integrity checks, etc.
    FAT CLIENT
  • 343. Figure 9-2 File Server Architecture FAT CLIENT
  • 344. Two-Tier Database Server Architectures
    • Client is responsible for
      • I/O processing logic
      • Some business rules logic
    • Server performs all data storage and access processing
      • → DBMS is only on server
  • 345. Advantages of Two-Tier Approach
    • Clients do not have to be as powerful
    • Greatly reduces data traffic on the network
    • Improved data integrity since it is all processed centrally
    • Stored procedures → DBMS code that performs some business rules, executed on the server
  • 346. Advantages of Stored Procedures
    • Compiled SQL statements
    • Reduced network traffic
    • Improved security
    • Improved data integrity
    • Thinner clients
  • 347. Figure 9-3 Two-tier database server architecture Thinner clients DBMS only on server
  • 348. Three-Tier Architectures
    • Thin Client
      • PC just for user interface and a little application processing. Limited or no data storage (sometimes no hard drive)
      • Client–GUI interface (I/O processing), browser
      • Application server–business rules, Web server
      • Database server–data storage, DBMS
  • 349. Figure 9-4 Three-tier architecture Thinnest clients Business rules on separate server DBMS only on DB server
  • 350. Advantages of Three-Tier Architectures
    • Scalability
    • Technological flexibility
    • Long-term cost reduction
    • Better match of systems to business needs
    • Improved customer service
    • Competitive advantage
    • Reduced risk
  • 351. Application Partitioning
    • Placing portions of the application code in different locations (client vs. server) AFTER it is written
    • Advantages
      • Improved performance
      • Improved interoperability
      • Balanced workloads
  • 352. Common Logic Distributions Figure 9-5a Two-tier client-server environment Figure 9-5b n -tier client-server environment Processing logic could be at client, server, or both Processing logic will be at application server or Web server
  • 353. Role of the Mainframe
    • Mission-critical legacy systems have tended to remain on mainframes
    • Distributed client/server systems tend to be used for smaller, workgroup systems
    • Difficulties in moving mission critical systems from mainframe to distributed
      • Determining which code belongs on server vs. client
      • Identifying potential conflicts with code from other applications
      • Ensuring sufficient resources exist for anticipated load
    • Rule of thumb
      • Mainframe for centralized data that does not need to be moved
      • Client for data requiring frequent user access, complex graphics, and user interface
  • 354. Middleware
    • Software that allows an application to interoperate with other software
    • No need for programmer/user to understand internal processing
    • Accomplished via Application Program Interface (API)
    The “glue” that holds client/server applications together
  • 355. Types of Middleware
    • Remote Procedure Calls (RPC)
      • client makes calls to procedures running on remote computers
      • synchronous and asynchronous
    • Message-Oriented Middleware (MOM)
      • asynchronous calls between the client and server via message queues
    • Publish/Subscribe
      • push technology → server sends information to client when available
    • Object Request Broker (ORB)
      • object-oriented management of communications between clients and servers
    • SQL-oriented Data Access
      • middleware between applications and database servers
  • 356. Database Middleware
    • ODBC –Open Database Connectivity
      • Most DB vendors support this
    • OLE-DB
      • Microsoft enhancement of ODBC
    • JDBC –Java Database Connectivity
      • Special Java classes that allow Java applications/applets to connect to databases
  • 357. Client/Server Security
    • Network environment → complex security issues
    • Security levels:
      • System-level password security
        • for allowing access to the system
      • Database-level password security
        • for determining access privileges to tables; read/update/insert/delete privileges
      • Secure client/server communication
        • via encryption
  • 358. Keys to Successful Client-Server Implementation
    • Accurate business problem analysis
    • Detailed architecture analysis
    • Architecture analysis before choosing tools
    • Appropriate scalability
    • Appropriate placement of services
    • Network analysis
    • Awareness of hidden costs
    • Establish client/server security
  • 359. Benefits of Moving to Client/Server Architecture
    • Staged delivery of functionality speeds deployment
    • GUI interfaces ease application use
    • Flexibility and scalability facilitates business process reengineering
    • Reduced network traffic due to increased processing at data source
    • Facilitation of Web-enabled applications
  • 360. Using ODBC to Link External Databases Stored on a Database Server
    • Open Database Connectivity (ODBC)
      • API provides a common language for application programs to access and process SQL databases independent of the particular RDBMS that is accessed
    • Required parameters:
      • ODBC driver
      • Back-end server name
      • Database name
      • User id and password
    • Additional information:
      • Data source name (DSN)
      • Windows client computer name
      • Client application program’s executable name
    Java Database Connectivity (JDBC) is similar to ODBC–built specifically for Java applications
  • 361. ODBC Architecture (Figure 9-6) Each DBMS has its own ODBC-compliant driver Client does not need to know anything about the DBMS Application Program Interface (API) provides common interface to all DBMSs
  • 362. Chapter 10: The Internet Database Environment Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 363. Objectives
    • Definition of terms
    • Explain the importance of attaching a database to a Web page
    • Describe necessary environment for Internet and Intranet database connectivity
    • Use Internet terminology appropriately
    • Explain the purpose of WWW Consortium
    • Explain the purpose of server-side extensions
    • Describe Web services
    • Compare Web server interfaces (CGI, API, Java servlets)
    • Describe Web load balancing methods
    • Explain plug-ins
    • Explain the purpose of XML as a standard
  • 364. Web Characteristics that Support Web-Based Database Applications
    • Web browsers are simple to use
    • Information transfer can take place across different platforms
    • Development time and cost have been reduced
    • Sites can be static (no database) or dynamic/interactive (with database)
    • Potential e-business advantages (improved customer service, faster market time, better supply chain management)
  • 365. Figure 10-1 Database-enabled intranet/internet environment
  • 366. Internet and Intranet Services
    • Web server
    • Database-enabled services
    • Directory, security, authentication
    • E-mail
    • File Transfer Protocol (FTP)
    • Firewalls and proxy servers
    • News or discussion groups
    • Document search
    • Load balancing and caching
  • 367. World Wide Web Consortium (W3C)
    • An international consortium of companies working to develop open standards that foster the development of Web conventions so that Web documents can be consistently displayed on all platforms
    • See www.w3c.org
  • 368. Web-Related Terms
    • World Wide Web (WWW)
      • The total set of interlinked hypertext documents residing on Web servers worldwide
    • Browser
      • Software that displays HTML documents and allows users to access files and software related to HTML documents
    • Web Server
      • Software that responds to requests from browsers and transmits HTML documents to browsers
    • Web pages–HTML documents
      • Static Web pages–content established at development time
      • Dynamic Web pages–content dynamically generated, usually by obtaining data from database
  • 369. Communications Technology
    • IP Address
      • Four numbers that identify a node on the Internet
      • e.g. 131.247.152.18
    • Hypertext Transfer Protocol (HTTP)
      • Communication protocol used to transfer pages from Web server to browser
      • HTTPS is a more secure version
    • Uniform Resource Locator (URL)
      • Mnemonic Web address corresponding with IP address
      • Also includes folder location and html file name
    Typical URL
  • 370. Internet-Related Languages
    • Hypertext Markup Language (HTML)
      • Markup language specifically for Web pages
    • Standard Generalized Markup Language (SGML)
      • Markup language standard
    • Extensible Markup Language (XML)
      • Markup language allowing customized tags
    • XHTML
      • XML-compliant extension of HTML
    • Java
      • Object-oriented programming language for applets
    • JavaScript/VBScript
      • Scripting languages that enable interactivity in HTML documents
    • Cascading Style Sheets (CSS)
      • Control appearance of Web elements in an HTML document
    • XSL and XSLT
      • XML style sheet language and transformation to HTML
    Standards and Web conventions established by World Wide Web Consortium (W3C)
  • 371. XML Overview
    • Becoming the standard for E-Commerce data exchange
    • A markup language (like HTML)
      • Uses elements, tags, attributes
      • Includes document type declarations (DTDs), XML schemas, comments, and entity references
    • XML Schema (XSD) replacing DTDs
    • Relax NG–ISO standard XML database definition
    • Document Structure Description (DSD)– expressive, easy to use XML database definition
  • 372. Sample XML Schema Schema is a record definition, analogous to the Create SQL statement, and therefore provides metadata
  • 373. Sample XML Document Data XML data involves elements and attributes defined in the schema, and is analogous to inserting a record into a database.
  • 374. Server-Side Extensions
    • Programs that interact directly with Web servers to handle requests
    • e.g. database-request handling middleware
    Figure 10-2 Web-to-database middleware
  • 375. Web Server Interfaces
    • Common Gateway Interface (CGI)
      • Specify transfer of information between Web server and CGI program
      • Performance not very good
      • Security risks
    • Application Program Interface (API)
      • More efficient than CGI
      • Shared as dynamic link libraries (DLLs)
    • Java Servlets
      • Like applets, but stored at server
      • Cross-platform compatible
      • More efficient than CGI
  • 376. Web Servers
    • Provide HTTP service
    • Passing plain text via TCP connection
    • Serve many clients at once
      • Therefore, multithreaded and multiprocessed
    • Load balancing approaches:
      • Domain Name Server (DNS) balancing
        • One DNS = multiple IP addresses
      • Software/hardware balancing
        • Request at one IP address is distributed to multiple servers
      • Reverse proxy
        • Intercept client request and cache response
  • 377. Client-Side Extensions
    • Add functionality to the browser
    • Plug-ins
      • Hardware/software modules that extend browser capabilities by adding features (e.g. encryption, animation, wireless access)
    • ActiveX
      • Microsoft COM/OLE components that allow data manipulation inside the browser
    • Cookies
      • Block of data stored at client by Web server for later use
  • 378. Components for Dynamic Web Sites
    • DBMS–Oracle, Microsoft SQL Server, Informix, Sybase, DB2, Microsoft Access, MySQL
    • Web server–Apache, Microsoft IIS
    • Programming languages/development technologies–ASP .NET, PHP, ColdFusion, Coral Web Builder, Macromedia’s Dreamweaver
    • Web browser–Microsoft Internet Explorer, Netscape Navigator, Mozilla Firefox, Apple’s Safari, Opera
    • Text editor–Notepad, BBEdit, vi, or an IDE
    • FTP capabilities–SmartFTP, WS_FTP
  • 379. Figure 10-3 Dynamic Web development environment
  • 380. Figure 10-4 Sample PHP script that accepts user registration input a) PHP script initiation and input validation (Ullman, PHP and MySql for Dynamic Web Sites, 2003, Script 6.6)
  • 381. Figure 10-4a (cont.)
  • 382. Figure 10-4 Sample PHP script that accepts user registration input b) Adding user information to the database
  • 383. Figure 10-4 Sample PHP script that accepts user registration input c) Close PHP script and display HTML form
  • 384. Web Services
    • XML-based standards that define protocols for automatic communication between applications over the Web.
    • Web Service Components:
      • Universal Description, Discovery, and Integration (UDDI)
        • Technical specification for distributed registries of Web services and businesses open to communication on these services
      • Web Services Description Language (WSDL)
        • XML-based grammar for describing Web services and providing public interfaces for these services
      • Simple Object Access Protocol (SOAP)
        • XML-based communication protocol for sending messages between applications via the Internet
    • Challenges for Web Services
      • Lack of mature standards
      • Lack of security
  • 385. Figure 10-5 A typical order entry system that uses Web services (adapted from Newcomer 2002, Figure 1-3) Figure 10-6 Web services protocol stack
  • 386. Figure 10-7 Web services deployment (adapted from Newcomer, 2002)
  • 387. Service Oriented Architectures
    • Collection of services that communicate with each other by passing data
    • Web services, CORBA, Java, XML, SOAP, WSDL
    • Loosely coupled
    • Interoperable
    • Using SOA results in increased software development efficiency (up to 40%)
  • 388. Semantic Web
    • W3C project using Web metadata to automate collection of knowledge and storing in easily understood format
    • Structuring based on:
      • XML
      • Resource Description Framework (RDF)
      • Web Ontology Language (OWL)
  • 389. Rapidly Accelerating Internet Changes
    • Integrated database environments
    • Use of cell phones and PDAs
    • Changes in organizational relationships
    • Globalization
    • Challenges to IT personnel require:
      • Business and technology infrastructure understanding
      • Leadership and communication skills
      • Upward influence techniques
      • Employee management techniques
  • 390. Chapter 11: Data Warehousing Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 391. Objectives
    • Definition of terms
    • Reasons for information gap between information needs and availability
    • Reasons for need of data warehousing
    • Describe three levels of data warehouse architectures
    • List four steps of data reconciliation
    • Describe two components of star schema
    • Estimate fact table size
    • Design a data mart
  • 392. Definition
    • Data Warehouse :
      • A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
      • Subject-oriented: e.g. customers, patients, students, products
      • Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources
      • Time-variant: Can study trends and changes
      • Nonupdatable: Read-only, periodically refreshed
    • Data Mart :
      • A data warehouse that is limited in scope
  • 393. Need for Data Warehousing
    • Integrated, company-wide view of high-quality information (from disparate databases)
    • Separation of operational and informational systems and data (for improved performance)
  • 394. Source : adapted from Strange (1997).
  • 395. Data Warehouse Architectures
    • Generic Two-Level Architecture
    • Independent Data Mart
    • Dependent Data Mart and Operational Data Store
    • Logical Data Mart and Real-Time Data Warehouse
    • Three-Layer architecture
    All involve some form of extraction , transformation and loading ( ETL )
  • 396. Figure 11-2: Generic two-level data warehousing architecture E T L One, company-wide warehouse Periodic extraction → data is not completely current in warehouse
  • 397. Figure 11-3 Independent data mart data warehousing architecture Data marts: Mini-warehouses, limited in scope E T L Separate ETL for each independent data mart Data access complexity due to multiple data marts
  • 398. Figure 11-4 Dependent data mart with operational data store: a three-level architecture E T L Single ETL for enterprise data warehouse (EDW) Simpler data access ODS provides option for obtaining current data Dependent data marts loaded from EDW
  • 399. Figure 11-5 Logical data mart and real time warehouse architecture E T L Near real-time ETL for Data Warehouse ODS and data warehouse are one and the same Data marts are NOT separate databases, but logical views of the data warehouse → easier to create new data marts
  • 400. Figure 11-6 Three-layer data architecture for a data warehouse
  • 401. Data Characteristics Status vs. Event Data Event = a database action (create/update/delete) that results from a transaction Figure 11-7 Example of DBMS log entry Status Status
  • 402. Data Characteristics Transient vs. Periodic Data With transient data, changes to existing records are written over previous records, thus destroying the previous data content Figure 11-8 Transient operational data
  • 403. Periodic data are never physically altered or deleted once they have been added to the store Data Characteristics Transient vs. Periodic Data Figure 11-9: Periodic warehouse data
  • 404. Other Data Warehouse Changes
    • New descriptive attributes
    • New business activity attributes
    • New classes of descriptive attributes
    • Descriptive attributes become more refined
    • Descriptive data are related to one another
    • New source of data
  • 405. The Reconciled Data Layer
    • Typical operational data is:
      • Transient–not historical
      • Not normalized (perhaps due to denormalization for performance)
      • Restricted in scope–not comprehensive
      • Sometimes poor quality–inconsistencies and errors
    • After ETL, data should be:
      • Detailed–not summarized yet
      • Historical–periodic
      • Normalized–3 rd normal form or higher
      • Comprehensive–enterprise-wide perspective
      • Timely–data should be current enough to assist decision-making
      • Quality controlled–accurate with full integrity
  • 406. The ETL Process
    • Capture/Extract
    • Scrub or data cleansing
    • Transform
    • Load and Index
    ETL = Extract, transform, and load
  • 407. Static extract = capturing a snapshot of the source data at a point in time Incremental extract = capturing changes that have occurred since the last static extract Capture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Figure 11-10: Steps in data reconciliation
  • 408. Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data Figure 11-10: Steps in data reconciliation (cont.)
  • 409. Transform = convert data from format of operational system to format of data warehouse Record-level: Selection –data partitioning Joining –data combining Aggregation –data summarization Field-level: single-field –from one field to one field multi-field –from many fields to one, or one field to many Figure 11-10: Steps in data reconciliation (cont.)
  • 410. Load/Index= place transformed data into the warehouse and create indexes Refresh mode: bulk rewriting of target data at periodic intervals Update mode: only changes in source data are written to data warehouse Figure 11-10: Steps in data reconciliation (cont.)
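A hedged SQL sketch combining an incremental extract with an update-mode load; every table name and the LAST_UPDATED/ETL_CONTROL_T bookkeeping are assumptions for illustration:

    -- Incremental extract: capture only source rows changed since the last run
    INSERT INTO STAGING_CUSTOMER_T
        SELECT CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_STATE, LAST_UPDATED
        FROM OPERATIONAL_CUSTOMER_T
        WHERE LAST_UPDATED > (SELECT MAX(EXTRACT_DATE) FROM ETL_CONTROL_T);
    -- Load (update mode): write the transformed changes into the warehouse table
    INSERT INTO DW_CUSTOMER_T (CUSTOMER_KEY, CUSTOMER_NAME, CUSTOMER_STATE, LOAD_DATE)
        SELECT CUSTOMER_ID, UPPER(CUSTOMER_NAME), CUSTOMER_STATE, CURRENT_DATE
        FROM STAGING_CUSTOMER_T;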
  • 411. Figure 11-11: Single-field transformation In general–some transformation function translates data from old form to new form Algorithmic transformation uses a formula or logical expression Table lookup –another approach, uses a separate table keyed by source record code
  • 412. Figure 11-12: Multifield transformation M:1–from many source fields to one target field 1:M–from one source field to many target fields
  • 413. Derived Data
    • Objectives
      • Ease of use for decision support applications
      • Fast response to predefined user queries
      • Customized data for particular target audiences
      • Ad-hoc query support
      • Data mining capabilities
    • Characteristics
      • Detailed (mostly periodic) data
      • Aggregate (for summary)
      • Distributed (to departmental servers)
    Most common data model = star schema (also called “dimensional model”)
  • 414. Figure 11-13 Components of a star schema Fact tables contain factual or quantitative data Dimension tables contain descriptions about the subjects of the business 1:N relationship between dimension tables and fact tables Excellent for ad-hoc queries, but bad for online transaction processing Dimension tables are denormalized to maximize performance
  • 415. Figure 11-14 Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions
  • 416. Figure 11-15 Star schema with sample data
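A hedged sketch of a typical query against a star schema like the one above; the fact and dimension table names are illustrative:

    SELECT P.PRODUCT_DESCRIPTION, T.YEAR, SUM(F.UNITS_SOLD) AS TOTAL_UNITS
    FROM SALES_FACT_T F
        JOIN PRODUCT_DIM_T P ON F.PRODUCT_KEY = P.PRODUCT_KEY
        JOIN PERIOD_DIM_T  T ON F.PERIOD_KEY  = T.PERIOD_KEY
    WHERE T.YEAR = 2006
    GROUP BY P.PRODUCT_DESCRIPTION, T.YEAR;   -- facts aggregated by the dimension attributes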
  • 417. Issues Regarding Star Schema
    • Dimension table keys must be surrogate (non-intelligent and non-business related), because:
      • Keys may change over time
      • Length/format consistency
    • Granularity of Fact Table–what level of detail do you want?
      • Transactional grain–finest level
      • Aggregated grain–more summarized
      • Finer grains → better market basket analysis capability
      • Finer grain → more dimension tables, more rows in fact table
    • Duration of the database–how much history should be kept?
      • Natural duration–13 months or 5 quarters
      • Financial institutions may need longer duration
      • Older data is more difficult to source and cleanse
  • 418. Figure 11-16: Modeling dates Fact tables contain time-period data → date dimensions are important
  • 419. The User Interface Metadata (data catalog)
    • Identify subjects of the data mart
    • Identify dimensions and facts
    • Indicate how data is derived from enterprise data warehouses, including derivation rules
    • Indicate how data is derived from operational data store, including derivation rules
    • Identify available reports and predefined queries
    • Identify data analysis techniques (e.g. drill-down)
    • Identify responsible people
  • 420. On-Line Analytical Processing (OLAP) Tools
    • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
    • Relational OLAP (ROLAP)
      • Traditional relational representation
    • Multidimensional OLAP (MOLAP)
      • Cube structure
    • OLAP Operations
      • Cube slicing –come up with 2-D view of data
      • Drill-down –going from summary to more detailed views
  • 421.
      • Figure 11-23 Slicing a data cube
  • 422. Figure 11-24 Example of drill-down Summary report Drill-down with color added Starting with summary data, users can obtain details for particular cells
  • 423. Data Mining and Visualization
    • Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
    • Goals:
      • Explain observed events or conditions
      • Confirm hypotheses
      • Explore data for new or unexpected relationships
    • Techniques
      • Statistical regression
      • Decision tree induction
      • Clustering and signal processing
      • Affinity
      • Sequence association
      • Case-based reasoning
      • Rule discovery
      • Neural nets
      • Fractals
    • Data visualization–representing data in graphical/multimedia formats for analysis
  • 424. Chapter 12: Data and Database Administration Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 425. Objectives
    • Definition of terms
    • List functions and roles of data/database administration
    • Describe role of data dictionaries and information repositories
    • Compare optimistic and pessimistic concurrency control
    • Describe problems and techniques for data security
    • Describe problems and techniques for data recovery
    • Describe database tuning issues and list areas where changes can be done to tune the database
    • Describe importance and measures of data quality
    • Describe importance and measures of data availability
  • 426. Traditional Administration Definitions
    • Data Administration : A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards
    • Database Administration : A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery
  • 427. Traditional Data Administration Functions
    • Data policies, procedures, standards
    • Planning
    • Data conflict (ownership) resolution
    • Managing the information repository
    • Internal marketing of DA concepts
  • 428. Traditional Database Administration Functions
    • Selection of DBMS and software tools
    • Installing/upgrading DBMS
    • Tuning database performance
    • Improving query processing performance
    • Managing data security, privacy, and integrity
    • Data backup and recovery
  • 429. Evolving Approaches to Data Administration
    • Blend data and database administration into one role
    • Fast-track development – monitoring development process (analysis, design, implementation, maintenance)
    • Procedural DBAs–managing quality of triggers and stored procedures
    • eDBA–managing Internet-enabled database applications
    • PDA DBA–data synchronization and personal database management
    • Data warehouse administration
  • 430. Data Warehouse Administration
    • New role, coming with the growth in data warehouses
    • Similar to DA/DBA roles
    • Emphasis on integration and coordination of metadata/data across many data sources
    • Specific roles:
      • Support DSS applications
      • Manage data warehouse growth
      • Establish service level agreements regarding data warehouses and data marts
  • 431. Open Source DBMSs
    • An alternative to proprietary packages such as Oracle, Microsoft SQL Server, or Microsoft Access
    • MySQL is an example of an open-source DBMS
    • Less expensive than proprietary packages
    • Source code is available for modification
  • 432. Figure 12-2 Data modeling responsibilities
  • 433. Database Security
    • Database Security: Protection of the data against accidental or intentional loss, destruction, or misuse
    • Increased difficulty due to Internet access and client/server technologies
  • 434. Figure 12-3 Possible locations of data security threats
  • 435. Threats to Data Security
    • Accidental losses attributable to:
      • Human error
      • Software failure
      • Hardware failure
    • Theft and fraud
    • Improper data access:
      • Loss of privacy (personal data)
      • Loss of confidentiality (corporate data)
    • Loss of data integrity
    • Loss of availability (e.g., through sabotage)
  • 436. Figure 12-4 Establishing Internet Security
  • 437. Web Security
    • Static HTML files are easy to secure
      • Standard database access controls
      • Place Web files in protected directories on server
    • Dynamic pages are harder
      • Control of CGI scripts
      • User authentication
      • Session security
      • SSL for encryption
      • Restrict number of users and open ports
      • Remove unnecessary programs
  • 438. W3C Web Privacy Standard
    • Platform for Privacy Preferences (P3P)
    • Addresses the following:
      • Who collects the data
      • What data is collected and for what purpose
      • Who the data is shared with
      • Whether users can control access to their data
      • How disputes are resolved
      • Policies for retaining data
      • Where policies are kept and how they can be accessed
  • 439. Database Software Security Features
    • Views or subschemas
    • Integrity controls
    • Authorization rules
    • User-defined procedures
    • Encryption
    • Authentication schemes
    • Backup, journalizing, and checkpointing
  • 440. Views and Integrity Controls
    • Views
      • Subset of the database that is presented to one or more users
      • User can be given access privilege to view without allowing access privilege to underlying tables
    • Integrity Controls
      • Protect data from unauthorized use
      • Domains–set allowable values
      • Assertions–enforce database conditions
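A short SQL sketch of both mechanisms, with assumed object names (employee table, clerk_role): a view that hides sensitive columns and is granted in place of the base table, a domain-style CHECK constraint, and a standard-SQL assertion.

```sql
-- A view exposing only non-sensitive columns of an assumed employee table.
CREATE VIEW employee_public AS
    SELECT employee_id, name, department
    FROM   employee;

-- Users receive privileges on the view, not on the underlying table.
GRANT SELECT ON employee_public TO clerk_role;

-- Domain-style integrity control: restrict allowable values.
ALTER TABLE employee
    ADD CONSTRAINT chk_salary CHECK (salary BETWEEN 0 AND 500000);

-- Assertion-style control (standard SQL, though few DBMSs implement it).
CREATE ASSERTION max_headcount
    CHECK ((SELECT COUNT(*) FROM employee) <= 10000);
```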
  • 441. Authorization Rules
    • Controls incorporated in the data management system
    • Restrict:
      • access to data
      • actions that people can take on data
    • Authorization matrix for:
      • Subjects
      • Objects
      • Actions
      • Constraints
  • 442. Figure 12-5 Authorization matrix
  • 443. Implementing authorization rules: Figure 12-6a Authorization table for subjects (salespeople); Figure 12-6b Authorization table for objects (orders); Figure 12-7 Oracle privileges. Some DBMSs also provide capabilities for user-defined procedures to customize the authorization process
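In practice the authorization matrix is usually realized with GRANT and REVOKE statements; a hedged, Oracle-style sketch in which the order_t table and salesperson_role are assumed names.

```sql
-- Subjects: the salesperson role; Object: an assumed ORDER_T table; Actions: read and insert.
GRANT SELECT, INSERT ON order_t TO salesperson_role;

-- UPDATE and DELETE were never granted, so those actions are refused by the DBMS.

-- A privilege can later be withdrawn:
REVOKE INSERT ON order_t FROM salesperson_role;
```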
  • 444. Encryption–the coding or scrambling of data so that humans cannot read it. Secure Sockets Layer (SSL) is a popular encryption scheme for TCP/IP connections. Figure 12-8 Basic two-key encryption
  • 445. Authentication Schemes
    • Goal – obtain a positive identification of the user
    • Passwords: First line of defense
      • Should be at least 8 characters long
      • Should combine alphabetic and numeric data
      • Should not be complete words or personal information
      • Should be changed frequently
  • 446. Authentication Schemes (cont.)
    • Strong Authentication
      • Passwords are flawed:
        • Users share them with each other
        • They get written down and can be copied
        • Automatic logon scripts remove need to explicitly type them in
        • Unencrypted passwords travel the Internet
    • Possible solutions:
      • Two factor–e.g. smart card plus PIN
      • Three factor–e.g. smart card, biometric, PIN
      • Biometric devices–use of fingerprints, retinal scans, etc. for positive ID
      • Third-party mediated authentication–using secret keys, digital certificates
  • 447. Security Policies and Procedures
    • Personnel controls
      • Hiring practices, employee monitoring, security training
    • Physical access controls
      • Equipment locking, check-out procedures, screen placement
    • Maintenance controls
      • Maintenance agreements, access to source code, quality and availability standards
    • Data privacy controls
      • Adherence to privacy legislation, access rules
  • 448. Database Recovery
    • Mechanism for restoring a database quickly and accurately after loss or damage
    • Recovery facilities:
      • Backup Facilities
      • Journalizing Facilities
      • Checkpoint Facility
      • Recovery Manager
  • 449. Back-up Facilities
    • Automatic dump facility that produces backup copy of the entire database
    • Periodic backup (e.g. nightly, weekly)
    • Cold backup–database is shut down during backup
    • Hot backup–selected portion is shut down and backed up at a given time
    • Backups stored in secure, off-site location
  • 450. Journalizing Facilities
    • Audit trail of transactions and database updates
    • Transaction log–record of essential data for each transaction processed against the database
    • Database change log–images of updated data
      • Before-image–copy before modification
      • After-image–copy after modification
    Produces an audit trail
  • 451. Figure 12-9 Database audit trail From the backup and logs, databases can be restored in case of damage or loss
  • 452. Checkpoint Facilities
    • DBMS periodically refuses to accept new transactions
    • The system is then in a quiet state
    • Database and transaction logs are synchronized
    This allows the recovery manager to resume processing from a recent checkpoint instead of reprocessing an entire day's transactions
  • 453. Recovery and Restart Procedures
    • Disk Mirroring–switch between identical copies of databases
    • Restore/Rerun–reprocess transactions against the backup
    • Transaction Integrity–commit or abort all transaction changes
    • Backward Recovery (Rollback)–apply before images
    • Forward Recovery (Roll Forward)–apply after images (preferable to restore/rerun)
  • 454. Transaction ACID Properties
    • Atomic
      • Transaction cannot be subdivided
    • Consistent
      • Constraints don’t change from before transaction to after transaction
    • Isolated
      • Database changes not revealed to users until after transaction has completed
    • Durable
      • Database changes are permanent
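A minimal sketch of these properties in SQL, with assumed account table and values: the transfer below either commits as a whole or is rolled back as a whole.

```sql
-- Transfer $100 between two accounts as one atomic unit of work.
BEGIN TRANSACTION;          -- START TRANSACTION in some dialects

UPDATE account SET balance = balance - 100 WHERE account_id = 1;
UPDATE account SET balance = balance + 100 WHERE account_id = 2;

COMMIT;                     -- both changes become durable together
-- If either UPDATE fails, the application issues ROLLBACK instead,
-- undoing all of the transaction's changes.
```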
  • 455. Figure 12-10 Basic recovery techniques a) Rollback
  • 456. Figure 12-10 Basic recovery techniques (cont.) b) Rollforward
  • 457. Database Failure Responses
    • Aborted transactions
      • Preferred recovery: rollback
      • Alternative: Rollforward to state just prior to abort
    • Incorrect data
      • Preferred recovery: rollback
      • Alternative 1: rerun transactions not including inaccurate data updates
      • Alternative 2: compensating transactions
    • System failure (database intact)
      • Preferred recovery: switch to duplicate database
      • Alternative 1: rollback
      • Alternative 2: restart from checkpoint
    • Database destruction
      • Preferred recovery: switch to duplicate database
      • Alternative 1: rollforward
      • Alternative 2: reprocess transactions
  • 458. Concurrency Control
    • Problem–in a multiuser environment, simultaneous access to data can result in interference and data loss
    • Solution–Concurrency Control
      • The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multi-user environment
  • 459. Figure 12-11 Lost update (no concurrency control in effect): simultaneous access causes updates to cancel each other. A similar problem is the inconsistent read problem
  • 460. Concurrency Control Techniques
    • Serializability
      • Finish one transaction before starting another
    • Locking Mechanisms
      • The most common way of achieving serializability
      • Data that is retrieved for the purpose of updating is locked for the updater
      • No other user can perform update until unlocked
  • 461. Figure 12-12: Updates with locking (concurrency control) This prevents the lost update problem
  • 462. Locking Mechanisms
    • Locking level:
      • Database–used during database updates
      • Table–used for bulk updates
      • Block or page–very commonly used
      • Record–only requested row; fairly commonly used
      • Field–requires significant overhead; impractical
    • Types of locks:
      • Shared lock–Read but no update permitted. Used when just reading to prevent another user from placing an exclusive lock on the record
      • Exclusive lock–No access permitted. Used when preparing to update
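A hedged sketch of explicit locking, using syntax common to Oracle and PostgreSQL and an assumed account table: SELECT ... FOR UPDATE takes an exclusive row-level lock, while LOCK TABLE ... IN SHARE MODE takes a table-level shared lock.

```sql
-- Exclusive row-level lock while preparing to update.
BEGIN TRANSACTION;

SELECT balance
FROM   account
WHERE  account_id = 1
FOR UPDATE;                 -- no other transaction can update this row until we finish

UPDATE account SET balance = balance - 100 WHERE account_id = 1;
COMMIT;

-- Table-level shared lock for a bulk read; others may read but not update.
BEGIN TRANSACTION;
LOCK TABLE account IN SHARE MODE;
-- ... read-only reporting queries ...
COMMIT;
```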
  • 463. Deadlock
    • An impasse that results when two or more transactions have locked common resources, and each waits for the other to unlock their resources
    Figure 12-13 The problem of deadlock John and Marsha will wait forever for each other to release their locked resources!
  • 464. Managing Deadlock
    • Deadlock prevention:
      • Lock all records required at the beginning of a transaction
      • Two-phase locking protocol
        • Growing phase
        • Shrinking phase
      • May be difficult to determine all needed resources in advance
    • Deadlock Resolution:
      • Allow deadlocks to occur
      • Mechanisms for detecting and breaking them
        • Resource usage matrix
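The sketch below walks through the classic two-transaction deadlock and the fixed-ordering rule that prevents it; session B's statements appear as comments because a single script can only act as one session, and the account table is an assumed name.

```sql
-- Session A:
BEGIN TRANSACTION;
SELECT * FROM account WHERE account_id = 1 FOR UPDATE;     -- A locks row 1

-- Session B, running concurrently:
-- BEGIN TRANSACTION;
-- SELECT * FROM account WHERE account_id = 2 FOR UPDATE;  -- B locks row 2

-- A now requests row 2 and must wait, because B holds its lock:
SELECT * FROM account WHERE account_id = 2 FOR UPDATE;

-- B then requests row 1 and must wait for A:
-- SELECT * FROM account WHERE account_id = 1 FOR UPDATE;

-- Neither transaction can proceed: deadlock. The DBMS detects the wait-for cycle
-- and aborts one of the two. A simple prevention rule is to acquire locks in a
-- fixed order (e.g., ascending account_id), so a circular wait cannot form.
```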
  • 465. Versioning
    • Optimistic approach to concurrency control
    • Instead of locking
    • Assumption is that simultaneous updates will be infrequent
    • Each transaction can attempt an update as it wishes
    • The system will reject an update when it senses a conflict
    • Rollback and commit are used to discard conflicting updates and finalize successful ones
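Versioning is often sketched with a version (or timestamp) column that the UPDATE re-checks; if another transaction changed the row after it was read, no rows match and the application retries. The row_version column and account table here are assumptions, not the book's figure.

```sql
-- Read the row and remember its version.
SELECT balance, row_version
FROM   account
WHERE  account_id = 1;        -- suppose this returns balance = 500, row_version = 7

-- Attempt the update only if nobody changed the row in the meantime.
UPDATE account
SET    balance     = 400,
       row_version = row_version + 1
WHERE  account_id  = 1
  AND  row_version = 7;       -- the version read above

-- If the UPDATE reports zero rows affected, a conflicting update got there first:
-- the application rolls back and retries with freshly read values.
```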
  • 466. Figure 12-15 The use of versioning Better performance than locking
  • 467. Managing Data Quality
    • Causes of poor data quality
      • External data sources
      • Redundant data storage
      • Lack of organizational commitment
    • Data quality improvement
      • Perform data quality audit
      • Establish data stewardship program (data steward is a liaison between IT and business units)
      • Apply total quality management (TQM) practices
      • Overcome organizational barriers
      • Apply modern DBMS technology
      • Estimate return on investment
  • 468. Data Dictionaries and Repositories
    • Data dictionary
      • Documents data elements of a database
    • System catalog
      • System-created database that describes all database objects
    • Information Repository
      • Stores metadata describing data and data processing resources
    • Information Repository Dictionary System (IRDS)
      • Software tool managing/controlling access to information repository
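Most SQL products expose the system catalog as queryable views; a sketch using the standard INFORMATION_SCHEMA, which many (though not all) DBMSs provide, with an assumed table name.

```sql
-- List the columns, data types, and nullability recorded for an assumed table.
SELECT column_name, data_type, is_nullable
FROM   information_schema.columns
WHERE  table_name = 'account'
ORDER  BY ordinal_position;
```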
  • 469. Figure 12-16 Three components of the repository system architecture: a schema of the repository information, software that manages the repository objects, and where the repository objects are stored (Source: adapted from Bernstein, 1996)
  • 470. Database Performance Tuning
    • DBMS Installation
      • Setting installation parameters
    • Memory Usage
      • Set cache levels
      • Choose background processes
    • Input/Output (I/O) Contention
      • Use striping
      • Distribution of heavily accessed files
    • CPU Usage
      • Monitor CPU load
    • Application tuning
      • Modification of SQL code in applications
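A small, hedged example of application tuning: an index matched to a frequent predicate, with EXPLAIN (whose exact syntax varies by DBMS) used to confirm the access path; the order_t table and column names are illustrative assumptions.

```sql
-- An index matched to a frequent predicate on an assumed order table.
CREATE INDEX idx_order_customer ON order_t (customer_id);

-- Many DBMSs (e.g., MySQL, PostgreSQL) report the chosen access path with EXPLAIN:
EXPLAIN
SELECT order_id, order_date
FROM   order_t
WHERE  customer_id = 42;
```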
  • 471. Data Availability
    • Downtime is expensive
    • How to ensure availability
      • Hardware failures–provide redundancy for fault tolerance
      • Loss of data–database mirroring
      • Maintenance downtime–automated and nondisruptive maintenance utilities
      • Network problems–careful traffic monitoring, firewalls, and routers