Grading System: Lecture Grade
1st Exam - 10% (Ch 1–2)
2nd Exam - 10% (Ch 3–5)
3rd Exam - 10% (Ch 7–8, SQL)
4th Exam - 15% (Overall)
Project - 15%
Q/A/Etc - 40%
TOTAL - 100% × 0.75
Laboratory Grade
Laboratory Exercises - 10%
Hands-on Exam - 15%
TOTAL - 25%
GRADE = LEC + LAB = 75% + 25% = 100%
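As a worked example of the arithmetic above, here is a minimal sketch that assumes every component score is recorded as a percentage from 0 to 100, that the lecture total is scaled by 0.75, and that the laboratory components already add up to the remaining 25 points. The dictionary keys and the function name are illustrative, not part of the course materials.

```python
# Weights in percentage points, as listed on the grading slides.
LECTURE_WEIGHTS = {
    "exam_1": 10,
    "exam_2": 10,
    "exam_3": 10,
    "exam_4": 15,
    "project": 15,
    "qa_etc": 40,
}
LAB_WEIGHTS = {"lab_exercises": 10, "hands_on_exam": 15}


def final_grade(lecture_scores, lab_scores):
    """Combine component scores (each 0-100) into a final grade out of 100."""
    # Lecture components sum to a 0-100 lecture grade.
    lecture = sum(w * lecture_scores[name] / 100 for name, w in LECTURE_WEIGHTS.items())
    # Laboratory components sum to a 0-25 laboratory grade.
    lab = sum(w * lab_scores[name] / 100 for name, w in LAB_WEIGHTS.items())
    # Assumption: GRADE = LEC * 0.75 + LAB.
    return lecture * 0.75 + lab


perfect_lecture = {name: 100 for name in LECTURE_WEIGHTS}
perfect_lab = {name: 100 for name in LAB_WEIGHTS}
print(final_grade(perfect_lecture, perfect_lab))
```

Under these assumptions a student who scores 80 on every component ends with 80 overall: 80 × 0.75 = 60 from lecture plus 20 from laboratory.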
Chapter 1: The Database Environment. Modern Database Management, 8th Edition. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. © 2007 by Prentice Hall
Objectives Definition of terms Explain growth and importance of databases Name limitations of conventional file processing Identify five categories of databases Explain advantages of databases Identify costs and risks of databases List components of database environment Describe evolution of database systems
Definitions
Database: organized collection of logically related data
Data: stored representations of meaningful objects and events
  Structured: numbers, text, dates
  Unstructured: images, video, documents
Information: data processed to increase knowledge in the person using the data
Metadata: data that describes the properties and context of user data
Figure 1-1a Data in context Context helps users understand data
Graphical displays turn data into useful information that managers can use for decision making and interpretation Figure 1-1b Summarized data
Descriptions of the properties or characteristics of the data, including data types, field sizes, allowable values, and data context
Disadvantages of File Processing Program-Data Dependence All programs maintain metadata for each file they use Duplication of Data Different systems/programs have separate copies of the same data Limited Data Sharing No centralized control of data Lengthy Development Times Programmers must design their own file formats Excessive Program Maintenance 80% of information systems budget
Problems with Data Dependency Each application programmer must maintain his/her own data Each application program needs to include code for the metadata of each file Each application program must have its own processing routines for reading, inserting, updating, and deleting data Lack of coordination and central control Non-standard file formats
Figure 1-3 Old file processing systems at Pine Valley Furniture Company Duplicate Data
Problems with Data Redundancy Waste of space to have duplicate data Causes more maintenance headaches The biggest problem:  Data changes in one file could cause inconsistencies Compromises in  data integrity
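The inconsistency problem can be illustrated with a small sketch. It assumes two hypothetical application files (plain dictionaries here) that each keep their own copy of the same customer record; the customer data and the helper name are invented for illustration.

```python
import copy

# One customer record, duplicated into two separately maintained "files".
customer = {"name": "A. Customer", "address": "12 Elm St"}
order_filing_file = {"C001": copy.deepcopy(customer)}
invoicing_file = {"C001": copy.deepcopy(customer)}


def inconsistent_ids(file_a, file_b):
    """Return the IDs whose records differ between the two files."""
    shared = file_a.keys() & file_b.keys()
    return sorted(key for key in shared if file_a[key] != file_b[key])


before = inconsistent_ids(order_filing_file, invoicing_file)

# The invoicing application updates its own copy only.
invoicing_file["C001"]["address"] = "99 Oak Ave"

after = inconsistent_ids(order_filing_file, invoicing_file)
print(before, after)
```

With a single shared copy of the record there would be nothing to reconcile, which is the motivation for the database approach.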
SOLUTION:  The DATABASE Approach Central repository of shared data Data is managed by a controlling agent Stored in a standardized, convenient form Requires a Database Management System (DBMS)
Database Management System: a software system that is used to create, maintain, and provide controlled access to user databases. A DBMS manages data resources like an operating system manages hardware resources. Figure: the Order Filing System, Invoicing System, and Payroll System all go through the DBMS to reach a central database that contains employee, order, inventory, pricing, and customer data.
Advantages of the Database Approach Program-data independence Planned data redundancy Improved data consistency Improved data sharing Increased application development productivity Enforcement of standards Improved data quality Improved data accessibility and responsiveness Reduced program maintenance Improved decision support
Costs and Risks of the Database Approach New, specialized personnel Installation and management cost and complexity Conversion costs Need for explicit backup and recovery Organizational conflict
Elements of the Database Approach Data models  Graphical system capturing nature and relationship of data Enterprise Data Model–high-level entities and relationships for the organization Project Data Model–more detailed view, matching data structure in database or data warehouse  Relational Databases Database technology involving tables (relations) representing entities and primary/foreign keys representing relationships Use of Internet Technology Networks and telecommunications, distributed databases, client-server, and 3-tier architectures Database Applications Application programs used to perform database activities (create, read, update, and delete) for database users
Segment of an Enterprise Data Model Segment of a Project-Level Data Model
One customer may place many orders, but each order is placed by a single customer    One-to-many relationship
One order has many order lines; each order line is associated with a single order    One-to-many relationship
One product can be in many order lines; each order line refers to a single product    One-to-many relationship
Therefore, one order involves many products and one product is involved in many orders    Many-to-many relationship
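The chain of one-to-many relationships above can be sketched as relational tables, using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration; they are not taken from the Pine Valley Furniture figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Each order belongs to exactly one customer (one-to-many).
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    -- Each order line belongs to one order and refers to one product.
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO customer VALUES (1, 'Sample Customer');
    INSERT INTO customer_order VALUES (100, 1), (101, 1);
    INSERT INTO product VALUES (10, 'Desk'), (20, 'Chair');
    INSERT INTO order_line VALUES (100, 10, 2), (100, 20, 4), (101, 20, 1);
""")


def products_in_order(order_id):
    """One order involves many products (via its order lines)."""
    rows = conn.execute(
        "SELECT p.description FROM order_line ol "
        "JOIN product p ON p.product_id = ol.product_id "
        "WHERE ol.order_id = ? ORDER BY p.description",
        (order_id,),
    )
    return [row[0] for row in rows]


def orders_containing(product_id):
    """One product is involved in many orders (via its order lines)."""
    rows = conn.execute(
        "SELECT order_id FROM order_line WHERE product_id = ? ORDER BY order_id",
        (product_id,),
    )
    return [row[0] for row in rows]


print(products_in_order(100), orders_containing(20))
```

The order_line table is what turns the many-to-many relationship between orders and products into two one-to-many relationships.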
Figure 1-4 Enterprise data model for Figure 1-3 segments
Figure 1-5 Components of the Database Environment
Components of the  Database Environment CASE Tools – computer-aided software engineering Repository – centralized storehouse of metadata Database Management System (DBMS)  – software for managing the database Database – storehouse of the data Application Programs – software using the data User Interface – text and graphical displays to users Data/Database Administrators – personnel responsible for maintaining the database System Developers – personnel responsible for designing databases and software End Users – people who use the applications and databases
The Range of Database Applications Personal databases Workgroup databases Departmental/divisional databases Enterprise database
Figure 1-6 Typical data from a personal database
Figure 1-7 Workgroup database with wireless  local area network
Enterprise Database Applications Enterprise Resource Planning (ERP) Integrate all enterprise functions (manufacturing, finance, sales, marketing, inventory, accounting, human resources) Data Warehouse Integrated decision support system derived from various operational databases
Figure 1-8 An enterprise data warehouse
Evolution of DB Systems
Chapter 2: The Database Development Process
Objectives Definition of terms Describe system development life cycle Explain prototyping approach Explain roles of individuals Explain three-schema approach Explain role of packaged data models Explain three-tiered architectures Explain scope of database design projects Draw simple data models
Enterprise Data Model First step in database development Specifies scope and general content Overall picture of organizational data at high level of abstraction Entity-relationship diagram Descriptions of entity types Relationships between entities Business rules
Figure 2-1 Segment from enterprise data model Enterprise data model describes the high-level entities in an organization and the relationship between these entities
Information Systems Architecture (ISA) Conceptual blueprint for organization’s desired information systems structure Consists of: Data (e.g. Enterprise Data Model – simplified ER Diagram) Processes – data flow diagrams, process decomposition, etc. Data Network – topology diagram (like Fig 1-9) People – people management using project management tools (Gantt charts, etc.) Events and points in time (when processes are performed) Reasons for events and rules (e.g., decision tables)
Information Engineering A data-oriented methodology to create and maintain information systems Top-down planning–a generic IS planning methodology for obtaining a broad understanding of the IS needed by the entire organization Four steps to Top-Down planning: Planning Analysis Design Implementation
Information Systems Planning  (Table 2-1) Purpose – align information technology with organization’s business strategies Three steps: Identify strategic planning factors  Identify corporate planning objects Develop enterprise model
Identify Strategic Planning Factors (Table 2-2) Organization goals–what we hope to accomplish Critical success factors–what MUST work in order for us to survive Problem areas–weaknesses we now have
Identify Corporate Planning Objects (Table 2-3) Organizational units–departments Organizational locations Business functions–groups of business processes Entity types–the things we are trying to model for the database Information systems–application programs
Develop Enterprise Model Functional decomposition Iterative process breaking system description into finer and finer detail Enterprise data model  Planning matrixes  Describe interrelationships  between planning objects
Figure 2-2 Example of process decomposition of an order fulfillment function (Pine Valley Furniture) Decomposition = breaking large tasks into smaller tasks in a hierarchical structure chart
Planning Matrixes Describe relationships between planning objects in the organization Types of matrixes: Function-to-data entity Location-to-function Unit-to-function IS-to-data entity Supporting function-to-data entity IS-to-business objective
Example business function-to-data entity matrix (Fig. 2-3)
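A function-to-data-entity planning matrix can be represented as a simple mapping, which makes it easy to ask which functions touch a given entity. The functions and entities below are hypothetical and are not copied from Figure 2-3.

```python
# Rows are business functions; each row lists the data entities it uses.
function_to_entity = {
    "Order fulfillment": {"Customer", "Order", "Product"},
    "Invoicing": {"Customer", "Order", "Invoice"},
    "Production scheduling": {"Product", "Work Order"},
}


def functions_using(entity):
    """Read the matrix by column: which functions use this entity?"""
    return sorted(f for f, entities in function_to_entity.items() if entity in entities)


def shared_entities(function_a, function_b):
    """Entities used by both functions (candidates for shared data)."""
    return sorted(function_to_entity[function_a] & function_to_entity[function_b])


print(functions_using("Product"))
print(shared_entities("Order fulfillment", "Invoicing"))
```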
Two Approaches to Database and IS Development SDLC System Development Life Cycle Detailed, well-planned development process Time-consuming, but comprehensive Long development cycle Prototyping Rapid application development (RAD) Cursory attempt at conceptual data modeling Define database during development of initial prototype Repeat implementation and maintenance activities with new prototype versions
Systems Development Life Cycle (see also Figures 2-4, 2-5) Phases in order: Planning, Analysis, Logical Design, Physical Design, Implementation, Maintenance
Systems Development Life Cycle (cont.) Planning. Purpose: preliminary understanding. Deliverable: request for study. Database activity: enterprise modeling and early conceptual data modeling
Systems Development Life Cycle (cont.) Analysis. Purpose: thorough requirements analysis and structuring. Deliverable: functional system specifications. Database activity: thorough and integrated conceptual data modeling
Systems Development Life Cycle (cont.) Logical Design. Purpose: information requirements elicitation and structure. Deliverable: detailed design specifications. Database activity: logical database design (transactions, forms, displays, views, data integrity and security)
Systems Development Life Cycle (cont.) Physical Design. Purpose: develop technology and organizational specifications. Deliverable: program/data structures, technology purchases, organization redesigns. Database activity: physical database design (define database to DBMS, physical data organization, database processing programs)
Systems Development Life Cycle (cont.) Implementation. Purpose: programming, testing, training, installation, documenting. Deliverable: operational programs, documentation, training materials. Database activity: database implementation, including coded programs, documentation, installation and conversion
Systems Development Life Cycle (cont.) Maintenance. Purpose: monitor, repair, enhance. Deliverable: periodic audits. Database activity: database maintenance, performance analysis and tuning, error corrections
Prototyping Database Methodology (Figure 2.6)
CASE Computer-Aided Software Engineering (CASE)–software tools providing automated support for systems development Three database features: Data modeling–drawing entity-relationship diagrams Code generation–SQL code for table creation Repositories–knowledge base of enterprise information
Packaged Data Models Model components that can be purchased, customized, and assembled into full-scale data models Advantages Reduced development time Higher model quality and reliability Two types: Universal data models Industry-specific data models
Managing Projects Project–a planned undertaking of related activities to reach an objective that has a beginning and an end Involves use of review points for: Validation of satisfactory progress Step back from detail to overall view Renew commitment of stakeholders Incremental commitment–review of systems development project after each development phase with rejustification after each phase
Managing Projects: People Involved Business analysts Systems analysts Database analysts and data modelers Users Programmers Database architects Data administrators Project managers Other technical experts
Database Schema Physical Schema  Physical structures–covered in Chapters 5 and 6 Conceptual Schema E-R models–covered in Chapters 3 and 4 External Schema User Views Subsets of Conceptual Schema Can be determined from business-function/data entity matrices DBA determines schema for different users
Figure 2-7 Three-schema architecture. Different people have different views of the database; these are the external schemas. The internal schema is the underlying design and implementation.
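One common way to realize an external schema is a database view that exposes only a subset of the conceptual schema. The sketch below uses Python's built-in sqlite3 module; the employee table, its columns, and the employee_directory view are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema: the full employee entity.
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        department  TEXT NOT NULL,
        salary      REAL NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Ana', 'Sales', 52000), (2, 'Ben', 'Finance', 61000);

    -- External schema: a user view that omits the salary column.
    CREATE VIEW employee_directory AS
        SELECT employee_id, name, department FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY employee_id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

Users of the view see a subset of the conceptual schema, while the physical storage of the employee table stays hidden behind the DBMS.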
Figure 2-8 Developing the three-tiered architecture
Figure 2-9 Three-tiered client/server database architecture
Pine Valley Furniture Segment of project data model  (Figure 2-11)
Figure 2-12 Four relations (Pine Valley Furniture)
Chapter 3: Modeling Data in the Organization
Objectives Definition of terms Importance of data modeling Write good names and definitions for entities, relationships, and attributes Distinguish unary, binary, and ternary relationships Model different types of attributes, entities, relationships, and cardinalities Draw E-R diagrams for common business situations Convert many-to-many relationships to associative entities Model time-dependent data using time stamps
Business Rules Statements that define or constrain some aspect of the business Assert business structure Control/influence business behavior Expressed in terms familiar to end users Automated through DBMS software
A Good Business Rule is: Declarative–what, not how Precise–clear, agreed-upon meaning Atomic–one statement Consistent–internally and externally Expressible–structured, natural language Distinct–non-redundant Business-oriented–understood by business people
A Good Data Name is: Related to business, not technical, characteristics Meaningful and self-documenting Unique Readable Composed of words from an approved list Repeatable
Data Definitions Explanation of a term or fact Term–word or phrase with specific meaning Fact–association between two or more terms Guidelines for good data definition Gathered in conjunction with systems requirements Accompanied by diagrams Iteratively created and refined Achieved by consensus
E-R Model Constructs Entities: Entity instance–person, place, object, event, concept (often corresponds to a row in a table) Entity Type–collection of entities (often corresponds to a table) Relationships: Relationship instance–link between entities (corresponds to primary key-foreign key equivalencies in related tables) Relationship type–category of relationship…link between entity types Attribute– property or characteristic of an entity or relationship type (often corresponds to a field in a table)
Sample E-R Diagram (Figure 3-1)
Basic E-R notation (Figure 3-2): entity symbols, relationship symbols, attribute symbols, and the associative entity (a special entity that is also a relationship). Relationship degrees specify the number of entity types involved. Relationship cardinalities specify how many instances of each entity type are allowed.
What Should an Entity Be? SHOULD BE: An object that will have many instances in the database An object that will be composed of multiple attributes An object that we are trying to model SHOULD NOT BE: A user of the database system  An output of the database system (e.g., a report)
Figure 3-4 Example of inappropriate entities. Inappropriate entities: a system user and a system output. The figure contrasts these with appropriate entities.
Attributes Attribute–property or characteristic of an entity or relationship type Classifications of attributes: Required versus Optional Attributes Simple versus Composite Attributes Single-Valued versus Multivalued Attributes Stored versus Derived Attributes Identifier Attributes
Identifiers (Keys) Identifier (Key)–An attribute (or combination of attributes) that uniquely identifies individual instances of an entity type Simple versus Composite Identifier Candidate Identifier–an attribute that could be a key…satisfies the requirements for being an identifier
Characteristics of Identifiers Will not change in value Will not be null No intelligent identifiers (e.g., containing locations or people that might change) Substitute new, simple keys for long, composite keys
Figure 3-7 A composite attribute: an attribute broken into component parts. Figure 3-8 Entity with multivalued attribute (Skill) and derived attribute (Years_Employed). Multivalued: an employee can have more than one skill. Derived: computed from date employed and current date.
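The difference between stored, multivalued, and derived attributes can be sketched in a few lines. This example assumes an Employee entity with a stored date_employed, a multivalued skills list, and Years_Employed computed on demand as whole completed years; the class layout and sample values are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    name: str
    date_employed: date                         # stored attribute
    skills: list = field(default_factory=list)  # multivalued attribute

    def years_employed(self, today):
        """Derived attribute: whole years between date_employed and today."""
        years = today.year - self.date_employed.year
        # Subtract one if this year's anniversary has not been reached yet.
        if (today.month, today.day) < (self.date_employed.month, self.date_employed.day):
            years -= 1
        return years


emp = Employee("Sample Employee", date(2000, 6, 15), ["Welding", "Drafting"])
print(emp.years_employed(date(2007, 6, 14)))
print(emp.years_employed(date(2007, 6, 15)))
```

Because the value is computed from the hire date and the current date, it never needs to be stored or updated.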
Figure 3-9 Simple and composite identifier attributes The identifier is boldfaced and underlined
Figure 3-19 Simple example of time-stamping. This attribute is both multivalued and composite.
More on Relationships Relationship Types vs. Relationship Instances The relationship type is modeled as lines between entity types…the instance is between specific entity instances Relationships can have attributes These describe features pertaining to the association between the entities in the relationship Two entities can have more than one type of relationship between them (multiple relationships) Associative Entity–combination of relationship and entity
Figure 3-10 Relationship types and instances a) Relationship type b) Relationship instances
Degree of Relationships Degree of a relationship is the number of entity types that participate in it Unary Relationship Binary Relationship Ternary Relationship
Degree of relationships (from Figure 3-2). Unary: one entity related to another of the same entity type. Binary: entities of two different types related to each other. Ternary: entities of three different types related to each other.
Cardinality of Relationships One-to-One Each entity in the relationship will have exactly one related entity One-to-Many An entity on one side of the relationship can have many related entities, but an entity on the other side will have a maximum of one related entity Many-to-Many Entities on both sides of the relationship can have many related entities on the other side
Cardinality Constraints Cardinality Constraints - the number of instances of one entity that can or must be associated with each instance of another entity Minimum Cardinality If zero, then optional If one or more, then mandatory Maximum Cardinality The maximum number
Figure 3-12 Examples of relationships of different degrees a) Unary relationships
Figure 3-12 Examples of relationships of different degrees (cont.) b) Binary relationships
Figure 3-12 Examples of relationships of different degrees (cont.) c) Ternary relationship Note: a relationship can have attributes of its own
Figure 3-17 Examples of cardinality constraints a) Mandatory cardinalities A patient must have recorded at least one history, and can have many A patient history is recorded for one and only one patient
Figure 3-17 Examples of cardinality constraints (cont.) b) One optional, one mandatory An employee can be assigned to any number of projects, or may not be assigned to any at all A project must be assigned to at least one employee, and may be assigned to many
Figure 3-17 Examples of cardinality constraints (cont.) c) Optional cardinalities A person is married to at most one other person, or may not be married at all
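The three cases in Figure 3-17 can be checked mechanically once each constraint is written as a (minimum, maximum) pair. This is a minimal sketch: the function names are illustrative, and `None` is used here as an assumed convention for "many" (no upper limit).

```python
from typing import Optional


def satisfies_cardinality(count: int, minimum: int, maximum: Optional[int]) -> bool:
    """True if `count` related instances respects the min/max cardinality."""
    if count < minimum:
        return False
    return maximum is None or count <= maximum


def is_mandatory(minimum: int) -> bool:
    """Minimum of zero means optional; one or more means mandatory."""
    return minimum >= 1


# (minimum, maximum) pairs for the examples in Figure 3-17.
PATIENT_HAS_HISTORIES = (1, None)   # at least one history, possibly many
EMPLOYEE_ON_PROJECTS = (0, None)    # any number of projects, possibly none
PERSON_MARRIED_TO = (0, 1)          # at most one other person

print(satisfies_cardinality(0, *PATIENT_HAS_HISTORIES))  # patient with no history
print(satisfies_cardinality(0, *EMPLOYEE_ON_PROJECTS))   # unassigned employee
print(satisfies_cardinality(2, *PERSON_MARRIED_TO))      # married to two people
```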
Entities can be related to one another in more than one way Figure 3-21 Examples of multiple relationships a) Employees and departments
Figure 3-21 Examples of multiple relationships (cont.) b) Professors and courses (fixed lower limit constraint) Here, min cardinality constraint is 2
Figure 3-15a and 3-15b Multivalued attributes can be represented as relationships: a) simple, b) composite
Strong vs. Weak Entities, and Identifying Relationships
Strong entity: exists independently of other types of entities; has its own unique identifier; identifier underlined with a single line
Weak entity: dependent on a strong entity (identifying owner) and cannot exist on its own; does not have a unique identifier (only a partial identifier); partial identifier underlined with a double line; entity box has a double line
Identifying relationship: links strong entities to weak entities
Strong entity Weak entity Identifying relationship
Associative Entities
An entity has attributes; a relationship links entities together.
When should a relationship with attributes instead be an associative entity?
All relationships for the participating entity types should be "many" relationships
The associative entity could have meaning independent of the other entities
The associative entity preferably has a unique identifier, and should also have other attributes
The associative entity may participate in relationships with entities other than those of the associated relationship
Ternary relationships should be converted to associative entities
Figure 3-11a A binary relationship with an attribute Here, the date completed attribute pertains specifically to the employee’s completion of a course…it is an attribute of the  relationship
Figure 3-11b An associative entity (CERTIFICATE) Associative entity is like a relationship with an attribute, but it is also considered to be an entity in its own right. Note that the many-to-many cardinality between entities in Figure 3-11a has been replaced by two one-to-many relationships with the associative entity.
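The conversion described above can be sketched as tables, using Python's built-in sqlite3 module. The CERTIFICATE associative entity becomes its own table with foreign keys to the two entities it links. The certificate_number identifier, the other column names, and the sample rows are assumptions for illustration; only the Date Completed attribute comes from the figure description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course   (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);
    -- Associative entity: its own identifier plus one foreign key per linked entity.
    CREATE TABLE certificate (
        certificate_number INTEGER PRIMARY KEY,
        employee_id        INTEGER NOT NULL REFERENCES employee(employee_id),
        course_id          TEXT NOT NULL REFERENCES course(course_id),
        date_completed     TEXT NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course VALUES ('C1', 'SQL Basics'), ('C2', 'Data Modeling');
    INSERT INTO certificate VALUES (1, 1, 'C1', '2007-01-15'), (2, 2, 'C1', '2007-02-01');
""")


def record_completion(certificate_number, employee_id, course_id, date_completed):
    """Insert a certificate; return False if it refers to a missing employee or course."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO certificate VALUES (?, ?, ?, ?)",
                (certificate_number, employee_id, course_id, date_completed),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def courses_completed(employee_id):
    rows = conn.execute(
        "SELECT c.title, cert.date_completed FROM certificate cert "
        "JOIN course c ON c.course_id = cert.course_id "
        "WHERE cert.employee_id = ? ORDER BY c.title",
        (employee_id,),
    )
    return rows.fetchall()


accepted = record_completion(3, 1, "C2", "2007-03-01")
rejected = record_completion(4, 99, "C1", "2007-03-02")  # employee 99 does not exist
print(accepted, rejected, courses_completed(1))
```

Each certificate row belongs to exactly one employee and one course, so the original many-to-many relationship is carried by two one-to-many relationships.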
Figure 3-13c An associative entity – bill of materials structure This could just be a relationship with attributes…it’s a judgment call
Figure 3-18 Ternary relationship as an associative entity
Microsoft Visio Example for E-R diagram Different modeling software tools may have different notation for the same constructs
Chapter 4: The Enhanced ER Model and Business Rules
Objectives Definition of terms Use of supertype/subtype relationships Use of generalization and specialization techniques Specification of completeness and disjointness constraints Develop supertype/subtype hierarchies for realistic business situations Develop entity clusters Explain universal data model Name categories of business rules Define operational constraints graphically and in English
Supertypes and Subtypes Subtype:   A subgrouping of the entities in an entity type that has attributes distinct from those in other subgroupings Supertype:   A generic entity type that has a relationship with one or more subtypes Attribute Inheritance: Subtype entities inherit values of all attributes of the supertype An instance of a subtype is also an instance of the supertype
Figure 4-1 Basic notation for supertype/subtype relationships a) EER notation
Figure 4-1 Basic notation for supertype/subtype relationships (cont.) b) Microsoft Visio notation. Different modeling tools may have different notation for the same modeling constructs.
Figure 4-2  Employee supertype with three subtypes All employee subtypes will have emp nbr, name, address, and date-hired Each employee subtype will also have its own attributes
Relationships and Subtypes Relationships at the  supertype  level indicate that all subtypes will participate in the relationship The instances of a  subtype  may participate in a relationship unique to that subtype.  In this situation, the relationship is shown at the subtype level
Figure 4-3 Supertype/subtype relationships in a hospital Both outpatients and resident patients are cared for by a responsible physician Only resident patients are assigned to a bed
Generalization and Specialization Generalization:  The process of defining a more general entity type from a set of more specialized entity types. BOTTOM-UP Specialization:  The process of defining one or more subtypes of the supertype and forming supertype/subtype relationships. TOP-DOWN
Figure 4-4 Example of generalization a) Three entity types: CAR, TRUCK, and MOTORCYCLE All these types of vehicles have common attributes
Figure 4-4 Example of generalization (cont.) b) Generalization to VEHICLE supertype: the shared attributes are placed in a supertype. Note: there is no subtype for motorcycle, since it has no unique attributes.
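Attribute inheritance in a supertype/subtype hierarchy maps naturally onto class inheritance. The sketch below assumes hypothetical attribute names for VEHICLE, CAR, and TRUCK, since the slide text does not list them. A motorcycle is represented as a plain Vehicle because it has no attributes of its own.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Supertype: attributes shared by every kind of vehicle (names assumed)."""
    vehicle_id: str
    price: float
    engine_displacement: float


@dataclass
class Car(Vehicle):
    """Subtype: inherits the Vehicle attributes and adds its own."""
    no_of_passengers: int


@dataclass
class Truck(Vehicle):
    capacity: float
    cab_type: str


car = Car("V1", 20000.0, 2.0, 5)
truck = Truck("V2", 35000.0, 5.0, 2.5, "Extended")
motorcycle = Vehicle("V3", 8000.0, 0.6)  # no subtype: no unique attributes

# An instance of a subtype is also an instance of the supertype.
print(isinstance(car, Vehicle), car.price, isinstance(motorcycle, Car))
```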
Figure 4-5 Example of specialization a) Entity type PART Only applies to manufactured parts Applies only to purchased parts
Figure 4-5 Example of specialization (cont.) b) Specialization to MANUFACTURED PART and PURCHASED PART: two subtypes were created. Note: the multivalued attribute was replaced by an associative entity relationship to another entity.
Constraints in Supertype/Subtype Relationships: Completeness Constraints. Completeness constraints address whether an instance of a supertype must also be a member of at least one subtype. Total Specialization Rule: Yes (double line). Partial Specialization Rule: No (single line).
Figure 4-6 Examples of completeness constraints a) Total  specialization rule A patient must be either an outpatient or a resident patient
Figure 4-6 Examples of completeness constraints (cont.) b) Partial specialization rule: a vehicle could be a car, a truck, or neither
Constraints in Supertype/Subtype Relationships: Disjointness Constraints. Disjointness constraints address whether an instance of a supertype may simultaneously be a member of two (or more) subtypes. Disjoint Rule: an instance of the supertype can be only ONE of the subtypes. Overlap Rule: an instance of the supertype could be more than one of the subtypes.
Figure 4-7 Examples of disjointness constraints a) Disjoint rule: a patient can be either an outpatient or a resident patient, but not both
Figure 4-7 Examples of disjointness constraints (cont.) b) Overlap rule: a part may be both purchased and manufactured
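The completeness and disjointness rules combine into a simple membership check. In this sketch the function name and the way constraints are passed as two flags are illustrative choices; treating PART as total specialization is an assumption, since the slides state only that it follows the overlap rule.

```python
def membership_is_valid(subtypes, total, disjoint):
    """Check a supertype instance's set of subtypes against both constraints.

    total=True    -> total specialization: must belong to at least one subtype
    disjoint=True -> disjoint rule: may belong to at most one subtype
    """
    if total and len(subtypes) == 0:
        return False
    if disjoint and len(subtypes) > 1:
        return False
    return True


# PATIENT: total specialization, disjoint rule.
print(membership_is_valid({"OUTPATIENT"}, total=True, disjoint=True))
print(membership_is_valid({"OUTPATIENT", "RESIDENT PATIENT"}, total=True, disjoint=True))
# VEHICLE: partial specialization, so "neither car nor truck" is allowed.
print(membership_is_valid(set(), total=False, disjoint=True))
# PART: overlap rule, so a part may be both purchased and manufactured.
print(membership_is_valid({"PURCHASED PART", "MANUFACTURED PART"}, total=True, disjoint=False))
```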
Constraints in Supertype/Subtype Relationships: Subtype Discriminators. Subtype discriminator: an attribute of the supertype whose values determine the target subtype(s). Disjoint: a simple attribute with alternative values to indicate the possible subtypes. Overlapping: a composite attribute whose subparts pertain to different subtypes. Each subpart contains a Boolean value to indicate whether or not the instance belongs to the associated subtype.
Figure 4-8 Introducing a subtype discriminator ( disjoint  rule) A simple attribute with different possible values indicating the subtype
Figure 4-9 Subtype discriminator ( overlap  rule) A composite attribute with sub-attributes indicating “yes” or “no” to determine whether it is of each subtype
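The two kinds of discriminator can be sketched side by side. The code values ("H", "S", "C"), the employee subtype names, and the function names are assumptions for illustration; the slides say only that a disjoint discriminator is a simple attribute with alternative values and that an overlapping discriminator is a composite of yes/no parts.

```python
# Disjoint rule: one simple attribute whose value selects exactly one subtype.
EMPLOYEE_SUBTYPES = {
    "H": "HOURLY EMPLOYEE",
    "S": "SALARIED EMPLOYEE",
    "C": "CONSULTANT",
}


def employee_subtype(employee_type):
    return EMPLOYEE_SUBTYPES[employee_type]


# Overlap rule: a composite attribute with one yes/no component per subtype.
def part_subtypes(manufactured, purchased):
    subtypes = []
    if manufactured:
        subtypes.append("MANUFACTURED PART")
    if purchased:
        subtypes.append("PURCHASED PART")
    return subtypes


print(employee_subtype("S"))
print(part_subtypes(manufactured=True, purchased=True))
```

A single-valued code cannot express membership in two subtypes at once, which is why the overlap rule needs one Boolean component per subtype.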
Figure 4-10 Example of supertype/subtype hierarchy
Entity Clusters EER diagrams are difficult to read when there are too many entities and relationships Solution: Group entities and relationships into  entity clusters Entity cluster : Set of one or more entity types and associated relationships grouped into a single abstract entity type
Figure 4-13a  Possible entity clusters for Pine Valley Furniture in Microsoft Visio Related groups of entities could become clusters
Figure 4-13b EER diagram of PVF entity clusters More readable, isn’t it?
Figure 4-14 Manufacturing entity cluster Detail for a single cluster
Packaged data models provide generic models that can be customized for a particular organization’s business rules
Business rules Statements that  define  or  constrain  some aspect of the business Classification of business rules: Derivation–rule derived from other knowledge, often in the form of a formula using attribute values Structural assertion–rule expressing static structure. Includes attributes, relationships, and definitions Action assertion–rule expressing constraints/control of organizational actions
Figure 4-18 EER diagram to describe business rules
Types of Action Assertions Result Condition–IF/THEN rule Integrity constraint–must always be true Authorization–privilege statement Form Enabler–leads to creation of new object Timer–allows or disallows an action Executive–executes one or more actions Rigor Controlling–something must or must not happen Influencing–guideline for which a notification must occur
Stating an Action Assertion Anchor Object–an object on which actions are limited Action–creation, deletion, update, or read Corresponding Objects–an object influencing the ability to perform an action on another business rule Action assertions identify corresponding objects that constrain the ability to perform actions on anchor objects
Figure 4-19 Data model segment for class scheduling
Figure 4-20 Business Rule 1: For a faculty member to be assigned to teach a section of a course, the faculty member must be qualified to teach the course for which that section is scheduled Action assertion Anchor object Corresponding object Corresponding object In this case, the action assertion is a Restriction
Figure 4-21 Business Rule 2: For a faculty member to be assigned to teach a section of a course, the faculty member must not be assigned to teach a total of more than three course sections Action assertion Anchor object Corresponding object In this case, the action assertion is an Upper LIMit
Chapter 5: Logical Database Design and the Relational Model
Objectives Definition of terms List five properties of relations State two properties of candidate keys Define first, second, and third normal form Describe problems from merging relations Transform E-R and EER diagrams to relations Create tables with entity and relational integrity constraints Use normalization to convert anomalous tables to well-structured relations
Relation Definition: A relation is a named, two-dimensional table of data Table consists of rows (records) and columns (attributes or fields) Requirements for a table to qualify as a relation: It must have a unique name Every attribute value must be atomic (not multivalued, not composite) Every row must be unique (can’t have two rows with exactly the same values for all their fields) Attributes (columns) in tables must have unique names The order of the columns must be irrelevant The order of the rows must be irrelevant NOTE: all relations are in 1st Normal Form
Correspondence with E-R Model Relations (tables) correspond with entity types and with many-to-many relationship types Rows correspond with entity instances and with many-to-many relationship instances Columns correspond with attributes NOTE: The word  relation  (in relational database) is NOT the same as the word  relationship  (in E-R model)
Key Fields Keys are special fields that serve two main purposes: Primary keys  are  unique  identifiers of the relation in question. Examples include employee numbers, social security numbers, etc.  This is how we can guarantee that all rows are unique Foreign keys  are identifiers that enable a  dependent  relation (on the many side of a relationship) to refer to its  parent  relation (on the one side of the relationship) Keys can be  simple  (a single field) or  composite  (more than one field) Keys usually are used as indexes to speed up the response to user queries (More on this in Ch. 6)
Figure 5-3 Schema for four relations (Pine Valley Furniture Company) Primary Key Foreign Key  (implements 1:N relationship between customer and order) Combined, these are a  composite primary key  (uniquely identifies the order line)…individually they are  foreign keys  (implement M:N relationship between order and product)
Integrity Constraints Domain Constraints Allowable values for an attribute. See Table 5-1 Entity Integrity No primary key attribute may be null. All primary key fields  MUST  have data Action Assertions Business rules. Recall from Ch. 4
Domain definitions enforce domain integrity constraints
Integrity Constraints Referential Integrity–rule states that any foreign key value (on the relation of the many side) MUST match a primary key value in the relation of the one side. (Or the foreign key can be null) For example: Delete Rules Restrict–don’t allow delete of “parent” side if related rows exist in “dependent” side Cascade–automatically delete “dependent” side rows that correspond with the “parent” side row to be deleted Set-to-Null–set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities)
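The three delete rules can be tried out with Python's built-in sqlite3 module standing in for the DBMS. This is a minimal sketch: the table names customer_t and order_t and the helper demo_delete_rule are illustrative assumptions, not the Pine Valley definitions, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3


def demo_delete_rule(rule):
    """Delete a parent row under the given ON DELETE rule and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
    conn.execute("CREATE TABLE customer_t (customer_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE order_t ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer_t(customer_id) ON DELETE " + rule + ")"
    )
    conn.execute("INSERT INTO customer_t VALUES (1)")      # parent row
    conn.execute("INSERT INTO order_t VALUES (1001, 1)")   # dependent row
    try:
        conn.execute("DELETE FROM customer_t WHERE customer_id = 1")
    except sqlite3.IntegrityError:
        return "delete refused"                            # Restrict
    # Cascade leaves no dependent rows; Set-to-Null leaves a null foreign key
    return conn.execute("SELECT order_id, customer_id FROM order_t").fetchall()
```

Calling the helper with "RESTRICT", "CASCADE", and "SET NULL" shows the parent delete being refused, the dependent row disappearing, and the dependent row surviving with a null foreign key, respectively.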
Figure 5-5  Referential integrity constraints (Pine Valley Furniture) Referential integrity constraints are drawn via arrows from dependent to parent table
Figure 5-6 SQL table definitions Referential integrity constraints are implemented with foreign key to primary key references
Transforming EER Diagrams into Relations Mapping Regular Entities to Relations  Simple attributes: E-R attributes map directly onto the relation Composite attributes: Use only their simple, component attributes  Multivalued Attribute–Becomes a separate relation with a foreign key taken from the superior entity
(a) CUSTOMER entity type with simple attributes Figure 5-8 Mapping a regular entity (b) CUSTOMER relation
(a) CUSTOMER entity type with composite attribute Figure 5-9 Mapping a composite attribute (b) CUSTOMER relation with address detail
Figure 5-10 Mapping an entity with a multivalued attribute One–to–many relationship between original entity and new relation (a) Multivalued attribute becomes a separate relation with foreign key (b)
Transforming EER Diagrams into Relations (cont.) Mapping Weak Entities Becomes a separate relation with a foreign key taken from the superior entity Primary key composed of: Partial identifier of weak entity Primary key of identifying relation (strong entity)
Figure 5-11 Example of mapping a weak entity a) Weak entity DEPENDENT
NOTE: the domain constraint for the foreign key should NOT allow  null  value if DEPENDENT is a weak entity Foreign key Figure 5-11 Example of mapping a weak entity (cont.) b) Relations resulting from weak entity Composite primary key
Transforming EER Diagrams into Relations (cont.) Mapping Binary Relationships One-to-Many–Primary key on the one side becomes a foreign key on the many side Many-to-Many–Create a  new relation  with the primary keys of the two entities as its primary key One-to-One–Primary key on the mandatory side becomes a foreign key on the optional side
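A many-to-many mapping can be sketched with Python's sqlite3 module. The table and column names below (employee_t, course_t, completes_t) and the sample rows are assumptions chosen for illustration; the point is that the new relation carries both foreign keys and uses them together as its composite primary key.

```python
import sqlite3


def build_completes_db():
    """Create two entity relations plus the intersection relation for an M:N relationship."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE employee_t (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
        CREATE TABLE course_t (course_id INTEGER PRIMARY KEY, course_title TEXT);
        CREATE TABLE completes_t (
            employee_id INTEGER NOT NULL REFERENCES employee_t(employee_id),
            course_id INTEGER NOT NULL REFERENCES course_t(course_id),
            date_completed TEXT,
            PRIMARY KEY (employee_id, course_id)   -- composite primary key
        );
        INSERT INTO employee_t VALUES (100, 'A'), (140, 'B');
        INSERT INTO course_t VALUES (1, 'SPSS'), (2, 'Surveys');
        INSERT INTO completes_t VALUES (100, 1, '2007-01-15');
    """)
    return conn


def try_insert(conn, employee_id, course_id):
    """Return True if the (employee, course) pair is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO completes_t VALUES (?, ?, ?)",
            (employee_id, course_id, "2007-02-01"),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A new pair is accepted, a repeated pair violates the composite primary key, and a pair that names a missing employee violates the foreign key.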
Figure 5-12 Example of mapping a 1:M relationship a) Relationship between customers and orders Note the mandatory one Again, no null value in the foreign key…this is because of the mandatory minimum cardinality Foreign key b) Mapping the relationship
Figure 5-13 Example of mapping an M:N relationship a) Completes relationship (M:N) The  Completes  relationship will need to become a separate relation
New  intersection relation Figure 5-13 Example of mapping an M:N relationship (cont.) b) Three resulting relations Foreign key Foreign key Composite primary key
Figure 5-14 Example of mapping a binary 1:1 relationship a) In_charge relationship (1:1) Often in 1:1 relationships, one direction is optional.
b) Resulting relations Figure 5-14 Example of mapping a binary 1:1 relationship (cont.) Foreign key goes in the relation on the optional side, Matching the primary key on the mandatory side
Transforming EER Diagrams into Relations (cont.) Mapping Associative Entities Identifier Not Assigned  Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship) Identifier Assigned  It is natural and familiar to end-users Default identifier may not be unique
Figure 5-15 Example of mapping an associative entity a) An associative entity
Figure 5-15 Example of mapping an associative entity (cont.) b) Three resulting relations Composite primary key formed from the two foreign keys
Figure 5-16 Example of mapping an associative entity with an identifier a) SHIPMENT associative entity
Figure 5-16 Example of mapping an associative entity with an identifier (cont.) b) Three resulting relations Primary key differs from foreign keys
Transforming EER Diagrams into Relations (cont.) Mapping Unary Relationships One-to-Many–Recursive foreign key in the same relation Many-to-Many–Two relations: One for the entity type One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
Figure 5-17 Mapping a unary 1:N relationship (a) EMPLOYEE entity with unary relationship (b) EMPLOYEE relation with recursive foreign key
Figure 5-18 Mapping a unary M:N relationship (a) Bill-of-materials relationships (M:N) (b) ITEM and COMPONENT relations
Transforming EER Diagrams into Relations (cont.) Mapping Ternary (and n-ary) Relationships One relation for each entity and one for the associative entity Associative entity has foreign keys to each entity in the relationship
Figure 5-19 Mapping a ternary relationship a) PATIENT TREATMENT Ternary relationship with associative entity
Figure 5-19 Mapping a ternary relationship (cont.) b) Mapping the ternary relationship PATIENT TREATMENT Remember that the primary key MUST be unique This is why treatment date and time are included in the composite primary key But this makes a very cumbersome key… It would be better to create a surrogate key like Treatment#
Transforming EER Diagrams into Relations (cont.) Mapping Supertype/Subtype Relationships One relation for supertype and for each subtype Supertype attributes (including identifier and subtype discriminator) go into supertype relation Subtype attributes go into each subtype; primary key of supertype relation also becomes primary key of subtype relation 1:1 relationship established between supertype and each subtype, with supertype as primary table
Figure 5-20 Supertype/subtype relationships
Figure 5-21  Mapping Supertype/subtype relationships to relations These are implemented as one-to-one relationships
Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that  avoid unnecessary duplication of data The process of decomposing relations with anomalies to produce smaller,  well-structured  relations
Well-Structured Relations A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies Goal is to avoid anomalies Insertion Anomaly –adding new rows forces user to create duplicate data Deletion Anomaly –deleting rows may cause a loss of data that would be needed for other future rows Modification Anomaly –changing data in a row forces changes to other rows because of duplication General rule of thumb: A table should not pertain to more than one entity type
Example–Figure 5-2b Question–Is this a relation?   Answer–Yes: Unique rows and no multivalued attributes Question–What’s the primary key?   Answer–Composite: Emp_ID, Course_Title
Anomalies in this Table Insertion –can’t enter a new employee without having the employee take a class Deletion –if we remove employee 140, we lose information about the existence of a Tax Acc class Modification –giving a salary increase to employee 100 forces us to update multiple records Why do these anomalies exist?  Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities
Functional Dependencies and Keys Functional Dependency: The value of one attribute (the  determinant ) determines the value of another attribute Candidate Key: A unique identifier. One of the candidate keys will become the primary key E.g. perhaps there is both credit card number and SS# in a table…in this case both are candidate keys Each non-key field is functionally dependent on every candidate key
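A functional dependency can be tested against sample data with a few lines of Python. This is a sketch under assumptions: the helper name holds and the sample rows are made up for illustration, and a sample can only refute a dependency, never prove that it holds for all possible data.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on the determinant but differ on the dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The first occurrence records the value; later rows must agree with it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative rows only (not taken from the figures)
sample = [
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "SPSS"},
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "Surveys"},
    {"Emp_ID": 140, "Salary": 52000, "Course_Title": "Tax Acc"},
]
emp_determines_salary = holds(sample, ["Emp_ID"], ["Salary"])
emp_determines_course = holds(sample, ["Emp_ID"], ["Course_Title"])
```

In this sample Emp_ID determines Salary but not Course_Title, which is why Emp_ID alone cannot serve as the key of such a table.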
Figure 5-22 Steps in normalization
First Normal Form No multivalued attributes Every attribute value is atomic Fig. 5-25 is not in 1st Normal Form (multivalued attributes), so it is not a relation Fig. 5-26 is in 1st Normal Form All relations are in 1st Normal Form
Table with multivalued attributes, not in 1st Normal Form Note: this is NOT a relation
Table with no multivalued attributes and unique rows, in 1st Normal Form Note: this is a relation, but not a well-structured one
Anomalies in this Table Insertion –if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication Deletion –if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price   Update –changing the price of product ID 4 requires update in several records Why do these anomalies exist?  Because there are multiple themes (entity types) in one relation. This results in duplication and an unnecessary dependency between the entities
Second Normal Form 1NF PLUS  every non-key attribute is fully functionally dependent on the ENTIRE primary key Every non-key attribute must be defined by the entire key, not by only part of the key No partial functional dependencies
Figure 5-27 Functional dependency diagram for INVOICE Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address; Customer_ID → Customer_Name, Customer_Address; Product_ID → Product_Description, Product_Finish, Unit_Price; Order_ID, Product_ID → Order_Quantity Therefore, NOT in 2nd Normal Form
Partial dependencies are removed, but there are still transitive dependencies Getting it into Second Normal Form Figure 5-28 Removing partial dependencies
Third Normal Form 2NF PLUS  no transitive dependencies  (functional dependencies on non-primary-key attributes) Note: This is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third Solution: Non-key determinant with transitive dependencies go into a new table; non-key determinant becomes primary key in the new table and stays as foreign key in the old table
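Removing a transitive dependency amounts to projecting the relation onto two attribute sets. The sketch below assumes a simplified order relation with made-up rows in which order_id determines customer_id, which in turn determines customer_name; the helper name project is hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples, keeping first-seen order."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Illustrative data: customer_name depends on customer_id, not directly on order_id
order_rows = [
    {"order_id": 1006, "order_date": "2006-10-24", "customer_id": 2, "customer_name": "Value Furniture"},
    {"order_id": 1007, "order_date": "2006-10-25", "customer_id": 6, "customer_name": "Furniture Gallery"},
    {"order_id": 1008, "order_date": "2006-10-26", "customer_id": 2, "customer_name": "Value Furniture"},
]

# The non-key determinant becomes the primary key of the new relation
# and stays behind as a foreign key in the old one
orders = project(order_rows, ["order_id", "order_date", "customer_id"])
customers = project(order_rows, ["customer_id", "customer_name"])

# Joining on the foreign key recovers the original rows, so no information is lost
rejoined = [
    {**o, **c} for o in orders for c in customers if o["customer_id"] == c["customer_id"]
]
```

The customer name is now stored once per customer instead of once per order, which removes the modification anomaly.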
Transitive dependencies are removed Figure 5-29 Removing transitive dependencies Getting it into Third Normal Form
Merging Relations View Integration–Combining entities from multiple ER models into common relations Issues to watch out for when merging entities from different ER models: Synonyms–two or more attributes with different names but same meaning Homonyms–attributes with same name but different meanings Transitive dependencies–even if relations are in 3NF prior to merging, they may not be after merging Supertype/subtype relationships–may be hidden prior to merging
Enterprise Keys Primary keys that are unique in the whole database, not just within a single relation Corresponds with the concept of an object ID in object-oriented systems
Figure 5-31 Enterprise keys a) Relations with enterprise key b) Sample data with enterprise key
Chapter 6: Physical Database Design and Performance
Objectives Definition of terms Describe the physical database design process Choose storage formats for attributes Select appropriate file organizations Describe three types of file organization Describe indexes and their appropriate use Translate a database model into efficient structures Know when and how to use denormalization
Physical Database Design Purpose–translate the logical description of data into the technical specifications for storing and retrieving data Goal–create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
Physical Design Process Inputs: Normalized relations; Volume estimates; Attribute definitions; Response time expectations; Data security needs; Backup/recovery needs; Integrity expectations; DBMS technology used Leads to Decisions: Attribute data types; Physical record descriptions (doesn’t always match logical design); File organizations; Indexes and database architectures; Query optimization
Figure 6-1 Composite usage map (Pine Valley Furniture Company)
Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Data volumes
Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Access Frequencies (per hour)
Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 140 purchased parts accessed per hour → 80 quotations accessed from these 140 purchased part accesses → 70 suppliers accessed from these 80 quotation accesses
Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 75 suppliers accessed per hour → 40 quotations accessed from these 75 supplier accesses → 40 purchased parts accessed from these 40 quotation accesses
Designing Fields Field: smallest unit of data in database Field design  Choosing data type Coding, compression, encryption Controlling data integrity
Choosing Data Types CHAR–fixed-length character VARCHAR2–variable-length character (memo) LONG–variable-length character data up to 2 GB NUMBER–positive/negative number INTEGER–positive/negative whole number DATE–actual date BLOB–binary large object (good for graphics, sound clips, etc.)
Figure 6-2  Example code look-up table (Pine Valley Furniture Company) Code saves space, but costs an additional lookup to obtain actual value
Field Data Integrity Default value–assumed value if no explicit value Range control–allowable value limitations (constraints or validation rules) Null value control–allowing or prohibiting empty fields Referential integrity–range control (and null value allowances) for foreign-key to primary-key match-ups Sarbanes-Oxley Act (SOX) legislates importance of financial data integrity
Handling Missing Data Substitute an estimate of the missing value (e.g., using a formula) Construct a report listing missing values In programs, ignore missing data unless the value is significant (sensitivity testing) Triggers can be used to perform these operations
Physical Records Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit Page: The amount of data read or written in one I/O operation Blocking Factor: The number of physical records per page
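The blocking factor is simple arithmetic. The sketch below assumes fixed-length records that never span a page boundary; the function names and the 4096-byte and 300-byte figures are illustrative only.

```python
def blocking_factor(page_bytes, record_bytes):
    """Number of whole physical records that fit in one page."""
    if page_bytes <= 0 or record_bytes <= 0:
        raise ValueError("sizes must be positive")
    return page_bytes // record_bytes


def pages_needed(num_records, page_bytes, record_bytes):
    """Pages required to store num_records, rounding the last partial page up."""
    factor = blocking_factor(page_bytes, record_bytes)
    if factor == 0:
        raise ValueError("record is larger than a page")
    return -(-num_records // factor)  # ceiling division


factor = blocking_factor(4096, 300)     # 13 records per page, 196 bytes unused
pages = pages_needed(1000, 4096, 300)   # 77 pages for 1000 records
```

A larger blocking factor means fewer I/O operations for a sequential scan, since each page read brings in more records.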
Denormalization Transforming  normalized  relations into  unnormalized  physical record specifications Benefits: Can improve performance (speed) by reducing number of table lookups (i.e.  reduce number of necessary join queries ) Costs (due to data duplication) Wasted storage space Data integrity/consistency threats Common denormalization opportunities One-to-one relationship (Fig. 6-3) Many-to-many relationship with attributes (Fig. 6-4) Reference data (1:N relationship where 1-side has data not used in any other relationship) (Fig. 6-5)
Figure 6-3  A possible denormalization situation: two entities with one-to-one relationship
Figure 6-4  A possible denormalization situation: a many-to-many relationship with nonkey attributes Extra table access required  Null description possible
Figure 6-5 A possible denormalization situation: reference data Extra table access required  Data duplication
Partitioning Horizontal Partitioning: Distributing the rows of a table into several separate files Useful for situations where different users need access to different rows Three types: Key Range Partitioning, Hash Partitioning, or Composite Partitioning Vertical Partitioning: Distributing the columns of a table into several separate relations Useful for situations where different users need access to different columns The primary key must be repeated in each file Combinations of Horizontal and Vertical Partitions often correspond with User Schemas (user views)
Partitioning (cont.) Advantages of Partitioning: Efficiency: Records used together are grouped together Local optimization: Each partition can be optimized for performance Security, recovery Load balancing: Partitions stored on different disks, reduces contention Take advantage of parallel processing capability Disadvantages of Partitioning: Inconsistent access speed: Slow retrievals across partitions Complexity: Non-transparent partitioning Extra space or update time: Duplicate data; access from multiple partitions
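The partitioning schemes can be sketched in plain Python over a list of row dictionaries. The function names, the integer key, and the boundary values are assumptions for illustration; a real DBMS applies these rules at the storage level.

```python
import bisect


def key_range_partition(rows, key, boundaries):
    """Horizontal: partition i holds rows with boundaries[i-1] <= row[key] < boundaries[i]."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts


def hash_partition(rows, key, num_partitions):
    """Horizontal: spread rows by the remainder of an integer key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[row[key] % num_partitions].append(row)
    return parts


def vertical_partition(rows, key, column_groups):
    """Vertical: one relation per column group, repeating the primary key in each."""
    return [
        [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]
```

For example, key_range_partition(rows, "id", [100, 200]) yields three partitions: keys below 100, keys from 100 up to 199, and keys of 200 or more.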
Data Replication Purposely storing the same data in multiple locations of the database Improves performance by allowing multiple users to access the same data at the same time with minimum contention Sacrifices data integrity due to data duplication Best for data that is not updated often
Designing Physical Files Physical File:  A named portion of secondary memory allocated for the purpose of storing physical records Tablespace–named set of disk storage elements in which physical files for database tables can be stored Extent–contiguous section of disk space Constructs to link two pieces of data: Sequential storage Pointers–field of data that can be used to locate related fields or records
Figure 6-6 Physical file terminology in an Oracle environment
File Organizations Technique for physically arranging records of a file on secondary storage Factors for selecting file organization: Fast data retrieval and throughput Efficient storage space utilization Protection from failure and data loss Minimizing need for reorganization Accommodating growth Security from unauthorized use Types of file organizations Sequential Indexed Hashed
Figure 6-7a Sequential file organization Records of the file are stored in sequence by the primary key field values If not sorted: average time to find desired record = n/2 If sorted: every insert or delete requires resort
Indexed File Organizations Index–a separate table that contains organization of records for quick retrieval Primary keys are automatically indexed Oracle has a CREATE INDEX operation, and MS ACCESS allows indexes to be created for most field types Indexing approaches: B-tree index, Fig. 6-7b Bitmap index, Fig. 6-8 Hash Index, Fig. 6-7c Join Index, Fig 6-9
Figure 6-7b B-tree index uses a tree search Average time to find desired record = depth of the tree Leaves of the tree are all at same level → consistent access time
Figure 6-7c Hashed  file or index organization  Hash algorithm Usually uses division-remainder to determine record position. Records with same position are grouped in lists
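The division-remainder method can be sketched as follows. The function names, the integer keys, and the choice of seven buckets are assumptions for illustration; keys that hash to the same position are kept together in one list.

```python
def build_hash_file(keys, num_buckets):
    """Place each integer key in the bucket given by key mod num_buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[key % num_buckets].append(key)  # colliding keys share a list
    return buckets


def lookup(buckets, key):
    """Search only the one bucket the hash algorithm points to."""
    return key in buckets[key % len(buckets)]


hash_file = build_hash_file([10, 17, 24, 5], 7)
```

Here 10, 17, and 24 all leave remainder 3 and land in the same bucket, while 5 lands alone in bucket 5, so a lookup never has to scan the whole file.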
Figure 6-8 Bitmap index organization Bitmap saves on space requirements Rows - possible values of the attribute Columns - table rows Bit indicates whether the attribute of a row has the value
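A bitmap index can be sketched with Python integers used as bit strings: one bitmap per attribute value, with bit i set when table row i has that value. The function names and the sample finish values are assumptions for illustration.

```python
def build_bitmap_index(rows, attr):
    """Map each distinct value of attr to an integer whose bit i marks table row i."""
    index = {}
    for position, row in enumerate(rows):
        index[row[attr]] = index.get(row[attr], 0) | (1 << position)
    return index


def matching_positions(bitmap, num_rows):
    """List the row positions whose bit is set in the bitmap."""
    return [i for i in range(num_rows) if (bitmap >> i) & 1]


products = [{"finish": "Cherry"}, {"finish": "Oak"}, {"finish": "Cherry"}, {"finish": "Walnut"}]
finish_index = build_bitmap_index(products, "finish")

# A condition such as finish = Cherry OR finish = Oak becomes a bitwise OR of two bitmaps
cherry_or_oak = matching_positions(finish_index["Cherry"] | finish_index["Oak"], len(products))
```

Combining conditions with bitwise operations is what makes this structure attractive for attributes with few distinct values.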
Figure 6-9  Join  Indexes–speeds up join operations
Clustering Files In some relational DBMSs, related records from different tables can be stored together in the same disk area Useful for improving performance of join operations Primary key records of the main table are stored adjacent to associated foreign key records of the dependent table e.g. Oracle has a CREATE CLUSTER command
Rules for Using Indexes Use on larger tables Index the primary key of each table Index search fields (fields frequently in WHERE clause) Fields in SQL ORDER BY and GROUP BY commands When there are >100 values but not when there are <30 values
Rules for Using Indexes (cont.) Avoid use of indexes for fields with long values; perhaps compress values first DBMS may have limit on number of indexes per table and number of bytes per indexed field(s) Null values will not be referenced from an index Use indexes heavily for non-volatile databases; limit the use of indexes for volatile databases Why? Because modifications (e.g. inserts, deletes) require updates to occur in index files
RAID Redundant Array of Inexpensive Disks A set of disk drives that appear to the user to be a single disk drive Allows parallel access to data (improves access speed) Pages are arranged in  stripes
Figure 6-10 RAID with four disks and striping Here, pages 1-4 can be read/written simultaneously
RAID Types (Figure 6-10) RAID 0: Maximized parallelism; no redundancy; no error correction; no fault tolerance RAID 1: Redundant data (fault tolerant); most common form RAID 2: No redundancy; one record spans across data disks; error correction in multiple disks to reconstruct damaged data RAID 3: Error correction in one disk; record spans multiple data disks (more than RAID 2); not good for multi-user environments RAID 4: Error correction in one disk; multiple records per stripe; parallelism, but slow updates due to error correction contention RAID 5: Rotating parity array; error correction takes place in same disks as data storage; parallelism, better performance than RAID 4
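The slide does not say how the error correction data is computed. A common scheme for single-disk parity is bytewise XOR, and the sketch below assumes that scheme; the function names and the four-byte blocks are illustrative only.

```python
def parity(blocks):
    """Bytewise XOR of equally sized blocks (the assumed parity scheme)."""
    result = bytes(len(blocks[0]))  # start from all zero bytes
    for block in blocks:
        result = bytes(x ^ y for x, y in zip(result, block))
    return result


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing block by XOR-ing the survivors with the parity."""
    return parity(surviving_blocks + [parity_block])


stripe = [b"ABCD", b"EFGH", b"IJKL"]   # data blocks on three disks
stripe_parity = parity(stripe)          # stored on a fourth disk
rebuilt = reconstruct([stripe[0], stripe[2]], stripe_parity)
```

Because every write must also update the parity block, a dedicated parity disk becomes a point of contention, which is the slow-update problem noted for RAID 4; rotating the parity across disks spreads that load.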
Database Architectures  (Figure 6-11) Legacy Systems Current Technology Data Warehouses
Chapter 7: Introduction to SQL
Objectives Definition of terms Interpret history and role of SQL  Define a database using SQL data definition language Write single table queries using SQL Establish referential integrity using SQL Discuss SQL:1999 and SQL:2003 standards
SQL Overview Structured Query Language The standard for relational database management systems (RDBMS)  RDBMS: A database management system that manages data as a collection of tables in which all relationships are represented by common values in related tables
History of SQL 1970–E. Codd develops relational database concept 1974-1979–System R with Sequel (later SQL) created at IBM Research Lab 1979–Oracle markets first relational DB with SQL 1986–ANSI SQL standard released 1989, 1992, 1999, 2003–Major ANSI standard updates Current–SQL is supported by most major database vendors
Purpose of SQL Standard Specify syntax/semantics for data definition and manipulation Define data structures Enable portability Specify minimal (level 1) and complete (level 2) standards Allow for later growth/enhancement to standard
Benefits of a Standardized Relational Language Reduced training costs Productivity Application portability Application longevity Reduced dependence on a single vendor Cross-system communication
SQL Environment Catalog   A set of schemas that constitute the description of a database Schema The structure that contains descriptions of objects created by a user (base tables, views, constraints) Data Definition Language (DDL) Commands that define a database, including creating, altering, and dropping tables and establishing constraints Data Manipulation Language (DML) Commands that maintain and query a database Data Control Language (DCL) Commands that control a database, including administering privileges and committing data
Figure 7-1 A simplified schematic of a typical SQL environment, as described by the SQL-2003 standard
Some SQL Data types
Figure 7-4  DDL, DML, DCL, and the database development process
SQL Database Definition Data Definition Language (DDL) Major CREATE statements: CREATE SCHEMA–defines a portion of the database owned by a particular user CREATE TABLE–defines a table and its columns CREATE VIEW–defines a logical table from one or more tables or views Other CREATE statements: CHARACTER SET, COLLATION, TRANSLATION, ASSERTION, DOMAIN
Table Creation Figure 7-5 General syntax for CREATE TABLE Steps in table creation: Identify data types for attributes Identify columns that can and cannot be null Identify columns that must be unique (candidate keys) Identify primary key – foreign key mates Determine default values Identify constraints on columns (domain specifications) Create the table and associated indexes
The following slides create tables for this enterprise data model
Figure 7-6 SQL database definition commands for Pine Valley Furniture Overall table definitions
Defining attributes and their data types
Non-nullable specification Identifying primary key Primary keys can never have NULL values
Non-nullable specifications Primary key Some primary keys are composite– composed of multiple attributes
Default value Domain constraint Controlling the values in attributes
Primary key of  parent table Identifying foreign keys and establishing relationships Foreign key of  dependent table
Data Integrity Controls Referential integrity–constraint that ensures that foreign key values of a table must match primary key values of a related table in 1:M relationships Restricting: Deletes of primary records Updates of primary records Inserts of dependent records
Relational integrity is enforced via the primary-key to foreign-key match Figure 7-7 Ensuring data integrity through updates
Changing and Removing Tables ALTER TABLE statement allows you to change column specifications: ALTER TABLE CUSTOMER_T ADD (TYPE VARCHAR(2)) DROP TABLE statement allows you to remove tables from your schema: DROP TABLE CUSTOMER_T
Schema Definition Control processing/storage efficiency: Choice of indexes File organizations for base tables File organizations for indexes Data clustering Statistics maintenance Creating indexes Speed up random/sequential access to base table data Example CREATE INDEX NAME_IDX ON CUSTOMER_T(CUSTOMER_NAME) This makes an index for the CUSTOMER_NAME field of the CUSTOMER_T table
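The CREATE INDEX example can be run against SQLite through Python's sqlite3 module. This is a sketch: the two-column CUSTOMER_T definition is a simplified assumption, and EXPLAIN QUERY PLAN is a SQLite feature whose output wording varies between versions.

```python
import sqlite3


def index_demo():
    """Create the index and return (index names, query plan details) for a name lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT)"
    )
    conn.execute("CREATE INDEX NAME_IDX ON CUSTOMER_T(CUSTOMER_NAME)")
    # sqlite_master is SQLite's catalog of schema objects
    names = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")]
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT CUSTOMER_ID FROM CUSTOMER_T WHERE CUSTOMER_NAME = ?",
        ("Home Furnishings",),
    ).fetchall()
    return names, [row[-1] for row in plan]  # the last column holds the plan description


index_names, plan_details = index_demo()
```

The plan description names NAME_IDX, showing that the optimizer searches the index rather than scanning the whole table for this WHERE clause.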
Insert Statement Adds data to a table Inserting into a table INSERT INTO CUSTOMER_T VALUES (001, 'Contemporary Casuals', '1355 S. Himes Blvd.', 'Gainesville', 'FL', 32601); Inserting a record that has some null attributes requires identifying the fields that actually get data INSERT INTO PRODUCT_T (PRODUCT_ID, PRODUCT_DESCRIPTION, PRODUCT_FINISH, STANDARD_PRICE, PRODUCT_ON_HAND) VALUES (1, 'End Table', 'Cherry', 175, 8); Inserting from another table INSERT INTO CA_CUSTOMER_T SELECT * FROM CUSTOMER_T WHERE STATE = 'CA';
Creating Tables with Identity Columns Inserting into a table does not require explicit customer ID entry or field list INSERT INTO CUSTOMER_T VALUES ('Contemporary Casuals', '1355 S. Himes Blvd.', 'Gainesville', 'FL', 32601); New with SQL:2003
Delete Statement Removes rows from a table Delete certain rows DELETE FROM CUSTOMER_T WHERE STATE = 'HI'; Delete all rows DELETE FROM CUSTOMER_T;
Update Statement Modifies data in existing rows UPDATE PRODUCT_T SET STANDARD_PRICE = 775 WHERE PRODUCT_ID = 7;
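The INSERT, UPDATE, and DELETE statements can be exercised together with Python's sqlite3 module. This sketch assumes a PRODUCT_T table with the columns listed in the INSERT example; the column types and the second and third sample rows are made up for illustration.

```python
import sqlite3


def run_dml():
    """Insert three products, reprice one, delete one, and return what remains."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PRODUCT_T ("
        " PRODUCT_ID INTEGER PRIMARY KEY, PRODUCT_DESCRIPTION TEXT,"
        " PRODUCT_FINISH TEXT, STANDARD_PRICE NUMERIC, PRODUCT_ON_HAND INTEGER)"
    )
    conn.executemany(
        "INSERT INTO PRODUCT_T (PRODUCT_ID, PRODUCT_DESCRIPTION, PRODUCT_FINISH,"
        " STANDARD_PRICE, PRODUCT_ON_HAND) VALUES (?, ?, ?, ?, ?)",
        [
            (1, "End Table", "Cherry", 175, 8),
            (7, "Dining Table", "Natural Ash", 800, 2),
            (9, "Bookcase", "Oak", 300, 0),
        ],
    )
    # UPDATE changes only the rows that satisfy the WHERE clause
    conn.execute("UPDATE PRODUCT_T SET STANDARD_PRICE = 775 WHERE PRODUCT_ID = 7")
    # DELETE without a WHERE clause would remove every row
    conn.execute("DELETE FROM PRODUCT_T WHERE PRODUCT_ON_HAND = 0")
    return conn.execute(
        "SELECT PRODUCT_ID, STANDARD_PRICE FROM PRODUCT_T ORDER BY PRODUCT_ID"
    ).fetchall()


remaining = run_dml()
```

Product 7 ends up with the new price and product 9 is gone, while product 1 is untouched because neither WHERE clause matched it.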
Merge Statement Makes it easier to update a table…allows combination of Insert and Update in one statement Useful for updating master tables with new data
SELECT Statement Used for queries on single or multiple tables Clauses of the SELECT statement: SELECT List the columns (and expressions) that should be returned from the query FROM Indicate the table(s) or view(s) from which data will be obtained WHERE Indicate the conditions under which a row will be included in the result GROUP BY Indicate categorization of results  HAVING Indicate the conditions under which a category (group) will be included ORDER BY Sorts the result according to specified criteria
Figure 7-10  SQL statement processing order  (adapted from van der Lans, p.100)
SELECT Example Find products with standard price less than $275 SELECT  PRODUCT_NAME, STANDARD_PRICE  FROM  PRODUCT_V  WHERE  STANDARD_PRICE < 275; Table 7-3: Comparison Operators in SQL
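SELECT Example Using Alias Alias is an alternative column or table name SELECT CUST.CUSTOMER_NAME AS NAME, CUST.CUSTOMER_ADDRESS FROM CUSTOMER_V CUST WHERE CUST.CUSTOMER_NAME = 'Home Furnishings'; Note: CUST is a table alias and NAME is a column alias; the WHERE clause uses the underlying column because many DBMSs do not accept a column alias there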
SELECT Example  Using a Function Using the COUNT  aggregate function  to find totals SELECT  COUNT(*)  FROM ORDER_LINE_V WHERE ORDER_ID = 1004; Note: with aggregate functions you can’t have single-valued columns included in the SELECT clause
SELECT Example–Boolean Operators AND, OR, and NOT Operators for customizing conditions in WHERE clause SELECT PRODUCT_DESCRIPTION, PRODUCT_FINISH, STANDARD_PRICE FROM PRODUCT_V WHERE (PRODUCT_DESCRIPTION LIKE '%Desk' OR PRODUCT_DESCRIPTION LIKE '%Table') AND STANDARD_PRICE > 300; Note: the LIKE operator allows you to compare strings using wildcards. For example, the % wildcard in '%Desk' indicates that all strings that have any number of characters preceding the word "Desk" will be allowed
Venn Diagram from Previous Query
SELECT Example – Sorting Results with the ORDER BY Clause Sort the results first by STATE, and within a state by CUSTOMER_NAME SELECT CUSTOMER_NAME, CITY, STATE FROM CUSTOMER_V WHERE STATE IN ('FL', 'TX', 'CA', 'HI') ORDER BY STATE, CUSTOMER_NAME; Note: the IN operator in this example allows you to include rows whose STATE value is either FL, TX, CA, or HI. It is more compact than separate OR conditions and may also be processed more efficiently
SELECT Example–  Categorizing Results Using the GROUP BY Clause For use with aggregate functions Scalar aggregate : single value returned from SQL query with aggregate function Vector aggregate : multiple values returned from SQL query with aggregate function (via GROUP BY) SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)  FROM CUSTOMER_V GROUP BY  CUSTOMER_STATE; Note: you can use single-value fields with aggregate functions if they are included in the GROUP BY clause
SELECT Example–  Qualifying Results by Categories  Using the HAVING Clause For use with GROUP BY SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)  FROM CUSTOMER_V GROUP BY CUSTOMER_STATE HAVING  COUNT(CUSTOMER_STATE) > 1; Like a WHERE clause, but it operates on groups (categories), not on individual rows. Here, only those groups with total numbers greater than 1 will be included in final result
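The scalar aggregate, vector aggregate, and HAVING slides can be tried out with a small sketch, assuming Python's sqlite3 module and made-up customer rows (the table here is a stand-in for the slides' CUSTOMER_V).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the column names follow the slides.
conn.execute("CREATE TABLE CUSTOMER_V (CUSTOMER_NAME TEXT, CUSTOMER_STATE TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMER_V VALUES (?, ?)",
    [("A", "FL"), ("B", "FL"), ("C", "TX"), ("D", "CA"), ("E", "CA"), ("F", "CA")],
)

# Scalar aggregate: a single value for the whole table.
total = conn.execute("SELECT COUNT(*) FROM CUSTOMER_V").fetchone()[0]

# Vector aggregate: one value per group formed by GROUP BY.
per_state = conn.execute(
    "SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE) FROM CUSTOMER_V "
    "GROUP BY CUSTOMER_STATE ORDER BY CUSTOMER_STATE"
).fetchall()

# HAVING filters whole groups after aggregation (WHERE filters rows before it).
multi = conn.execute(
    "SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE) FROM CUSTOMER_V "
    "GROUP BY CUSTOMER_STATE HAVING COUNT(CUSTOMER_STATE) > 1 "
    "ORDER BY CUSTOMER_STATE"
).fetchall()

print(total, per_state, multi)
```

With this sample data the TX group has a single customer, so HAVING drops it from the final result.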
Using and Defining Views Views provide users controlled access to tables Base Table–table containing the raw data Dynamic View A “virtual table” created dynamically upon request by a user  No data actually stored; instead data from base table made available to user Based on SQL SELECT statement on base tables or other views Materialized View Copy or replication of data Data actually stored Must be refreshed periodically to match the corresponding base tables
Sample CREATE VIEW CREATE VIEW EXPENSIVE_STUFF_V AS SELECT PRODUCT_ID, PRODUCT_DESCRIPTION, STANDARD_PRICE FROM PRODUCT_T WHERE STANDARD_PRICE > 300 WITH CHECK OPTION; View has a name View is based on a SELECT statement WITH CHECK OPTION works only for updateable views and prevents updates that would create rows not included in the view
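A small sketch of a dynamic view, assuming Python's sqlite3 module, a reduced PRODUCT_T with assumed columns, and made-up rows. SQLite views are read-only and do not support the check option, so that clause is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, "
    "PRODUCT_DESCRIPTION TEXT, STANDARD_PRICE INTEGER)"
)
conn.executemany(
    "INSERT INTO PRODUCT_T VALUES (?, ?, ?)",
    [(1, "End Table", 175), (2, "Computer Desk", 375)],
)

# The view stores only its defining SELECT, not a copy of the data.
conn.execute(
    "CREATE VIEW EXPENSIVE_STUFF_V AS "
    "SELECT PRODUCT_ID, PRODUCT_DESCRIPTION, STANDARD_PRICE "
    "FROM PRODUCT_T WHERE STANDARD_PRICE > 300"
)

def expensive_ids(conn):
    """Return the product IDs currently visible through the view."""
    rows = conn.execute("SELECT PRODUCT_ID FROM EXPENSIVE_STUFF_V ORDER BY PRODUCT_ID")
    return [row[0] for row in rows]

before = expensive_ids(conn)
conn.execute("UPDATE PRODUCT_T SET STANDARD_PRICE = 775 WHERE PRODUCT_ID = 1")
after = expensive_ids(conn)  # the view reflects the base-table change immediately
print(before, after)
```

Because the view is evaluated on each reference, it always shows current base-table data, at the cost of re-running the query every time.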
Advantages of Views Simplify query commands Assist with data security (but don't rely on views alone for security; there are more important security measures) Enhance programming productivity Contain most current base table data Use little storage space Provide customized view for user Establish physical data independence
Disadvantages of Views Use processing time each time view is referenced May or may not be directly updateable
Chapter 8: Advanced SQL Modern Database Management 8 th  Edition Jeffrey A. Hoffer, Mary B. Prescott,  Fred R. McFadden © 2007 by Prentice Hall
Objectives Definition of terms Write multiple table SQL queries Define and use three types of joins Write correlated and noncorrelated subqueries Establish referential integrity in SQL Understand triggers and stored procedures Discuss SQL:1999 standard and its extension of SQL-92
Processing Multiple Tables–Joins Join – a relational operation that causes two or more tables with a common domain to be combined into a single table or view   Equi-join – a join in which the joining condition is based on equality between values in the common columns; common columns appear redundantly in the result table Natural join – an equi-join in which one of the duplicate columns is eliminated in the result table Outer join – a join in which rows that do not have matching values in common columns are nonetheless included in the result table (as opposed to  inner  join, in which rows must have matching values in order to appear in the result table) Union join – includes all columns from each table in the join, and an instance for each row of each table The common columns in joined tables are usually the primary key  of the  dominant table and the foreign key of the dependent table in 1:M relationships
The following slides create tables for this enterprise data model
These tables are used in queries that follow Figure 8-1 Pine Valley Furniture Company Customer and Order tables with pointers from customers to their orders
For each customer who placed an order, what is the customer's name and order number? SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID FROM CUSTOMER_T INNER JOIN ORDER_T ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID; Natural Join Example Note: from Figure 8-1, you see that only 10 customers have links with orders. Only 10 rows will be returned from this INNER join. Join involves multiple tables in FROM clause ON clause performs the equality check for common columns of the two tables Standard SQL's NATURAL JOIN keyword takes no ON clause (it matches same-named columns automatically), so the join condition is written here with INNER JOIN ... ON
List the customer name, ID number, and order number for all customers. Include customer information even for customers that do not have an order SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID FROM CUSTOMER_T LEFT OUTER JOIN ORDER_T ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID; Outer Join Example (Microsoft Syntax) Unlike INNER join, this will include customer rows with no matching order rows LEFT OUTER JOIN syntax with ON causes customer data to appear even if there is no corresponding order data
Results Unlike INNER join, this will include customer rows with no matching order rows
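The difference between the inner and outer join can be seen side by side in a short sketch, assuming Python's sqlite3 module and three made-up customers, one of whom has no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT);
CREATE TABLE ORDER_T (
    ORDER_ID INTEGER PRIMARY KEY,
    CUSTOMER_ID INTEGER REFERENCES CUSTOMER_T(CUSTOMER_ID)
);
INSERT INTO CUSTOMER_T VALUES
    (1, 'Contemporary Casuals'), (2, 'Value Furniture'), (3, 'Home Furnishings');
INSERT INTO ORDER_T VALUES (1001, 1), (1002, 1), (1003, 3);
""")

# Inner join: only customers with at least one matching order row.
inner = conn.execute("""
    SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
    FROM CUSTOMER_T INNER JOIN ORDER_T
      ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID
    ORDER BY ORDER_ID
""").fetchall()

# Left outer join: every customer; ORDER_ID is NULL (None) when no order matches.
outer = conn.execute("""
    SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
    FROM CUSTOMER_T LEFT OUTER JOIN ORDER_T
      ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID
    ORDER BY CUSTOMER_T.CUSTOMER_ID, ORDER_ID
""").fetchall()

print(inner)
print(outer)
```

Customer 2 is absent from the inner join but appears in the outer join with a NULL order number.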
Assemble all information necessary to create an invoice for order number 1006 SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_ADDRESS, CITY, STATE, POSTAL_CODE, ORDER_T.ORDER_ID, ORDER_DATE, QUANTITY, PRODUCT_DESCRIPTION, STANDARD_PRICE, (QUANTITY * STANDARD_PRICE) FROM CUSTOMER_T, ORDER_T, ORDER_LINE_T, PRODUCT_T WHERE CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID AND ORDER_T.ORDER_ID = ORDER_LINE_T.ORDER_ID AND ORDER_LINE_T.PRODUCT_ID = PRODUCT_T.PRODUCT_ID AND ORDER_T.ORDER_ID = 1006; Multiple Table Join Example Four tables involved in this join Each pair of tables requires an equality-check condition in the WHERE clause, matching primary keys against foreign keys
Figure 8-2 Results from a four-table join From CUSTOMER_T table From ORDER_T table From PRODUCT_T table
Processing Multiple Tables  Using Subqueries Subquery–placing an inner query (SELECT statement) inside an outer query Options: In a condition of the WHERE clause As a “table” of the FROM clause Within the HAVING clause Subqueries can be: Noncorrelated–executed once for the entire outer query Correlated–executed once for each row returned by the outer query
Show all customers who have placed an order SELECT CUSTOMER_NAME  FROM CUSTOMER_T WHERE CUSTOMER_ID  IN (SELECT DISTINCT CUSTOMER_ID FROM ORDER_T); Subquery Example Subquery is embedded in parentheses. In this case it returns a list that will be used in the WHERE clause of the outer query The IN operator will test to see if the CUSTOMER_ID value of a row is included in the list returned from the subquery
Correlated vs. Noncorrelated Subqueries Noncorrelated subqueries: Do not depend on data from the outer query Execute once for the entire outer query Correlated subqueries: Make use of data from the outer query Execute once for each row of the outer query Can use the EXISTS operator
Figure 8-3a Processing a noncorrelated subquery No reference to data in outer query, so subquery executes once only These are the only customers that have IDs in the ORDER_T table The subquery executes and returns the customer IDs from the ORDER_T table The outer query executes on the results of the subquery
Show all orders that include furniture finished in natural ash SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T WHERE EXISTS (SELECT * FROM PRODUCT_T WHERE PRODUCT_ID = ORDER_LINE_T.PRODUCT_ID AND PRODUCT_FINISH = 'Natural ash'); Correlated Subquery Example The subquery is testing for a value that comes from the outer query The EXISTS operator will return a TRUE value if the subquery resulted in a non-empty set, otherwise it returns FALSE
Figure 8-3b Processing a correlated subquery Subquery refers to outer-query data, so executes once for each row of outer query Note: only the orders that involve products with Natural Ash will be included in the final results
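A sketch contrasting the two subquery styles on the same question, assuming Python's sqlite3 module and made-up product and order-line rows. The correlated form references ORDER_LINE_T from inside the subquery; the noncorrelated form does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, PRODUCT_FINISH TEXT);
CREATE TABLE ORDER_LINE_T (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
INSERT INTO PRODUCT_T VALUES (1, 'Cherry'), (2, 'Natural ash'), (3, 'White ash');
INSERT INTO ORDER_LINE_T VALUES (1001, 1), (1001, 2), (1002, 3), (1003, 2);
""")

# Correlated: the inner query uses ORDER_LINE_T.PRODUCT_ID from the outer row,
# so it is logically evaluated once per outer row.
correlated = [row[0] for row in conn.execute("""
    SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T
    WHERE EXISTS (SELECT * FROM PRODUCT_T
                  WHERE PRODUCT_ID = ORDER_LINE_T.PRODUCT_ID
                    AND PRODUCT_FINISH = 'Natural ash')
    ORDER BY ORDER_ID
""")]

# Noncorrelated: the inner query stands alone and can run once,
# producing a list that the outer query tests with IN.
noncorrelated = [row[0] for row in conn.execute("""
    SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T
    WHERE PRODUCT_ID IN (SELECT PRODUCT_ID FROM PRODUCT_T
                         WHERE PRODUCT_FINISH = 'Natural ash')
    ORDER BY ORDER_ID
""")]

print(correlated, noncorrelated)
```

Both forms return the same orders here; they differ in how the subquery relates to the outer query, not in the answer.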
Show all products whose standard price is higher than the average price SELECT PRODUCT_DESCRIPTION, STANDARD_PRICE, AVGPRICE FROM (SELECT AVG(STANDARD_PRICE) AVGPRICE FROM PRODUCT_T), PRODUCT_T WHERE STANDARD_PRICE > AVGPRICE; Another Subquery Example The WHERE clause normally cannot include aggregate functions, but because the aggregate is performed in the subquery its result can be used in the outer query's WHERE clause One column of the subquery is an aggregate function that has an alias name. That alias can then be referred to in the outer query Subquery forms the derived table used in the FROM clause of the outer query
Union Queries Combine the output (union of multiple queries) together into a single result table First query Second query Combine
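A minimal UNION sketch, assuming Python's sqlite3 module and made-up rows: the first query finds Florida customers, the second finds customers with orders, and UNION combines them into one result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT, STATE TEXT);
CREATE TABLE ORDER_T (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
INSERT INTO CUSTOMER_T VALUES (1, 'A', 'FL'), (2, 'B', 'TX'), (3, 'C', 'CA');
INSERT INTO ORDER_T VALUES (1001, 1), (1002, 3);
""")

# UNION removes duplicate rows from the combined result.
union_ids = [row[0] for row in conn.execute("""
    SELECT CUSTOMER_ID FROM CUSTOMER_T WHERE STATE = 'FL'
    UNION
    SELECT CUSTOMER_ID FROM ORDER_T
    ORDER BY CUSTOMER_ID
""")]

# UNION ALL keeps every row from both queries, duplicates included.
union_all_ids = [row[0] for row in conn.execute("""
    SELECT CUSTOMER_ID FROM CUSTOMER_T WHERE STATE = 'FL'
    UNION ALL
    SELECT CUSTOMER_ID FROM ORDER_T
    ORDER BY CUSTOMER_ID
""")]

print(union_ids, union_all_ids)
```

Both queries must return the same number of columns with compatible types for the union to be valid.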
Conditional Expressions Using Case Syntax This is available with newer versions of SQL, previously not part of the standard
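A brief CASE sketch, assuming Python's sqlite3 module; the PRICE_BAND labels, the 300 threshold, and the sample products are illustrative choices rather than anything from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, "
    "PRODUCT_DESCRIPTION TEXT, STANDARD_PRICE INTEGER)"
)
conn.executemany(
    "INSERT INTO PRODUCT_T VALUES (?, ?, ?)",
    [(1, "End Table", 175), (2, "Computer Desk", 375)],
)

# CASE evaluates its WHEN conditions in order and returns the first match,
# falling back to ELSE; here it derives a label column from the price.
price_bands = conn.execute("""
    SELECT PRODUCT_DESCRIPTION,
           CASE WHEN STANDARD_PRICE > 300 THEN 'Expensive'
                ELSE 'Affordable'
           END AS PRICE_BAND
    FROM PRODUCT_T
    ORDER BY PRODUCT_ID
""").fetchall()

print(price_bands)
```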
Ensuring Transaction Integrity Transaction = A discrete unit of work that must be completely processed or not processed at all May involve multiple updates If any update fails, then all other updates must be cancelled SQL commands for transactions BEGIN TRANSACTION/END TRANSACTION Marks boundaries of a transaction COMMIT Makes all updates permanent ROLLBACK Cancels updates since the last COMMIT
Figure 8-5 An SQL Transaction sequence (in pseudocode)
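The all-or-nothing behavior can be sketched with Python's sqlite3 module, whose connection context manager commits on success and rolls back when an exception escapes. The enter_order function, the reduced table definitions, and the CHECK rule on QUANTITY are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORDER_T (ORDER_ID INTEGER PRIMARY KEY);
CREATE TABLE ORDER_LINE_T (
    ORDER_ID INTEGER REFERENCES ORDER_T(ORDER_ID),
    PRODUCT_ID INTEGER,
    QUANTITY INTEGER CHECK (QUANTITY > 0)
);
""")

def enter_order(conn, order_id, lines):
    """Insert an order and its lines as one unit of work; return True if committed."""
    try:
        with conn:  # COMMIT if the block finishes, ROLLBACK if it raises
            conn.execute("INSERT INTO ORDER_T VALUES (?)", (order_id,))
            conn.executemany(
                "INSERT INTO ORDER_LINE_T VALUES (?, ?, ?)",
                [(order_id, product_id, qty) for product_id, qty in lines],
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = enter_order(conn, 1001, [(1, 2), (2, 1)])   # all inserts succeed
second = enter_order(conn, 1002, [(1, 1), (3, 0)])  # quantity 0 violates the CHECK

orders = [row[0] for row in conn.execute("SELECT ORDER_ID FROM ORDER_T")]
line_count = conn.execute("SELECT COUNT(*) FROM ORDER_LINE_T").fetchone()[0]
print(first, second, orders, line_count)
```

The failed second order leaves no trace: its header row and its first, valid line are both cancelled by the rollback.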
Data Dictionary Facilities System tables that store metadata Users usually can view some of these tables Users are restricted from updating them Some examples in Oracle 10g DBA_TABLES–descriptions of tables DBA_CONSTRAINTS–description of constraints DBA_USERS–information about the users of the system Examples in Microsoft SQL Server 2000 SYSCOLUMNS–table and column definitions SYSDEPENDS–object dependencies based on foreign keys SYSPERMISSIONS–access permissions granted to users
SQL:1999 and SQL:2003 Enhancements/Extensions User-defined data types (UDT) Subclasses of standard types or an object type Analytical functions (for OLAP) CEILING, FLOOR, SQRT, RANK, DENSE_RANK WINDOW–improved numerical analysis capabilities New Data Types BIGINT, MULTISET (collection), XML CREATE TABLE LIKE–create a new table similar to an existing one MERGE
SQL:1999 and SQL:2003 Enhancements/Extensions (cont.) Persistent Stored Modules (SQL/PSM) Capability to create and drop code modules New statements: CASE, IF, LOOP, FOR, WHILE, etc. Makes SQL into a procedural language Oracle has a proprietary version called PL/SQL, and Microsoft SQL Server has Transact-SQL
Routines and Triggers Routines Program modules that execute on demand Functions –routines that return values and take input parameters Procedures –routines that do not return values and can take input or output parameters Triggers   Routines that execute in response to a database event (INSERT, UPDATE, or DELETE)
Figure 8-6 Triggers contrasted with stored procedures Procedures are called explicitly Triggers are event-driven Source : adapted from Mullins, 1995.
Figure 8-7 Simplified trigger syntax, SQL:2003 Figure 8-8 Create routine syntax, SQL:2003
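A sketch of the trigger versus procedure contrast, assuming Python's sqlite3 module. The trigger is written in SQLite's CREATE TRIGGER dialect, which differs in detail from the SQL:2003 syntax in Figure 8-7; SQLite has no stored procedures, so a Python function stands in for the explicitly called routine. PRICE_HISTORY_T, PRICE_AUDIT_TRG, and set_price are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, STANDARD_PRICE INTEGER);
CREATE TABLE PRICE_HISTORY_T (PRODUCT_ID INTEGER, OLD_PRICE INTEGER, NEW_PRICE INTEGER);
INSERT INTO PRODUCT_T VALUES (7, 800);

-- Event-driven: fires automatically after any update of STANDARD_PRICE.
CREATE TRIGGER PRICE_AUDIT_TRG
AFTER UPDATE OF STANDARD_PRICE ON PRODUCT_T
FOR EACH ROW
BEGIN
    INSERT INTO PRICE_HISTORY_T
    VALUES (OLD.PRODUCT_ID, OLD.STANDARD_PRICE, NEW.STANDARD_PRICE);
END;
""")

def set_price(conn, product_id, price):
    """Explicitly called routine (a stand-in for a stored procedure)."""
    conn.execute(
        "UPDATE PRODUCT_T SET STANDARD_PRICE = ? WHERE PRODUCT_ID = ?",
        (price, product_id),
    )

set_price(conn, 7, 775)  # the caller never mentions the trigger
history = conn.execute("SELECT * FROM PRICE_HISTORY_T").fetchall()
print(history)
```

The audit row appears even though no code called the trigger directly, which is the event-driven behavior Figure 8-6 contrasts with explicit procedure calls.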
Embedded and Dynamic SQL Embedded SQL Including hard-coded SQL statements in a program written in another language such as C or Java Dynamic SQL Ability for an application program to generate SQL code on the fly, as the application is running
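A sketch of the two styles with Python as the host language and its sqlite3 module as the database interface. The build_product_query helper and the sample rows are hypothetical. Note that the dynamic version assembles only the statement text at run time; user-supplied values are still passed as bound parameters rather than pasted into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, "
    "PRODUCT_FINISH TEXT, STANDARD_PRICE INTEGER)"
)
conn.executemany(
    "INSERT INTO PRODUCT_T VALUES (?, ?, ?)",
    [(1, "Cherry", 175), (2, "Natural ash", 200), (3, "Cherry", 650)],
)

# Embedded style: the statement is fixed when the program is written.
EMBEDDED_SQL = "SELECT PRODUCT_ID FROM PRODUCT_T WHERE STANDARD_PRICE < 275 ORDER BY PRODUCT_ID"
embedded_ids = [row[0] for row in conn.execute(EMBEDDED_SQL)]

def build_product_query(finish=None, max_price=None):
    """Dynamic style: generate the statement text from whichever filters are given."""
    sql = "SELECT PRODUCT_ID FROM PRODUCT_T"
    conditions, params = [], []
    if finish is not None:
        conditions.append("PRODUCT_FINISH = ?")
        params.append(finish)
    if max_price is not None:
        conditions.append("STANDARD_PRICE < ?")
        params.append(max_price)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY PRODUCT_ID", params

sql, params = build_product_query(finish="Cherry", max_price=275)
dynamic_ids = [row[0] for row in conn.execute(sql, params)]
print(embedded_ids, sql, dynamic_ids)
```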
Chapter 9:  The Client/Server Database Environment Modern Database Management 8 th  Edition Jeffrey A. Hoffer, Mary B. Prescott,  Fred R. McFadden © 2007 by Prentice Hall
Objectives Definition of terms List advantages of client/server architecture Explain three application components: presentation, processing, and storage Suggest partitioning possibilities Distinguish between file server, database server, 3-tier, and n-tier approaches Describe and discuss middleware Explain database linking via ODBC and JDBC
Client/Server Systems Networked computing model Processes distributed between clients and servers Client–Workstation (usually a PC) that requests and uses a service Server–Computer (PC/mini/mainframe) that provides a service For DBMS, server is a database server
Application Logic in C/S Systems Presentation Logic (GUI interface): Input–keyboard/mouse, Output–monitor/printer Processing Logic (procedures, functions, programs): I/O processing, Business rules, Data management Storage Logic (DBMS activities): Data storage/retrieval
Client/Server Architectures File Server Architecture (client does extensive processing) Database Server Architecture Three-tier Architecture (client does little processing)
File Server Architecture All processing is done at the PC that requested the data  Entire files are transferred from the server to the client for processing Problems: Huge amount of data transfer on the network Each client must contain full DBMS  Heavy resource demand on clients Client DBMSs must recognize shared locks, integrity checks, etc. FAT CLIENT
Figure 9-2 File Server Architecture FAT CLIENT
Two-Tier Database Server Architectures Client is responsible for I/O processing logic and some business rules logic Server performs all data storage and access processing → DBMS is only on server
Advantages of Two-Tier Approach Clients do not have to be as powerful Greatly reduces data traffic on the network Improved data integrity since it is all processed centrally Stored procedures → DBMS code that performs some business rules done on server
Advantages of  Stored Procedures Compiled SQL statements Reduced network traffic Improved security Improved data integrity Thinner clients
Figure 9-3 Two-tier database server architecture Thinner clients DBMS only on server
Three-Tier Architectures Client: GUI interface (I/O processing), Browser Application server: Business rules, Web server Database server: Data storage, DBMS Thin Client: PC just for user interface and a little application processing. Limited or no data storage (sometimes no hard drive)
Figure 9-4 Three-tier architecture Thinnest clients Business rules on separate server DBMS only on DB server
Advantages of Three-Tier Architectures Scalability Technological flexibility Long-term cost reduction Better match of systems to business needs Improved customer service Competitive advantage Reduced risk
Application Partitioning Placing portions of the application code in different locations (client vs. server) AFTER it is written Advantages Improved performance Improved interoperability Balanced workloads
Common Logic Distributions Figure 9-5a Two-tier client-server environment Figure 9-5b  n -tier client-server environment Processing logic could be at client, server, or both  Processing logic will be at application server or Web server
Role of the Mainframe Mission-critical legacy systems have tended to remain on mainframes  Distributed client/server systems tend to be used for smaller, workgroup systems Difficulties in moving mission critical systems from mainframe to distributed Determining which code belongs on server vs. client Identifying potential conflicts with code from other applications Ensuring sufficient resources exist for anticipated load Rule of thumb Mainframe for centralized data that does not need to be moved Client for data requiring frequent user access, complex graphics, and user interface
Middleware Software that allows an application to  interoperate  with other software No need for programmer/user to understand internal processing Accomplished via  Application Program Interface   (API) The  “glue”  that holds client/server applications together
Types of Middleware Remote Procedure Calls (RPC): client makes calls to procedures running on remote computers; synchronous and asynchronous Message-Oriented Middleware (MOM): asynchronous calls between the client and server via message queues Publish/Subscribe: push technology → server sends information to client when available Object Request Broker (ORB): object-oriented management of communications between clients and servers SQL-oriented Data Access: middleware between applications and database servers
Database Middleware ODBC –Open Database Connectivity Most DB vendors support this OLE-DB Microsoft enhancement of ODBC JDBC –Java Database Connectivity Special Java classes that allow Java applications/applets to connect to databases
Client/Server Security Network environment → complex security issues Security levels: System-level password security for allowing access to the system Database-level password security for determining access privileges to tables; read/update/insert/delete privileges Secure client/server communication via encryption
Keys to Successful Client-Server Implementation Accurate business problem analysis Detailed architecture analysis Architecture analysis  before  choosing tools Appropriate scalability Appropriate placement of services Network analysis Awareness of hidden costs Establish client/server security
Benefits of Moving to Client/Server Architecture Staged delivery of functionality speeds deployment GUI interfaces ease application use Flexibility and scalability facilitate business process reengineering Reduced network traffic due to increased processing at data source Facilitation of Web-enabled applications
Using ODBC to Link External Databases Stored on a Database Server Open Database Connectivity (ODBC) API provides a common language for application programs to access and process SQL databases independent of the particular RDBMS that is accessed Required parameters: ODBC driver  Back-end server name Database name User id and password Additional information: Data source name (DSN) Windows client computer name Client application program’s executable name Java Database Connectivity (JDBC) is similar to ODBC–built specifically for Java applications
ODBC Architecture  (Figure 9-6) Each DBMS has its own ODBC-compliant driver Client does not need to know anything about the DBMS Application Program Interface (API) provides common interface to all DBMSs
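The required parameters listed for ODBC are usually packed into a connection string. The sketch below only builds that string and does not connect to anything. DRIVER, UID, and PWD are ODBC connection-string keywords; SERVER and DATABASE are common but driver-defined, so treat them as assumptions, and the driver name, host, and credentials shown are placeholders.

```python
def odbc_connection_string(driver, server, database, user, password):
    """Assemble a keyword=value ODBC connection string from the required parameters."""
    parts = [
        ("DRIVER", "{" + driver + "}"),  # braces protect driver names containing spaces
        ("SERVER", server),              # back-end server name (driver-defined keyword)
        ("DATABASE", database),          # database name (driver-defined keyword)
        ("UID", user),                   # user id
        ("PWD", password),               # password
    ]
    return ";".join(f"{key}={value}" for key, value in parts)

# Placeholder values for illustration only.
conn_str = odbc_connection_string("Example ODBC Driver", "dbhost", "PVFC", "student", "secret")
print(conn_str)
```

An application would hand a string like this to its ODBC library, and the driver manager would use the DRIVER entry to select the DBMS-specific driver shown in Figure 9-6.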
Chapter 10:  The Internet Database Environment Modern Database Management 8 th  Edition Jeffrey A. Hoffer, Mary B. Prescott,  Fred R. McFadden © 2007 by Prentice Hall
Objectives Definition of terms Explain the importance of attaching a database to a Web page Describe necessary environment for Internet and Intranet database connectivity Use Internet terminology appropriately Explain the purpose of WWW Consortium Explain the purpose of server-side extensions Describe Web services Compare Web server interfaces (CGI, API, Java servlets) Describe Web load balancing methods Explain plug-ins Explain the purpose of XML as a standard
Web Characteristics that Support Web-Based Database Applications Web browsers are simple to use Information transfer can take place across different platforms Development time and cost have been reduced Sites can be static (no database) or dynamic/interactive (with database) Potential e-business advantages (improved customer service, faster market time, better supply chain management)
Figure 10-1 Database-enabled intranet/internet environment
Internet and Intranet Services Web server Database-enabled services Directory, security, authentication E-mail File Transfer Protocol (FTP) Firewalls and proxy servers News or discussion groups Document search Load balancing and caching
World Wide Web Consortium (W3C) An international consortium of companies working to develop open standards that foster the development of Web conventions so that Web documents can be consistently displayed on all platforms  See  www.w3c.org
Web-Related Terms World Wide Web (WWW) The total set of interlinked hypertext documents residing on Web servers worldwide Browser Software that displays HTML documents and allows users to access files and software related to HTML documents Web Server Software that responds to requests from browsers and transmits HTML documents to browsers Web pages–HTML documents Static Web pages–content established at development time  Dynamic Web pages–content dynamically generated, usually by obtaining data from database
Communications Technology IP Address Four numbers that identify a node on the Internet e.g.  131.247.152.18 Hypertext Transfer Protocol (HTTP) Communication protocol used to transfer pages from Web server to browser HTTPS is a more secure version Uniform Resource Locator (URL) Mnemonic Web address corresponding with IP address Also includes folder location and html file name Typical URL
Internet-Related Languages Hypertext Markup Language (HTML) Markup language specifically for Web pages Standard Generalized Markup Language (SGML) Markup language standard Extensible Markup Language (XML) Markup language allowing customized tags XHTML XML-compliant extension of HTML Java Object-oriented programming language for applets JavaScript/VBScript Scripting languages that enable interactivity in HTML documents Cascading Style Sheets (CSS) Control appearance of Web elements in an HTML document XSL and XSLT XML style sheet and transformation to HTML Standards and Web conventions established by World Wide Web Consortium (W3C)
XML Overview Becoming the standard for E-Commerce data exchange A markup language (like HTML) Uses elements, tags, attributes Includes document type declarations (DTDs), XML schemas, comments, and entity references XML Schema (XSD) replacing DTDs Relax NG–ISO standard XML database definition Document Structure Description (DSD)– expressive, easy to use XML database definition
Sample XML Schema Schema is a record definition, analogous to the Create SQL statement, and therefore provides metadata
Sample XML Document Data XML data involves elements and attributes defined in the schema, and is analogous to inserting a record into a database.
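Since the sample document itself did not survive in the slide text, here is a small stand-in, assuming Python's xml.etree.ElementTree. The element and attribute names (Customers, Customer, id, Name, City, State) are invented for illustration and are not the textbook's schema. The sketch shows how elements and attributes map onto the fields of a record.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: one element per record, one attribute as the key.
xml_text = """<Customers>
  <Customer id="1">
    <Name>Contemporary Casuals</Name>
    <City>Gainesville</City>
    <State>FL</State>
  </Customer>
</Customers>"""

root = ET.fromstring(xml_text)

# Turn each Customer element into a tuple shaped like a table row.
rows = [
    (
        int(customer.get("id")),      # attribute value
        customer.findtext("Name"),    # child element text
        customer.findtext("City"),
        customer.findtext("State"),
    )
    for customer in root.findall("Customer")
]
print(rows)
```

Each resulting tuple could be passed to an INSERT statement, which is the analogy the slide draws between XML data and inserting a record.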
Server-Side Extensions Programs that interact directly with Web servers to handle requests e.g. database-request handling middleware Figure 10-2 Web-to-database middleware
Web Server Interfaces Common Gateway Interface (CGI) Specify transfer of information between Web server and CGI program Performance not very good Security risks Application Program Interface (API) More efficient than CGI Shared as dynamic link libraries (DLLs) Java Servlets Like applets, but stored at server Cross-platform compatible More efficient than CGI
Web Servers Provide HTTP service Passing plain text via TCP connection Serve many clients at once Therefore, multithreaded and multiprocessed Load balancing approaches: Domain Name Server (DNS) balancing One DNS name = multiple IP addresses Software/hardware balancing Request at one IP address is distributed to multiple servers Reverse proxy Intercept client request and cache response
Client-Side Extensions Add functionality to the browser Plug-ins Hardware/software modules that extend browser capabilities by adding features (e.g. encryption, animation, wireless access) ActiveX Microsoft COM/OLE components that allow data manipulation inside the browser Cookies Block of data stored at client by Web server for later use
Components for Dynamic Web Sites DBMS–Oracle, Microsoft SQL Server, Informix, Sybase, DB2, Microsoft Access, MySQL Web server–Apache, Microsoft IIS Programming languages/development technologies–ASP .NET, PHP, ColdFusion, Coral Web Builder, Macromedia’s Dreamweaver Web browser–Microsoft Internet Explorer, Netscape Navigator, Mozilla Firefox, Apple’s Safari, Opera Text editor–Notepad, BBEdit, vi, or an IDE FTP capabilities–SmartFTP, WS_FTP
Figure 10-3 Dynamic Web development environment
Figure 10-4 Sample PHP script that accepts user registration input a) PHP script initiation and input validation (Ullman, PHP and MySql for Dynamic Web Sites, 2003, Script 6.6)
Figure 10-4a (cont.)
Figure 10-4 Sample PHP script that accepts user registration input b) Adding user information to the database
Figure 10-4 Sample PHP script that accepts user registration input c) Close PHP script and display HTML form
Web Services XML-based standards that define protocols for automatic communication between applications over the Web.  Web Service Components: Universal Description, Discovery, and Integration (UDDI) Technical specification for distributed registries of Web services and businesses open to communication on these services Web Services Description Language (WSDL) XML-based grammar for describing Web services and providing public interfaces for these services Simple Object Access Protocol (SOAP) XML-based communication protocol for sending messages between applications via the Internet Challenges for Web Services Lack of mature standards Lack of security
Figure 10-5 A typical order entry system that uses Web services (adapted from Newcomer 2002, Figure 1-3) Figure 10-6 Web services protocol stack
Figure 10-7 Web services deployment  (adapted from Newcomer, 2002)
Service Oriented Architectures Collection of services that communicate with each other by passing data Web services, CORBA, Java, XML, SOAP, WSDL Loosely coupled Interoperable Using SOA results in increased software development efficiency (up to 40%)
Semantic Web W3C project using Web metadata to automate collection of knowledge and storing in easily understood format Structuring based on: XML Resource Description Framework (RDF) Web Ontology Language (OWL)
Rapidly Accelerating Internet Changes Integrated database environments Use of cell phones and PDAs Changes in organizational relationships Globalization Challenges to IT personnel require: Business and technology infrastructure understanding Leadership and communication skills Upward influence techniques Employee management techniques
Chapter 11:  Data Warehousing Modern Database Management 8 th  Edition Jeffrey A. Hoffer, Mary B. Prescott,  Fred R. McFadden © 2007 by Prentice Hall
Objectives Definition of terms Reasons for information gap between information needs and availability Reasons for need of data warehousing Describe three levels of data warehouse architectures List four steps of data reconciliation Describe two components of star schema Estimate fact table size Design a data mart
Definition Data Warehouse :  A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes Subject-oriented:  e.g. customers, patients, students, products Integrated:  Consistent naming conventions, formats, encoding structures; from multiple data sources Time-variant:  Can study trends and changes Nonupdatable:  Read-only, periodically refreshed Data Mart : A data warehouse that is limited in scope
Need for Data Warehousing Integrated, company-wide view of high-quality information (from disparate databases) Separation of  operational  and  informational  systems and data (for improved performance)
Source : adapted from Strange (1997).
Data Warehouse Architectures Generic Two-Level Architecture Independent Data Mart Dependent Data Mart and Operational Data Store Logical Data Mart and Real-Time Data Warehouse Three-Layer architecture All involve some form of  extraction ,  transformation  and  loading  ( ETL )
Figure 11-2: Generic two-level data warehousing architecture E T L One, company-wide warehouse Periodic extraction → data is not completely current in warehouse
Figure 11-3 Independent data mart data warehousing architecture Data marts: Mini-warehouses, limited in scope E T L Separate ETL for each  independent  data mart Data access complexity due to  multiple  data marts
Figure 11-4 Dependent data mart with operational data store:   a three-level architecture E T L Single ETL for  enterprise data warehouse (EDW) Simpler data access ODS  provides option for obtaining  current  data Dependent  data marts loaded from EDW
Figure 11-5 Logical data mart and real time warehouse architecture E T L Near real-time ETL for Data Warehouse ODS and data warehouse are one and the same Data marts are NOT separate databases, but logical views of the data warehouse → Easier to create new data marts
Figure 11-6 Three-layer data architecture for a data warehouse
Data Characteristics Status vs. Event Data Event = a database action (create/update/delete) that results from a transaction Figure 11-7 Example of DBMS log entry (status before the event, the event, status after the event)
Data Characteristics Transient vs. Periodic Data With transient data, changes to existing records are written over previous records, thus destroying the previous data content Figure 11-8 Transient operational data
Data Characteristics Transient vs. Periodic Data Periodic data are never physically altered or deleted once they have been added to the store Figure 11-9: Periodic warehouse data
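The transient versus periodic distinction can be sketched in a few lines of Python. The record_change helper, the record layout (key, date, value, action), and the sample values are assumptions made for this illustration.

```python
# Transient store: one current value per key; an update destroys the old value.
transient = {}
# Periodic store: append-only list of (key, date, value, action) records.
periodic = []

def record_change(key, date, value, action):
    """Apply the same change to both kinds of store."""
    transient[key] = value                       # overwrite in place
    periodic.append((key, date, value, action))  # add a record; never alter old ones

record_change(1, "2007-10-01", 100, "Create")
record_change(1, "2007-10-05", 150, "Update")

print(transient)  # only the latest value survives
print(periodic)   # the full history is retained
```

The transient store can answer "what is the value now?", while only the periodic store can answer "what was the value on October 1?".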
Other Data Warehouse Changes New descriptive attributes New business activity attributes New classes of descriptive attributes Descriptive attributes become more refined Descriptive data are related to one another New source of data
The Reconciled Data Layer Typical operational data is: Transient–not historical Not normalized (perhaps due to denormalization for performance) Restricted in scope–not comprehensive Sometimes poor quality–inconsistencies and errors After ETL, data should be: Detailed–not summarized yet Historical–periodic Normalized–3 rd  normal form or higher Comprehensive–enterprise-wide perspective Timely–data should be current enough to assist decision-making Quality controlled–accurate with full integrity
The ETL Process Capture/Extract Scrub or data cleansing Transform Load and Index ETL = Extract, transform, and load
Capture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Static extract = capturing a snapshot of the source data at a point in time Incremental extract = capturing changes that have occurred since the last static extract Figure 11-10: Steps in data reconciliation
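A minimal sketch of the two extract styles in Python, assuming each source row carries a last-updated marker. The "updated" field, the integer timestamps, and the function names are hypothetical; real systems often detect changes from DBMS logs or timestamps instead.

```python
# Hypothetical source rows; "updated" is an assumed change marker.
source_rows = [
    {"id": 1, "name": "A", "updated": 10},
    {"id": 2, "name": "B", "updated": 20},
    {"id": 3, "name": "C", "updated": 35},
]

def static_extract(rows):
    """Snapshot of all chosen source data at one point in time."""
    return [dict(row) for row in rows]

def incremental_extract(rows, last_extract_time):
    """Only the rows that changed since the previous extract."""
    return [dict(row) for row in rows if row["updated"] > last_extract_time]

snapshot = static_extract(source_rows)
changes = incremental_extract(source_rows, last_extract_time=20)
print(len(snapshot), [row["id"] for row in changes])
```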
Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality Fixing errors:  misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies Also:  decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data Figure 11-10: Steps in data reconciliation (cont.)
Transform = convert data from format of operational system to format of data warehouse Record-level: Selection –data partitioning Joining –data combining Aggregation –data summarization Field-level:   single-field –from one field to one field multi-field –from many fields to one, or one field to many Figure 11-10: Steps in data reconciliation (cont.)
Load/Index= place transformed data into the warehouse and create indexes Refresh mode:  bulk rewriting of target data at periodic intervals Update mode:  only changes in source data are written to data warehouse Figure 11-10: Steps in data reconciliation (cont.)
Figure 11-11: Single-field transformation In general–some transformation function translates data from old form to new form Algorithmic  transformation uses a formula or logical expression Table   lookup –another approach, uses a separate table keyed by source record code
Figure 11-12: Multifield transformation M:1–from many source fields to one target field 1:M–from one source field to many target fields
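As a rough sketch, an M:1 transformation combines several source fields into one target field, and a 1:M transformation splits one source field into several. The field names below are assumptions for illustration only.

```python
def many_to_one(first_name, last_name):
    """M:1: two source fields become one target field."""
    return f"{last_name}, {first_name}"

def one_to_many(city_state):
    """1:M: one source field ('City, ST') becomes two target fields."""
    city, state = [part.strip() for part in city_state.split(",")]
    return {"city": city, "state": state}

full_name = many_to_one("Ann", "Lee")
location = one_to_many("Tampa, FL")
```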
Derived Data Objectives Ease of use for decision support applications Fast response to predefined user queries Customized data for particular target audiences Ad-hoc query support Data mining capabilities Characteristics Detailed (mostly periodic) data Aggregate (for summary) Distributed (to departmental servers) Most common data model =  star schema (also called “dimensional model”)
Figure 11-13 Components of a  star schema Fact tables  contain factual or quantitative data Dimension tables  contain descriptions about the subjects of the business  1:N relationship between dimension tables and fact tables  Excellent for ad-hoc queries, but bad for online transaction processing Dimension tables are denormalized to maximize performance
Figure 11-14 Star schema example Fact table  provides statistics for sales broken down by product, period and store dimensions
Figure 11-15 Star schema with sample data
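A small star schema along the lines of the product/period/store example can be tried with Python's built-in sqlite3 module. Table names, column names, and sample rows are made up for this sketch and do not reproduce the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive data, one surrogate key each
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: quantitative data, one foreign key per dimension (1:N)
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    period_key   INTEGER REFERENCES period(period_key),
    store_key    INTEGER REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);

INSERT INTO product VALUES (1, 'Desk'), (2, 'Chair');
INSERT INTO period  VALUES (1, 2006, 1), (2, 2006, 2);
INSERT INTO store   VALUES (1, 'Tampa'), (2, 'Dayton');
INSERT INTO sales   VALUES (1, 1, 1, 10, 2000.0),
                           (1, 2, 1,  5, 1000.0),
                           (2, 1, 2, 20, 1500.0);
""")

# Ad-hoc query: total dollars sold per product
rows = conn.execute("""
    SELECT p.description, SUM(s.dollars_sold)
    FROM sales s JOIN product p ON p.product_key = s.product_key
    GROUP BY p.description
    ORDER BY p.description
""").fetchall()
```

Swapping the joined dimension (period or store instead of product) gives the other breakdowns without changing the fact table.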
Issues Regarding Star Schema Dimension table keys must be surrogate (non-intelligent and non-business related), because: Keys may change over time Length/format consistency Granularity of Fact Table–what level of detail do you want? Transactional grain–finest level Aggregated grain–more summarized Finer grain → better market basket analysis capability Finer grain → more dimension tables, more rows in fact table Duration of the database–how much history should be kept? Natural duration–13 months or 5 quarters Financial institutions may need longer duration Older data is more difficult to source and cleanse
Figure 11-16: Modeling dates Fact tables contain time-period data → Date dimensions are important
The User Interface Metadata (data catalog) Identify subjects of the data mart Identify dimensions and facts Indicate how data is derived from enterprise data warehouses, including derivation rules Indicate how data is derived from operational data store, including derivation rules Identify available reports and predefined queries Identify data analysis techniques (e.g. drill-down) Identify responsible people
On-Line Analytical Processing (OLAP) Tools The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques Relational OLAP (ROLAP) Traditional relational representation Multidimensional OLAP (MOLAP) Cube  structure OLAP Operations Cube slicing –come up with 2-D view of data Drill-down –going from summary to more detailed views
Figure 11-23 Slicing a data cube
Figure 11-24  Example of drill-down Summary report Drill-down with color added Starting with summary data, users can obtain details for particular cells
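Slicing and drill-down can be imitated on a tiny in-memory cube. This is only a sketch: the cube is a plain dictionary keyed by (product, region, month), and all names and numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, month) -> units sold
cube = {
    ("Desk", "East", "Jan"): 10,
    ("Desk", "East", "Feb"): 12,
    ("Desk", "West", "Jan"): 7,
    ("Chair", "East", "Jan"): 20,
}

def slice_cube(cube, product):
    """Slice: fix one dimension value to get a 2-D (region, month) view."""
    return {(r, m): v for (p, r, m), v in cube.items() if p == product}

def summarize_by_region(cube):
    """Summary report: totals per region across all products and months."""
    totals = defaultdict(int)
    for (_product, region, _month), units in cube.items():
        totals[region] += units
    return dict(totals)

def drill_down(cube, region):
    """Drill-down: break one region's summary cell into monthly detail."""
    totals = defaultdict(int)
    for (_product, r, month), units in cube.items():
        if r == region:
            totals[month] += units
    return dict(totals)

desk_view = slice_cube(cube, "Desk")
summary = summarize_by_region(cube)
east_detail = drill_down(cube, "East")
```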
Data Mining and Visualization Knowledge discovery using a blend of statistical, AI, and computer graphics techniques Goals: Explain observed events or conditions Confirm hypotheses Explore data for new or unexpected relationships Techniques Statistical regression Decision tree induction Clustering and signal processing Affinity Sequence association Case-based reasoning Rule discovery Neural nets Fractals Data visualization–representing data in graphical/multimedia formats for analysis
Chapter 12: Data and Database Administration Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
Objectives Definition of terms List functions and roles of data/database administration Describe role of data dictionaries and information repositories Compare optimistic and pessimistic concurrency control Describe problems and techniques for data security Describe problems and techniques for data recovery Describe database tuning issues and list areas where changes can be done to tune the database Describe importance and measures of data quality Describe importance and measures of data availability
Traditional Administration Definitions Data Administration :  A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards Database Administration :  A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery
Traditional Data Administration Functions Data policies, procedures, standards Planning Data conflict (ownership) resolution Managing the information repository Internal marketing of DA concepts
Traditional Database Administration Functions Selection of DBMS and software tools Installing/upgrading DBMS Tuning database performance Improving query processing performance Managing data security, privacy, and integrity Data backup and recovery
Evolving Approaches to Data Administration Blend data and database administration into one role Fast-track development – monitoring development process (analysis, design, implementation, maintenance) Procedural DBAs–managing quality of triggers and stored procedures eDBA–managing Internet-enabled database applications PDA DBA–data synchronization and personal database management Data warehouse administration
Data Warehouse Administration New role, coming with the growth in data warehouses Similar to DA/DBA roles Emphasis on integration and coordination of metadata/data across many data sources Specific roles: Support DSS applications Manage data warehouse growth Establish service level agreements regarding data warehouses and data marts
Open Source DBMSs An alternative to proprietary packages such as Oracle, Microsoft SQL Server, or Microsoft Access MySQL is an example of an open-source DBMS Less expensive than proprietary packages Source code available for modification
Figure 12-2 Data modeling responsibilities
Database Security Database Security:  Protection of the data against accidental or intentional loss, destruction, or misuse Increased difficulty due to Internet access and client/server technologies
Figure 12-3 Possible locations of data security threats
Threats to Data Security Accidental losses attributable to: Human error Software failure Hardware failure Theft and fraud Improper data access: Loss of privacy (personal data) Loss of confidentiality (corporate data) Loss of data integrity Loss of availability (through, e.g. sabotage)
Figure 12-4 Establishing Internet Security
Web Security Static HTML files are easy to secure Standard database access controls Place Web files in protected directories on server Dynamic pages are harder Control of CGI scripts User authentication Session security SSL for encryption Restrict number of users and open ports Remove unnecessary programs
W3C Web Privacy Standard Platform for Privacy Preferences (P3P) Addresses the following: Who collects data What data is collected and for what purpose Who is data shared with Can users control access to their data How are disputes resolved Policies for retaining data Where are policies kept and how can they be accessed
Database Software Security Features Views or subschemas Integrity controls Authorization rules User-defined procedures Encryption Authentication schemes Backup, journalizing, and checkpointing
Views and Integrity Controls Views Subset of the database that is presented to one or more users User can be given access privilege to view without allowing access privilege to underlying tables Integrity Controls Protect data from unauthorized use Domains–set allowable values Assertions–enforce database conditions
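A view and a domain-style integrity control can be sketched with sqlite3. SQLite has no GRANT statement, so the access-privilege side is not shown; the table, view, and column names are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT,
    dept   TEXT,
    salary REAL CHECK (salary >= 0)  -- domain-style control on allowable values
);
INSERT INTO employee VALUES (1, 'Ann', 'Sales', 50000), (2, 'Bob', 'IT', 60000);

-- View: a subset of rows and columns; salary is not exposed
CREATE VIEW sales_directory AS
    SELECT emp_id, name FROM employee WHERE dept = 'Sales';
""")

view_rows = conn.execute("SELECT * FROM sales_directory").fetchall()

# The integrity control rejects a value outside the allowed domain
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Cy', 'IT', -1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In a DBMS with privileges, a user could be granted access to sales_directory without any privilege on employee itself.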
Authorization Rules Controls incorporated in the data management system  Restrict:  access to data actions that people can take on data  Authorization matrix for: Subjects Objects Actions Constraints
Figure 12-5 Authorization matrix
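One way to picture an authorization matrix is as a lookup from (subject, object) to the permitted actions, each with an optional constraint. The entries below are illustrative, and the structure is an assumption rather than any DBMS's actual mechanism.

```python
# Hypothetical matrix: (subject, object) -> {action: constraint or None}
AUTH = {
    ("Sales Dept", "Customer record"): {
        "insert": lambda rec: rec["credit_limit"] <= 5000,
    },
    ("Order Trans", "Customer record"): {
        "read": None,  # no constraint on reads
    },
}

def is_authorized(subject, obj, action, record=None):
    """Allow the action only if listed and its constraint (if any) holds."""
    actions = AUTH.get((subject, obj), {})
    if action not in actions:
        return False
    constraint = actions[action]
    if constraint is None:
        return True
    return record is not None and constraint(record)

ok_small = is_authorized("Sales Dept", "Customer record", "insert", {"credit_limit": 4000})
ok_large = is_authorized("Sales Dept", "Customer record", "insert", {"credit_limit": 9000})
```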
Some DBMSs also provide capabilities for  user-defined procedures  to customize the authorization process Figure 12-6a Authorization table for subjects (salespeople) Figure 12-6b Authorization table for objects (orders) Figure 12-7 Oracle privileges Implementing authorization rules
Encryption  – the coding or scrambling of data so that humans cannot read them Secure Sockets Layer (SSL) is a popular encryption scheme for TCP/IP connections Figure 12-8 Basic two-key encryption
Authentication Schemes Goal – obtain a  positive  identification of the user Passwords: First line of defense Should be at least 8 characters long Should combine alphabetic and numeric data Should not be complete words or personal information Should be changed frequently
Authentication Schemes (cont.) Strong Authentication Passwords are flawed: Users share them with each other They get written down, could be copied Automatic logon scripts remove need to explicitly type them in Unencrypted passwords travel the Internet Possible solutions: Two factor–e.g. smart card plus PIN Three factor–e.g. smart card, biometric, PIN Biometric devices–use of fingerprints, retinal scans, etc. for positive ID Third-party mediated authentication–using secret keys, digital certificates
Security Policies and Procedures Personnel controls Hiring practices, employee monitoring, security training Physical access controls Equipment locking, check-out procedures, screen placement Maintenance controls Maintenance agreements, access to source code, quality and availability standards Data privacy controls Adherence to privacy legislation, access rules
Database Recovery Mechanism for restoring a database quickly and accurately after loss or damage Recovery facilities: Backup Facilities Journalizing Facilities Checkpoint Facility Recovery Manager
Back-up Facilities Automatic dump facility that produces backup copy of the entire database Periodic backup (e.g. nightly, weekly) Cold backup–database is shut down during backup Hot backup–selected portion is shut down and backed up at a given time Backups stored in secure, off-site location
Journalizing Facilities Audit trail of transactions and database updates Transaction log–record of essential data for each transaction processed against the database Database change log–images of updated data Before-image–copy before modification After-image–copy after modification Produces an  audit trail
Figure 12-9 Database audit trail From the backup and logs, databases can be restored in case of damage or loss
Checkpoint Facilities DBMS periodically refuses to accept new transactions → system is in a quiet state Database and transaction logs are synchronized This allows the recovery manager to resume processing from a short period back, instead of repeating the entire day
Recovery and Restart Procedures Disk Mirroring–switch between identical copies of databases Restore/Rerun–reprocess transactions against the backup Transaction Integrity–commit or abort all transaction changes Backward Recovery (Rollback)–apply before images Forward Recovery (Roll Forward)–apply after images (preferable to restore/rerun)
Transaction ACID Properties Atomic Transaction cannot be subdivided Consistent Constraints don’t change from before transaction to after transaction Isolated Database changes not revealed to users until after transaction has completed Durable Database changes are permanent
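Atomicity can be observed with sqlite3: if one statement in a transaction fails, none of its changes survive. The account table and the transfer function are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one unit: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

succeeded = transfer(conn, 1, 2, 500)  # debit would make the balance negative
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

The credit to account 2 is executed first, but because the debit violates the constraint, the whole transaction is rolled back and both balances are unchanged.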
Figure 12-10 Basic recovery techniques a) Rollback
Figure 12-10 Basic recovery techniques (cont.) b) Rollforward
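Rollback and rollforward can be sketched against a simple change log of before- and after-images. The database here is a dictionary and the log format (txn_id, key, before, after) is an assumption for illustration.

```python
def apply_change(db, log, txn_id, key, new_value):
    """Update the database and journal the before- and after-images."""
    log.append((txn_id, key, db.get(key), new_value))
    db[key] = new_value

def rollback(db, log, txn_id):
    """Backward recovery: apply before-images of one transaction, newest first."""
    for t, key, before, _after in reversed(log):
        if t == txn_id:
            if before is None:
                db.pop(key, None)  # the key did not exist before the change
            else:
                db[key] = before

def rollforward(backup, log, committed):
    """Forward recovery: start from a backup, apply after-images of committed txns."""
    db = dict(backup)
    for t, key, _before, after in log:
        if t in committed:
            db[key] = after
    return db

db = {"A": 100}
backup = dict(db)
log = []
apply_change(db, log, "T1", "A", 150)
apply_change(db, log, "T2", "B", 20)

rollback(db, log, "T2")                       # undo the aborted transaction T2
restored = rollforward(backup, log, {"T1"})   # rebuild from backup plus log
```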
Database Failure Responses Aborted transactions Preferred recovery: rollback Alternative: Rollforward to state just prior to abort Incorrect data Preferred recovery: rollback Alternative 1: rerun transactions not including inaccurate data updates Alternative 2: compensating transactions System failure (database intact) Preferred recovery: switch to duplicate database Alternative 1: rollback Alternative 2: restart from checkpoint Database destruction Preferred recovery: switch to duplicate database Alternative 1: rollforward Alternative 2: reprocess transactions
Concurrency Control Problem –in a multiuser environment, simultaneous access to data can result in interference and data loss Solution – Concurrency Control The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multi-user environment
Figure 12-11  Lost update (no concurrency control in effect) Simultaneous access causes updates to cancel each other A similar problem is the  inconsistent read  problem
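The lost update problem comes from the order of reads and writes, which a deterministic sketch can show without real threads. The balance and withdrawal amounts are illustrative.

```python
def interleaved_withdrawals(balance, first, second):
    """Both users read the same balance before either one writes."""
    read_a = balance            # user A reads
    read_b = balance            # user B reads the same value
    balance = read_a - first    # A writes
    balance = read_b - second   # B overwrites A's write: A's update is lost
    return balance

def serial_withdrawals(balance, first, second):
    """One withdrawal completes before the other starts."""
    balance = balance - first
    balance = balance - second
    return balance

lost = interleaved_withdrawals(1000, 200, 300)
correct = serial_withdrawals(1000, 200, 300)
```

The interleaved schedule ends at 700 instead of the correct 500, because the first withdrawal is silently overwritten.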
Concurrency Control Techniques Serializability Transactions run so that the result is the same as finishing one transaction before starting another Locking Mechanisms The most common way of achieving serialization Data that is retrieved for the purpose of updating is locked for the updater No other user can perform update until unlocked
Figure 12-12: Updates with locking (concurrency control) This prevents the lost update problem
Locking Mechanisms Locking level: Database–used during database updates Table–used for bulk updates Block or page–very commonly used Record–only requested row; fairly commonly used Field–requires significant overhead; impractical Types of locks: Shared lock–Read but no update permitted.  Used when just reading to prevent another user from placing an exclusive lock on the record Exclusive lock–No access permitted.  Used when preparing to update
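The shared/exclusive rules reduce to a small compatibility check. This sketch assumes "S" and "X" as mode labels and a list of the modes other transactions already hold on the record.

```python
def can_grant(requested, held):
    """Decide whether a lock request is compatible with locks held by others.

    requested: "S" (shared) or "X" (exclusive)
    held: lock modes other transactions currently hold on the record
    """
    if not held:
        return True                           # nothing held: any lock is fine
    if requested == "S":
        return all(mode == "S" for mode in held)  # readers can share
    return False                              # exclusive needs the record alone

shared_with_shared = can_grant("S", ["S"])
exclusive_with_shared = can_grant("X", ["S"])
```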
Deadlock An impasse that results when two or more transactions have locked common resources, and each waits for the other to unlock their resources Figure 12-13 The problem of deadlock John and Marsha will wait forever for each other to release their locked resources!
Managing Deadlock Deadlock prevention: Lock all records required at the beginning of a transaction Two-phase locking protocol Growing phase Shrinking phase May be difficult to determine all needed resources in advance Deadlock Resolution: Allow deadlocks to occur Mechanisms for detecting and breaking them Resource usage matrix
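Deadlock detection is often described in terms of a wait-for graph, a different representation from the resource usage matrix named on the slide: a cycle in the graph means a deadlock. The sketch below assumes the graph is given as a dictionary from each transaction to the set of transactions it is waiting on.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle."""
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True            # reached a txn already on the current path
        visiting.add(txn)
        for other in waits_for.get(txn, ()):
            if visit(other):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(waits_for))

deadlocked = has_deadlock({"John": {"Marsha"}, "Marsha": {"John"}})
not_deadlocked = has_deadlock({"John": {"Marsha"}, "Marsha": set()})
```

Once a cycle is found, a DBMS would typically break it by aborting one of the transactions involved.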
Versioning Optimistic approach to concurrency control Instead of locking Assumption is that simultaneous updates will be infrequent Each transaction can attempt an update as it wishes The system will reject an update when it senses a conflict Use of rollback and commit for this
Figure 12-15 The use of versioning Can give better performance than locking when conflicting updates are infrequent
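A version-number check is one common way to realize the optimistic approach. The class and method names below are hypothetical and only meant as a sketch: an update is rejected when the record has changed since it was read.

```python
class VersionedRecord:
    """Record with a version counter for optimistic concurrency control."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        """Return the value together with the version that was read."""
        return self.value, self.version

    def try_update(self, new_value, read_version):
        """Commit only if nobody else updated since read_version."""
        if read_version != self.version:
            return False           # conflict sensed: reject, caller must retry
        self.value = new_value
        self.version += 1
        return True

record = VersionedRecord(1000)
value_a, version_a = record.read()
value_b, version_b = record.read()

first_ok = record.try_update(value_a - 200, version_a)   # accepted
second_ok = record.try_update(value_b - 300, version_b)  # rejected: stale read

# The rejected transaction restarts with a fresh read and then succeeds
value_b, version_b = record.read()
retry_ok = record.try_update(value_b - 300, version_b)
```

No update is lost: the final value reflects both withdrawals.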
Managing Data Quality Causes of poor data quality External data sources Redundant data storage Lack of organizational commitment Data quality improvement Perform data quality audit Establish data stewardship program (data steward   is a liaison between IT and business units) Apply total quality management (TQM) practices Overcome organizational barriers Apply modern DBMS technology Estimate return on investment
Data Dictionaries and Repositories Data dictionary Documents data elements of a database System catalog System-created database that describes all database objects Information Repository Stores metadata describing data and data processing resources Information Repository Dictionary System (IRDS) Software tool managing/controlling access to information repository
Figure 12-16 Three components of the repository system architecture A schema of the repository information Software that manages the repository objects Where repository objects are stored Source : adapted from Bernstein, 1996.
Database Performance Tuning DBMS Installation Setting installation parameters Memory Usage  Set cache levels Choose background processes Input/Output (I/O) Contention Use striping Distribution of heavily accessed files CPU Usage Monitor CPU load Application tuning Modification of SQL code in applications
Data Availability Downtime is expensive How to ensure availability Hardware failures–provide redundancy for fault tolerance Loss of data–database mirroring Maintenance downtime–automated and nondisruptive maintenance utilities Network problems–careful traffic monitoring, firewalls, and routers

Modern database management jeffrey a. hoffer, mary b. prescott,

  • 1.
    Grading System LectureGrade 1 st Exam - 10% Ch 1 – 2 2 nd Exam - 10% Ch 3 – 5 3 rd Exam - 10% Ch 7 – 8 (SQL) 4 th Exam - 15% Overall Project - 15% Q/A/Etc - 40% TOTAL - 100% * .75
  • 2.
    Laboratory Grade LaboratoryExercises - 10% Hands – on Exam - 15 % TOTAL - 25% GRADE = LEC + LAB = 75% + 25% = 100%
  • 3.
    Chapter 1: TheDatabase Environment Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 4.
    Objectives Definition ofterms Explain growth and importance of databases Name limitations of conventional file processing Identify five categories of databases Explain advantages of databases Identify costs and risks of databases List components of database environment Describe evolution of database systems
  • 5.
    Definitions Database: organizedcollection of logically related data Data: stored representations of meaningful objects and events Structured: numbers, text, dates Unstructured: images, video, documents Information: data processed to increase knowledge in the person using the data Metadata: data that describes the properties and context of user data
  • 6.
    Figure 1-1a Datain context Context helps users understand data
  • 7.
    Graphical displays turndata into useful information that managers can use for decision making and interpretation Figure 1-1b Summarized data
  • 8.
    Descriptions of theproperties or characteristics of the data, including data types, field sizes, allowable values, and data context
  • 9.
    Disadvantages of FileProcessing Program-Data Dependence All programs maintain metadata for each file they use Duplication of Data Different systems/programs have separate copies of the same data Limited Data Sharing No centralized control of data Lengthy Development Times Programmers must design their own file formats Excessive Program Maintenance 80% of information systems budget
  • 10.
    Problems with DataDependency Each application programmer must maintain his/her own data Each application program needs to include code for the metadata of each file Each application program must have its own processing routines for reading, inserting, updating, and deleting data Lack of coordination and central control Non-standard file formats
  • 11.
    Figure 1-3 Oldfile processing systems at Pine Valley Furniture Company Duplicate Data
  • 12.
    Problems with DataRedundancy Waste of space to have duplicate data Causes more maintenance headaches The biggest problem: Data changes in one file could cause inconsistencies Compromises in data integrity
  • 13.
    SOLUTION: TheDATABASE Approach Central repository of shared data Data is managed by a controlling agent Stored in a standardized, convenient form Requires a Database Management System (DBMS)
  • 14.
    Database Management SystemDBMS manages data resources like an operating system manages hardware resources A software system that is used to create, maintain, and provide controlled access to user databases Order Filing System Invoicing System Payroll System DBMS Central database Contains employee, order, inventory, pricing, and customer data
  • 15.
    Advantages of theDatabase Approach Program-data independence Planned data redundancy Improved data consistency Improved data sharing Increased application development productivity Enforcement of standards Improved data quality Improved data accessibility and responsiveness Reduced program maintenance Improved decision support
  • 16.
    Costs and Risksof the Database Approach New, specialized personnel Installation and management cost and complexity Conversion costs Need for explicit backup and recovery Organizational conflict
  • 17.
    Elements of theDatabase Approach Data models Graphical system capturing nature and relationship of data Enterprise Data Model–high-level entities and relationships for the organization Project Data Model–more detailed view, matching data structure in database or data warehouse Relational Databases Database technology involving tables (relations) representing entities and primary/foreign keys representing relationships Use of Internet Technology Networks and telecommunications, distributed databases, client-server, and 3-tier architectures Database Applications Application programs used to perform database activities (create, read, update, and delete) for database users
  • 18.
    Segment of anEnterprise Data Model Segment of a Project-Level Data Model
  • 19.
    One customer mayplace many orders, but each order is placed by a single customer  One-to-many relationship
  • 20.
    One order hasmany order lines; each order line is associated with a single order  One-to-many relationship
  • 21.
    One product canbe in many order lines, each order line refers to a single product  One-to-many relationship
  • 22.
    Therefore, one orderinvolves many products and one product is involved in many orders  Many-to-many relationship
  • 23.
    Figure 1-4 Enterprisedata model for Figure 1-3 segments
  • 24.
    Figure 1-5 Componentsof the Database Environment
  • 25.
    Components of the Database Environment CASE Tools – computer-aided software engineering Repository – centralized storehouse of metadata Database Management System (DBMS) – software for managing the database Database – storehouse of the data Application Programs – software using the data User Interface – text and graphical displays to users Data/Database Administrators – personnel responsible for maintaining the database System Developers – personnel responsible for designing databases and software End Users – people who use the applications and databases
  • 26.
    The Range ofDatabase Applications Personal databases Workgroup databases Departmental/divisional databases Enterprise database
  • 28.
    Figure 1-6 Typicaldata from a personal database
  • 29.
    Figure 1-7 Workgroupdatabase with wireless local area network
  • 30.
    Enterprise Database ApplicationsEnterprise Resource Planning (ERP) Integrate all enterprise functions (manufacturing, finance, sales, marketing, inventory, accounting, human resources) Data Warehouse Integrated decision support system derived from various operational databases
  • 31.
    Figure 1-8 Anenterprise data warehouse
  • 32.
  • 33.
    Chapter 2: The Database Development Process Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 34.
    Objectives Definition ofterms Describe system development life cycle Explain prototyping approach Explain roles of individuals Explain three-schema approach Explain role of packaged data models Explain three-tiered architectures Explain scope of database design projects Draw simple data models
  • 35.
    Enterprise Data ModelFirst step in database development Specifies scope and general content Overall picture of organizational data at high level of abstraction Entity-relationship diagram Descriptions of entity types Relationships between entities Business rules
  • 36.
    Figure 2-1 Segmentfrom enterprise data model Enterprise data model describes the high-level entities in an organization and the relationship between these entities
  • 37.
    Information Systems Architecture(ISA) Conceptual blueprint for organization’s desired information systems structure Consists of: Data (e.g. Enterprise Data Model – simplified ER Diagram) Processes – data flow diagrams, process decomposition, etc. Data Network – topology diagram (like Fig 1-9) People – people management using project management tools (Gantt charts, etc.) Events and points in time (when processes are performed) Reasons for events and rules (e.g., decision tables)
  • 38.
    Information Engineering Adata-oriented methodology to create and maintain information systems Top-down planning–a generic IS planning methodology for obtaining a broad understanding of the IS needed by the entire organization Four steps to Top-Down planning: Planning Analysis Design Implementation
  • 39.
    Information Systems Planning (Table 2-1) Purpose – align information technology with organization’s business strategies Three steps: Identify strategic planning factors Identify corporate planning objects Develop enterprise model
  • 40.
    Identify Strategic PlanningFactors (Table 2-2) Organization goals–what we hope to accomplish Critical success factors–what MUST work in order for us to survive Problem areas–weaknesses we now have
  • 41.
    Identify Corporate PlanningObjects (Table 2-3) Organizational units–departments Organizational locations Business functions–groups of business processes Entity types–the things we are trying to model for the database Information systems–application programs
  • 42.
    Develop Enterprise ModelFunctional decomposition Iterative process breaking system description into finer and finer detail Enterprise data model Planning matrixes Describe interrelationships between planning objects
  • 43.
    Figure 2-2 Exampleof process decomposition of an order fulfillment function (Pine Valley Furniture) Decomposition = breaking large tasks into smaller tasks in a hierarchical structure chart
  • 44.
    Planning Matrixes Describerelationships between planning objects in the organization Types of matrixes: Function-to-data entity Location-to-function Unit-to-function IS-to-data entity Supporting function-to-data entity IS-to-business objective
  • 45.
    Example business function-to-dataentity matrix (Fig. 2-3)
  • 46.
    Two Approaches toDatabase and IS Development SDLC System Development Life Cycle Detailed, well-planned development process Time-consuming, but comprehensive Long development cycle Prototyping Rapid application development (RAD) Cursory attempt at conceptual data modeling Define database during development of initial prototype Repeat implementation and maintenance activities with new prototype versions
  • 47.
    Systems Development LifeCycle (see also Figures 2.4, 2.5) Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 48.
    Systems Development LifeCycle (see also Figures 2.4, 2.5) (cont.) Planning Purpose – preliminary understanding Deliverable – request for study Database activity – enterprise modeling and early conceptual data modeling Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 49.
    Systems Development LifeCycle (see also Figures 2.4, 2.5) (cont.) Analysis Purpose–thorough requirements analysis and structuring Deliverable–functional system specifications Database activity–Thorough and integrated conceptual data modeling Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 50.
    Systems Development LifeCycle (see also Figures 2.4, 2.5) (cont.) Logical Design Purpose–information requirements elicitation and structure Deliverable–detailed design specifications Database activity– logical database design (transactions, forms, displays, views, data integrity and security) Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 51.
    Systems Development LifeCycle (see also Figures 2.4, 2.5) (cont.) Physical Design Purpose–develop technology and organizational specifications Deliverable–program/data structures, technology purchases, organization redesigns Database activity– physical database design (define database to DBMS, physical data organization, database processing programs) Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 52.
    Systems Development LifeCycle (see also Figures 2.4, 2.5) (cont.) Implementation Purpose–programming, testing, training, installation, documenting Deliverable–operational programs, documentation, training materials Database activity– database implementation, including coded programs, documentation, installation and conversion Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 53.
    Systems Development LifeCycle (see also Figures 2.4, 2.5) (cont.) Maintenance Purpose–monitor, repair, enhance Deliverable–periodic audits Database activity– database maintenance, performance analysis and tuning, error corrections Planning Analysis Physical Design Implementation Maintenance Logical Design
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
    CASE Computer-Aided SoftwareEngineering (CASE)–software tools providing automated support for systems development Three database features: Data modeling–drawing entity-relationship diagrams Code generation–SQL code for table creation Repositories–knowledge base of enterprise information
  • 60.
    Packaged Data ModelsModel components that can be purchased, customized, and assembled into full-scale data models Advantages Reduced development time Higher model quality and reliability Two types: Universal data models Industry-specific data models
  • 61.
    Managing Projects Project–aplanned undertaking of related activities to reach an objective that has a beginning and an end Involves use of review points for: Validation of satisfactory progress Step back from detail to overall view Renew commitment of stakeholders Incremental commitment–review of systems development project after each development phase with rejustification after each phase
  • 62.
    Managing Projects: PeopleInvolved Business analysts Systems analysts Database analysts and data modelers Users Programmers Database architects Data administrators Project managers Other technical experts
  • 63.
    Database Schema PhysicalSchema Physical structures–covered in Chapters 5 and 6 Conceptual Schema E-R models–covered in Chapters 3 and 4 External Schema User Views Subsets of Conceptual Schema Can be determined from business-function/data entity matrices DBA determines schema for different users
  • 64.
    Different people havedifferent views of the database…these are the external schema The internal schema is the underlying design and implementation Figure 2-7 Three-schema architecture
  • 65.
    Figure 2-8 Developingthe three-tiered architecture
  • 66.
    Figure 2-9 Three-tieredclient/server database architecture
  • 67.
    Pine Valley FurnitureSegment of project data model (Figure 2-11)
  • 68.
    Figure 2-12 Fourrelations (Pine Valley Furniture)
  • 69.
    Figure 2-12 Fourrelations (Pine Valley Furniture) (cont.)
  • 70.
    Chapter 3: ModelingData in the Organization Modern Database Management 8 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 71.
    Objectives Definition ofterms Importance of data modeling Write good names and definitions for entities, relationships, and attributes Distinguish unary, binary, and ternary relationships Model different types of attributes, entities, relationships, and cardinalities Draw E-R diagrams for common business situations Convert many-to-many relationships to associative entities Model time-dependent data using time stamps
  • 72.
    Business Rules Statementsthat define or constrain some aspect of the business Assert business structure Control/influence business behavior Expressed in terms familiar to end users Automated through DBMS software
  • 73.
    A Good BusinessRule is: Declarative–what, not how Precise–clear, agreed-upon meaning Atomic–one statement Consistent–internally and externally Expressible–structured, natural language Distinct–non-redundant Business-oriented–understood by business people
  • 74.
    A Good DataName is: Related to business, not technical, characteristics Meaningful and self-documenting Unique Readable Composed of words from an approved list Repeatable
  • 75.
    Data Definitions Explanationof a term or fact Term–word or phrase with specific meaning Fact–association between two or more terms Guidelines for good data definition Gathered in conjunction with systems requirements Accompanied by diagrams Iteratively created and refined Achieved by consensus
  • 76.
    E-R Model ConstructsEntities: Entity instance–person, place, object, event, concept (often corresponds to a row in a table) Entity Type–collection of entities (often corresponds to a table) Relationships: Relationship instance–link between entities (corresponds to primary key-foreign key equivalencies in related tables) Relationship type–category of relationship…link between entity types Attribute– property or characteristic of an entity or relationship type (often corresponds to a field in a table)
  • 77.
    Sample E-R Diagram (Figure 3-1)
  • 78.
    Relationship degrees specify the number of entity types involved. Relationship cardinalities specify how many of each entity type are allowed. Basic E-R notation (Figure 3-2): Entity symbols; A special entity that is also a relationship; Relationship symbols; Attribute symbols
  • 79.
    What Should an Entity Be? SHOULD BE: An object that will have many instances in the database; An object that will be composed of multiple attributes; An object that we are trying to model. SHOULD NOT BE: A user of the database system; An output of the database system (e.g., a report)
  • 80.
    Figure 3-4 Example of inappropriate entities. Inappropriate entities: System user; System output. Appropriate entities
  • 81.
    Attributes: Attribute–property or characteristic of an entity or relationship type. Classifications of attributes: Required versus Optional Attributes; Simple versus Composite Attributes; Single-Valued versus Multivalued Attributes; Stored versus Derived Attributes; Identifier Attributes
  • 82.
    Identifiers (Keys): Identifier (Key)–An attribute (or combination of attributes) that uniquely identifies individual instances of an entity type. Simple versus Composite Identifier. Candidate Identifier–an attribute that could be a key…satisfies the requirements for being an identifier
  • 83.
    Characteristics of Identifiers: Will not change in value; Will not be null; No intelligent identifiers (e.g., containing locations or people that might change); Substitute new, simple keys for long, composite keys
  • 84.
    Figure 3-7 A composite attribute: an attribute broken into component parts. Figure 3-8 Entity with multivalued attribute (Skill) and derived attribute (Years_Employed). Multivalued: an employee can have more than one skill. Derived: from date employed and current date
  • 85.
    Figure 3-9 Simple and composite identifier attributes. The identifier is boldfaced and underlined
  • 86.
    Figure 3-19 Simple example of time-stamping. This attribute is both multivalued and composite
  • 87.
    More on Relationships. Relationship Types vs. Relationship Instances: the relationship type is modeled as lines between entity types…the instance is between specific entity instances. Relationships can have attributes: these describe features pertaining to the association between the entities in the relationship. Two entities can have more than one type of relationship between them (multiple relationships). Associative Entity–combination of relationship and entity
  • 88.
    Figure 3-10 Relationship types and instances: a) Relationship type; b) Relationship instances
  • 89.
    Degree of Relationships: Degree of a relationship is the number of entity types that participate in it. Unary Relationship; Binary Relationship; Ternary Relationship
  • 90.
    Degree of relationships (from Figure 3-2). Unary: one entity related to another of the same entity type. Binary: entities of two different types related to each other. Ternary: entities of three different types related to each other
  • 91.
    Cardinality of Relationships. One-to-One: each entity in the relationship will have exactly one related entity. One-to-Many: an entity on one side of the relationship can have many related entities, but an entity on the other side will have a maximum of one related entity. Many-to-Many: entities on both sides of the relationship can have many related entities on the other side
  • 92.
    Cardinality Constraints: the number of instances of one entity that can or must be associated with each instance of another entity. Minimum Cardinality: if zero, then optional; if one or more, then mandatory. Maximum Cardinality: the maximum number
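    A minimal sketch of how minimum and maximum cardinality could be checked in code. The `Cardinality` class and its method names are hypothetical helpers introduced for illustration only; `None` is used here as an assumed stand-in for an unbounded "many" maximum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """Minimum and maximum number of related instances (hypothetical helper)."""
    minimum: int
    maximum: Optional[int]  # None is assumed to mean "many" (no upper bound)

    @property
    def is_optional(self) -> bool:
        # A minimum of zero means participation is optional.
        return self.minimum == 0

    def allows(self, count: int) -> bool:
        # True when `count` related instances satisfy both bounds.
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


# A patient must have at least one recorded history and can have many.
patient_histories = Cardinality(minimum=1, maximum=None)
# An employee may be assigned to any number of projects, or to none.
employee_projects = Cardinality(minimum=0, maximum=None)
```

    With these definitions, `patient_histories.allows(0)` is false because the minimum is mandatory, while `employee_projects.is_optional` is true.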
  • 93.
    Figure 3-12 Examples of relationships of different degrees: a) Unary relationships
  • 94.
    Figure 3-12 Examples of relationships of different degrees (cont.): b) Binary relationships
  • 95.
    Figure 3-12 Examples of relationships of different degrees (cont.): c) Ternary relationship. Note: a relationship can have attributes of its own
  • 96.
    Figure 3-17 Examples of cardinality constraints: a) Mandatory cardinalities. A patient must have recorded at least one history, and can have many. A patient history is recorded for one and only one patient
  • 97.
    Figure 3-17 Examples of cardinality constraints (cont.): b) One optional, one mandatory. An employee can be assigned to any number of projects, or may not be assigned to any at all. A project must be assigned to at least one employee, and may be assigned to many
  • 98.
    Figure 3-17 Examples of cardinality constraints (cont.): c) Optional cardinalities. A person is married to at most one other person, or may not be married at all
  • 99.
    Entities can be related to one another in more than one way. Figure 3-21 Examples of multiple relationships: a) Employees and departments
  • 100.
    Figure 3-21 Examples of multiple relationships (cont.): b) Professors and courses (fixed lower limit constraint). Here, the minimum cardinality constraint is 2
  • 101.
    Figure 3-15a and 3-15b Multivalued attributes can be represented as relationships: simple; composite
  • 102.
    Strong vs. Weak Entities, and Identifying Relationships. Strong entity: exists independently of other types of entities; has its own unique identifier; identifier underlined with a single line. Weak entity: dependent on a strong entity (identifying owner)…cannot exist on its own; does not have a unique identifier (only a partial identifier); partial identifier underlined with a double line; entity box has a double line. Identifying relationship: links strong entities to weak entities
  • 103.
    Strong entity; Weak entity; Identifying relationship
  • 104.
    Associative Entities. An entity–has attributes. A relationship–links entities together. When should a relationship with attributes instead be an associative entity? All relationships for the associative entity should be many; The associative entity could have meaning independent of the other entities; The associative entity preferably has a unique identifier, and should also have other attributes; The associative entity may participate in other relationships other than the entities of the associated relationship; Ternary relationships should be converted to associative entities
  • 105.
    Figure 3-11a A binary relationship with an attribute. Here, the date completed attribute pertains specifically to the employee’s completion of a course…it is an attribute of the relationship
  • 106.
    Figure 3-11b An associative entity (CERTIFICATE). An associative entity is like a relationship with an attribute, but it is also considered to be an entity in its own right. Note that the many-to-many cardinality between entities in Figure 3-11a has been replaced by two one-to-many relationships with the associative entity.
  • 107.
    Figure 3-13c An associative entity – bill of materials structure. This could just be a relationship with attributes…it’s a judgment call
  • 108.
    Figure 3-18 Ternary relationship as an associative entity
  • 109.
    Microsoft Visio Example for E-R diagram. Different modeling software tools may have different notation for the same constructs
  • 110.
    Chapter 4: The Enhanced ER Model and Business Rules. Modern Database Management, 8th Edition. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. © 2007 by Prentice Hall
  • 111.
    Objectives: Definition of terms; Use of supertype/subtype relationships; Use of generalization and specialization techniques; Specification of completeness and disjointness constraints; Develop supertype/subtype hierarchies for realistic business situations; Develop entity clusters; Explain universal data model; Name categories of business rules; Define operational constraints graphically and in English
  • 112.
    Supertypes and Subtypes. Subtype: A subgrouping of the entities in an entity type that has attributes distinct from those in other subgroupings. Supertype: A generic entity type that has a relationship with one or more subtypes. Attribute Inheritance: Subtype entities inherit values of all attributes of the supertype; An instance of a subtype is also an instance of the supertype
  • 113.
    Figure 4-1 Basic notation for supertype/subtype relationships: a) EER notation
  • 114.
    Figure 4-1 Basic notation for supertype/subtype relationships (cont.): b) Microsoft Visio notation. Different modeling tools may have different notation for the same modeling constructs
  • 115.
    Figure 4-2 Employee supertype with three subtypes. All employee subtypes will have emp nbr, name, address, and date-hired. Each employee subtype will also have its own attributes
  • 116.
    Relationships and Subtypes: Relationships at the supertype level indicate that all subtypes will participate in the relationship. The instances of a subtype may participate in a relationship unique to that subtype. In this situation, the relationship is shown at the subtype level
  • 117.
    Figure 4-3 Supertype/subtype relationships in a hospital. Both outpatients and resident patients are cared for by a responsible physician. Only resident patients are assigned to a bed
  • 118.
    Generalization and Specialization. Generalization: The process of defining a more general entity type from a set of more specialized entity types (BOTTOM-UP). Specialization: The process of defining one or more subtypes of the supertype and forming supertype/subtype relationships (TOP-DOWN)
  • 119.
    Figure 4-4 Example of generalization: a) Three entity types: CAR, TRUCK, and MOTORCYCLE. All these types of vehicles have common attributes
  • 120.
    Figure 4-4 Example of generalization (cont.): b) Generalization to VEHICLE supertype. So we put the shared attributes in a supertype. Note: no subtype for motorcycle, since it has no unique attributes
  • 121.
    Figure 4-5 Example of specialization: a) Entity type PART. Attribute callouts: applies only to manufactured parts; applies only to purchased parts
  • 122.
    Figure 4-5 Example of specialization (cont.): b) Specialization to MANUFACTURED PART and PURCHASED PART. Created 2 subtypes. Note: multivalued attribute was replaced by an associative entity relationship to another entity
  • 123.
    Constraints in Supertype/Subtype Relationships: Completeness Constraints. Completeness Constraints: Whether an instance of a supertype must also be a member of at least one subtype. Total Specialization Rule: Yes (double line). Partial Specialization Rule: No (single line)
  • 124.
    Figure 4-6 Examples of completeness constraints: a) Total specialization rule. A patient must be either an outpatient or a resident patient
  • 125.
    Figure 4-6 Examples of completeness constraints (cont.): b) Partial specialization rule. A vehicle could be a car, a truck, or neither
  • 126.
    Constraints in Supertype/Subtype Relationships: Disjointness Constraints. Disjointness Constraints: Whether an instance of a supertype may simultaneously be a member of two (or more) subtypes. Disjoint Rule: An instance of the supertype can be only ONE of the subtypes. Overlap Rule: An instance of the supertype could be more than one of the subtypes
  • 127.
    Figure 4-7 Examples of disjointness constraints: a) Disjoint rule. A patient can either be outpatient or resident, but not both
  • 128.
    Figure 4-7 Examples of disjointness constraints (cont.): b) Overlap rule. A part may be both purchased and manufactured
  • 129.
    Constraints in Supertype/Subtype Relationships: Subtype Discriminators. Subtype Discriminator: An attribute of the supertype whose values determine the target subtype(s). Disjoint–a simple attribute with alternative values to indicate the possible subtypes. Overlapping–a composite attribute whose subparts pertain to different subtypes. Each subpart contains a boolean value to indicate whether or not the instance belongs to the associated subtype
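    The two discriminator styles can be sketched as follows. This is an illustrative sketch only: the codes `"O"` and `"R"` and the function names are assumptions, since the slides do not give the actual discriminator values.

```python
# Hypothetical discriminator codes; the slides do not specify the actual values.
PATIENT_TYPES = {"O": "OUTPATIENT", "R": "RESIDENT PATIENT"}


def disjoint_subtype(patient_type: str) -> str:
    """Disjoint rule: one simple attribute value selects exactly one subtype."""
    return PATIENT_TYPES[patient_type]


def overlapping_subtypes(part_type: dict) -> list:
    """Overlap rule: a composite attribute holds one boolean per subtype."""
    return [subtype for subtype, member in part_type.items() if member]


# A part that is both manufactured and purchased belongs to both subtypes.
both = overlapping_subtypes({"MANUFACTURED PART": True, "PURCHASED PART": True})
```

    The disjoint case can only ever return one subtype per instance, while the overlapping case returns a list that may contain several.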
  • 130.
    Figure 4-8 Introducing a subtype discriminator (disjoint rule). A simple attribute with different possible values indicating the subtype
  • 131.
    Figure 4-9 Subtype discriminator (overlap rule). A composite attribute with sub-attributes indicating “yes” or “no” to determine whether it is of each subtype
  • 132.
    Figure 4-10 Example of supertype/subtype hierarchy
  • 133.
    Entity Clusters: EER diagrams are difficult to read when there are too many entities and relationships. Solution: Group entities and relationships into entity clusters. Entity cluster: Set of one or more entity types and associated relationships grouped into a single abstract entity type
  • 134.
    Figure 4-13a Possible entity clusters for Pine Valley Furniture in Microsoft Visio. Related groups of entities could become clusters
  • 135.
    Figure 4-13b EER diagram of PVF entity clusters. More readable, isn’t it?
  • 136.
    Figure 4-14 Manufacturing entity cluster. Detail for a single cluster
  • 164.
    Packaged data models provide generic models that can be customized for a particular organization’s business rules
  • 165.
    Business rules: Statements that define or constrain some aspect of the business. Classification of business rules: Derivation–rule derived from other knowledge, often in the form of a formula using attribute values; Structural assertion–rule expressing static structure, including attributes, relationships, and definitions; Action assertion–rule expressing constraints/control of organizational actions
  • 166.
    Figure 4-18 EER diagram to describe business rules
  • 167.
    Types of Action Assertions. Result: Condition–IF/THEN rule; Integrity constraint–must always be true; Authorization–privilege statement. Form: Enabler–leads to creation of new object; Timer–allows or disallows an action; Executive–executes one or more actions. Rigor: Controlling–something must or must not happen; Influencing–guideline for which a notification must occur
  • 168.
    Stating an Action Assertion: Anchor Object–an object on which actions are limited; Action–creation, deletion, update, or read; Corresponding Objects–an object influencing the ability to perform an action on another business rule. Action assertions identify corresponding objects that constrain the ability to perform actions on anchor objects
  • 169.
    Figure 4-19 Data model segment for class scheduling
  • 170.
    Figure 4-20 Business Rule 1: For a faculty member to be assigned to teach a section of a course, the faculty member must be qualified to teach the course for which that section is scheduled. Diagram labels: Action assertion; Anchor object; Corresponding object; Corresponding object. In this case, the action assertion is a Restriction
  • 171.
    Figure 4-21 Business Rule 2: For a faculty member to be assigned to teach a section of a course, the faculty member must not be assigned to teach a total of more than three course sections. Diagram labels: Action assertion; Anchor object; Corresponding object. In this case, the action assertion is an Upper LIMit
  • 172.
    Chapter 5: Logical Database Design and the Relational Model. Modern Database Management, 8th Edition. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. © 2007 by Prentice Hall
  • 173.
    Objectives: Definition of terms; List five properties of relations; State two properties of candidate keys; Define first, second, and third normal form; Describe problems from merging relations; Transform E-R and EER diagrams to relations; Create tables with entity and relational integrity constraints; Use normalization to convert anomalous tables to well-structured relations
  • 174.
    Relation. Definition: A relation is a named, two-dimensional table of data. A table consists of rows (records) and columns (attributes or fields). Requirements for a table to qualify as a relation: It must have a unique name; Every attribute value must be atomic (not multivalued, not composite); Every row must be unique (can’t have two rows with exactly the same values for all their fields); Attributes (columns) in tables must have unique names; The order of the columns must be irrelevant; The order of the rows must be irrelevant. NOTE: all relations are in 1st Normal Form
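    Some of the requirements above can be checked mechanically. The following is a rough sketch, assuming a table is given as a list of column names plus a list of row tuples; `is_relation` is a hypothetical helper, and treating Python lists, tuples, sets, and dicts as "non-atomic" is a simplifying assumption.

```python
def is_relation(columns, rows):
    """Check unique column names, atomic values, and unique rows."""
    # Attribute (column) names must be unique.
    if len(set(columns)) != len(columns):
        return False
    for row in rows:
        # Every row must supply exactly one value per column.
        if len(row) != len(columns):
            return False
        # Assumption: container values stand in for multivalued/composite data.
        if any(isinstance(value, (list, tuple, set, dict)) for value in row):
            return False
    # No two rows may have the same values for all their fields.
    return len({tuple(row) for row in rows}) == len(rows)


atomic_table = [(100, "SQL"), (100, "Modeling")]
multivalued_table = [(100, ["SQL", "Modeling"])]
```

    Here `is_relation(["Emp_ID", "Skill"], atomic_table)` holds, while the multivalued version fails the atomicity check. Row and column order never enter the check, which matches the requirement that order be irrelevant.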
  • 175.
    Correspondence with E-R Model: Relations (tables) correspond with entity types and with many-to-many relationship types. Rows correspond with entity instances and with many-to-many relationship instances. Columns correspond with attributes. NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model)
  • 176.
    Key Fields: Keys are special fields that serve two main purposes. Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique. Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship). Keys can be simple (a single field) or composite (more than one field). Keys usually are used as indexes to speed up the response to user queries (more on this in Ch. 6)
  • 177.
    Figure 5-3 Schema for four relations (Pine Valley Furniture Company). Callouts: Primary Key; Foreign Key (implements 1:N relationship between customer and order); Combined, these are a composite primary key (uniquely identifies the order line)…individually they are foreign keys (implement M:N relationship between order and product)
  • 178.
    Integrity Constraints. Domain Constraints: Allowable values for an attribute (see Table 5-1). Entity Integrity: No primary key attribute may be null; all primary key fields MUST have data. Action Assertions: Business rules (recall from Ch. 4)
  • 179.
    Domain definitions enforce domain integrity constraints
  • 180.
    Integrity Constraints. Referential Integrity–rule states that any foreign key value (on the relation of the many side) MUST match a primary key value in the relation of the one side (or the foreign key can be null). For example, Delete Rules: Restrict–don’t allow delete of “parent” side if related rows exist in “dependent” side; Cascade–automatically delete “dependent” side rows that correspond with the “parent” side row to be deleted; Set-to-Null–set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities)
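    The three delete rules can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `customer` and `order_t` tables and their columns are simplified, hypothetical stand-ins, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def delete_parent(rule: str):
    """Delete a customer that has one order, under the given ON DELETE rule."""
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite enforces foreign keys only when this pragma is switched on.
        conn.execute("PRAGMA foreign_keys = ON")
        conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE order_t ("
            " order_id INTEGER PRIMARY KEY,"
            " customer_id INTEGER"
            f" REFERENCES customer(customer_id) ON DELETE {rule})"
        )
        conn.execute("INSERT INTO customer VALUES (1)")
        conn.execute("INSERT INTO order_t VALUES (1001, 1)")
        try:
            conn.execute("DELETE FROM customer WHERE customer_id = 1")
        except sqlite3.IntegrityError:
            # Restrict: the parent row cannot go while dependent rows exist.
            return "rejected"
        # Report what is left on the dependent side after the delete.
        return conn.execute("SELECT order_id, customer_id FROM order_t").fetchall()
    finally:
        conn.close()
```

    Calling `delete_parent("RESTRICT")` reports the delete as rejected, `delete_parent("CASCADE")` leaves no dependent rows, and `delete_parent("SET NULL")` keeps the order with a null foreign key.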
  • 181.
    Figure 5-5 Referential integrity constraints (Pine Valley Furniture). Referential integrity constraints are drawn via arrows from dependent to parent table
  • 182.
    Figure 5-6 SQL table definitions. Referential integrity constraints are implemented with foreign key to primary key references
  • 183.
    Transforming EER Diagrams into Relations. Mapping Regular Entities to Relations: Simple attributes–E-R attributes map directly onto the relation; Composite attributes–use only their simple, component attributes; Multivalued attribute–becomes a separate relation with a foreign key taken from the superior entity
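    The multivalued-attribute rule can be sketched in plain Python. The function name, the sample employees, and their attribute values are hypothetical and only illustrate the shape of the transformation.

```python
def split_multivalued(entities, key, attribute):
    """Split one multivalued attribute out into its own relation."""
    # Parent relation: every attribute except the multivalued one.
    parent = [
        {name: value for name, value in entity.items() if name != attribute}
        for entity in entities
    ]
    # New relation: one row per value, carrying the parent's key as a foreign key.
    child = [
        {key: entity[key], attribute: value}
        for entity in entities
        for value in entity[attribute]
    ]
    return parent, child


# Hypothetical sample data with a multivalued Skill attribute.
employees = [
    {"Employee_ID": 100, "Employee_Name": "Ana", "Skill": ["SQL", "Modeling"]},
    {"Employee_ID": 140, "Employee_Name": "Ben", "Skill": ["Accounting"]},
]
employee, employee_skill = split_multivalued(employees, "Employee_ID", "Skill")
```

    The resulting `employee_skill` rows each hold a single atomic skill, and the pair (`Employee_ID`, `Skill`) can serve as that relation's composite key.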
  • 184.
    Figure 5-8 Mapping a regular entity: (a) CUSTOMER entity type with simple attributes; (b) CUSTOMER relation
  • 185.
    Figure 5-9 Mapping a composite attribute: (a) CUSTOMER entity type with composite attribute; (b) CUSTOMER relation with address detail
  • 186.
    Figure 5-10 Mapping an entity with a multivalued attribute, panels (a) and (b). Multivalued attribute becomes a separate relation with foreign key. One-to-many relationship between original entity and new relation
  • 187.
    Transforming EER Diagrams into Relations (cont.). Mapping Weak Entities: Becomes a separate relation with a foreign key taken from the superior entity. Primary key composed of: Partial identifier of weak entity; Primary key of identifying relation (strong entity)
  • 188.
    Figure 5-11 Example of mapping a weak entity: a) Weak entity DEPENDENT
  • 189.
    Figure 5-11 Example of mapping a weak entity (cont.): b) Relations resulting from weak entity. Callouts: Composite primary key; Foreign key. NOTE: the domain constraint for the foreign key should NOT allow null value if DEPENDENT is a weak entity
  • 190.
    Transforming EER Diagrams into Relations (cont.). Mapping Binary Relationships: One-to-Many–Primary key on the one side becomes a foreign key on the many side; Many-to-Many–Create a new relation with the primary keys of the two entities as its primary key; One-to-One–Primary key on the mandatory side becomes a foreign key on the optional side
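    The many-to-many rule can be illustrated with `sqlite3`. This is a sketch with assumed, simplified table and column names (`employee`, `course`, `certificate`) and hypothetical sample rows; the new relation's composite primary key is built from the two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
    CREATE TABLE course (course_id INTEGER PRIMARY KEY, course_title TEXT);
    -- The M:N relationship becomes its own relation.
    CREATE TABLE certificate (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        course_id INTEGER NOT NULL REFERENCES course(course_id),
        date_completed TEXT,
        PRIMARY KEY (employee_id, course_id)
    );
    INSERT INTO employee VALUES (100, 'Ana');
    INSERT INTO course VALUES (1, 'SQL Basics');
""")


def record_completion(employee_id, course_id, date_completed):
    """Insert one relationship instance; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO certificate VALUES (?, ?, ?)",
                (employee_id, course_id, date_completed),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

    A first call for employee 100 and course 1 succeeds; repeating it violates the composite primary key, and referring to an employee that does not exist violates the foreign key.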
  • 191.
    Figure 5-12 Example of mapping a 1:M relationship: a) Relationship between customers and orders (note the mandatory one); b) Mapping the relationship. Callout: Foreign key. Again, no null value in the foreign key…this is because of the mandatory minimum cardinality
  • 192.
    Figure 5-13 Example of mapping an M:N relationship: a) Completes relationship (M:N). The Completes relationship will need to become a separate relation
  • 193.
    Figure 5-13 Example of mapping an M:N relationship (cont.): b) Three resulting relations. Callouts: New intersection relation; Foreign key; Foreign key; Composite primary key
  • 194.
    Figure 5-14 Example of mapping a binary 1:1 relationship: a) In_charge relationship (1:1). Often in 1:1 relationships, one direction is optional.
  • 195.
    Figure 5-14 Example of mapping a binary 1:1 relationship (cont.): b) Resulting relations. Foreign key goes in the relation on the optional side, matching the primary key on the mandatory side
  • 196.
    Transforming EER Diagrams into Relations (cont.). Mapping Associative Entities. Identifier Not Assigned: Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship). Identifier Assigned: It is natural and familiar to end users; Default identifier may not be unique
  • 197.
    Figure 5-15 Example of mapping an associative entity: a) An associative entity
  • 198.
    Figure 5-15 Example of mapping an associative entity (cont.): b) Three resulting relations. Composite primary key formed from the two foreign keys
  • 199.
    Figure 5-16 Example of mapping an associative entity with an identifier: a) SHIPMENT associative entity
  • 200.
    Figure 5-16 Example of mapping an associative entity with an identifier (cont.): b) Three resulting relations. Primary key differs from foreign keys
  • 201.
    Transforming EER Diagrams into Relations (cont.). Mapping Unary Relationships: One-to-Many–Recursive foreign key in the same relation; Many-to-Many–Two relations: one for the entity type, and one for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
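    A recursive foreign key for the unary one-to-many case can be sketched with `sqlite3`. The `manager_id` column and the sample employees are hypothetical; the point is only that the foreign key refers back to the same relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        -- Recursive foreign key: refers to another row of the same relation.
        manager_id INTEGER REFERENCES employee(employee_id)
    );
    INSERT INTO employee VALUES (1, 'Ana', NULL);
    INSERT INTO employee VALUES (2, 'Ben', 1);
    INSERT INTO employee VALUES (3, 'Cal', 1);
""")

# Self-join: pair every employee with the name of their manager, if any.
reports = conn.execute("""
    SELECT e.employee_name, m.employee_name
    FROM employee AS e
    LEFT JOIN employee AS m ON e.manager_id = m.employee_id
    ORDER BY e.employee_id
""").fetchall()
```

    The foreign key is nullable here because the employee at the top of the hierarchy has no manager.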
  • 202.
    Figure 5-17 Mapping a unary 1:N relationship: (a) EMPLOYEE entity with unary relationship; (b) EMPLOYEE relation with recursive foreign key
  • 203.
    Figure 5-18 Mapping a unary M:N relationship: (a) Bill-of-materials relationships (M:N); (b) ITEM and COMPONENT relations
  • 204.
    Transforming EER Diagrams into Relations (cont.). Mapping Ternary (and n-ary) Relationships: One relation for each entity and one for the associative entity; Associative entity has foreign keys to each entity in the relationship
  • 205.
    Figure 5-19 Mapping a ternary relationship: a) PATIENT TREATMENT ternary relationship with associative entity
  • 206.
    Figure 5-19 Mapping a ternary relationship (cont.): b) Mapping the ternary relationship PATIENT TREATMENT. Remember that the primary key MUST be unique. This is why treatment date and time are included in the composite primary key. But this makes a very cumbersome key… It would be better to create a surrogate key like Treatment#
  • 207.
    Transforming EER Diagrams into Relations (cont.). Mapping Supertype/Subtype Relationships: One relation for supertype and for each subtype; Supertype attributes (including identifier and subtype discriminator) go into supertype relation; Subtype attributes go into each subtype, and primary key of supertype relation also becomes primary key of subtype relation; 1:1 relationship established between supertype and each subtype, with supertype as primary table
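    A small `sqlite3` sketch of this mapping, assuming one hypothetical subtype (`hourly_employee`) and an assumed discriminator code `'H'`. The subtype relation reuses the supertype's primary key, which gives the 1:1 relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    CREATE TABLE employee (
        employee_number INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        employee_type TEXT NOT NULL  -- subtype discriminator (assumed code 'H')
    );
    -- The subtype's primary key is also a foreign key to the supertype.
    CREATE TABLE hourly_employee (
        employee_number INTEGER PRIMARY KEY REFERENCES employee(employee_number),
        hourly_rate REAL NOT NULL
    );
    INSERT INTO employee VALUES (100, 'Ana', 'H');
    INSERT INTO hourly_employee VALUES (100, 18.5);
""")

# Joining on the shared key recombines inherited and subtype-specific attributes.
row = conn.execute("""
    SELECT e.employee_name, h.hourly_rate
    FROM employee AS e
    JOIN hourly_employee AS h ON h.employee_number = e.employee_number
""").fetchone()
```

    Because the subtype key references the supertype, a subtype row cannot exist without its supertype row.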
  • 208.
  • 209.
    Figure 5-21 Mapping supertype/subtype relationships to relations. These are implemented as one-to-one relationships
  • 210.
    Data Normalization: Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. The process of decomposing relations with anomalies to produce smaller, well-structured relations
  • 211.
    Well-Structured Relations: A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies. Goal is to avoid anomalies: Insertion Anomaly–adding new rows forces user to create duplicate data; Deletion Anomaly–deleting rows may cause a loss of data that would be needed for other future rows; Modification Anomaly–changing data in a row forces changes to other rows because of duplication. General rule of thumb: A table should not pertain to more than one entity type
  • 212.
    Example–Figure 5-2b. Question–Is this a relation? Answer–Yes: unique rows and no multivalued attributes. Question–What’s the primary key? Answer–Composite: Emp_ID, Course_Title
  • 213.
    Anomalies in this Table: Insertion–can’t enter a new employee without having the employee take a class; Deletion–if we remove employee 140, we lose information about the existence of a Tax Acc class; Modification–giving a salary increase to employee 100 forces us to update multiple records. Why do these anomalies exist? Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities
  • 214.
    Functional Dependencies and Keys. Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute. Candidate Key: A unique identifier. One of the candidate keys will become the primary key. E.g., perhaps there is both credit card number and SS# in a table…in this case both are candidate keys. Each non-key field is functionally dependent on every candidate key
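    A functional dependency can be tested against sample rows, as in the sketch below. The helper `fd_holds` and the rows are hypothetical. Note the limitation: sample data can show that a dependency is violated, but it cannot prove that the dependency holds in general, since that is a statement about the business rules.

```python
def fd_holds(rows, determinant, dependent):
    """Return False if two rows agree on the determinant but differ on the dependent."""
    seen = {}
    for row in rows:
        lhs = tuple(row[name] for name in determinant)
        rhs = tuple(row[name] for name in dependent)
        # Remember the first dependent value seen for each determinant value.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


# Hypothetical rows in the spirit of the employee/course example.
ROWS = [
    {"Emp_ID": 100, "Name": "Ana", "Course_Title": "SPSS"},
    {"Emp_ID": 100, "Name": "Ana", "Course_Title": "Surveys"},
    {"Emp_ID": 140, "Name": "Ben", "Course_Title": "Tax Acc"},
]
```

    In this sample `Emp_ID` determines `Name`, but it does not determine `Course_Title`, because employee 100 appears with two different courses.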
  • 215.
    Figure 5-22 Steps in normalization
  • 216.
    First Normal Form: No multivalued attributes; Every attribute value is atomic. Fig. 5-25 is not in 1st Normal Form (multivalued attributes), so it is not a relation. Fig. 5-26 is in 1st Normal Form. All relations are in 1st Normal Form
  • 217.
    Table with multivalued attributes, not in 1st normal form. Note: this is NOT a relation
  • 218.
    Table with no multivalued attributes and unique rows, in 1st normal form. Note: this is a relation, but not a well-structured one
  • 219.
    Anomalies in this Table: Insertion–if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication; Deletion–if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price; Update–changing the price of product ID 4 requires update in several records. Why do these anomalies exist? Because there are multiple themes (entity types) in one relation. This results in duplication and an unnecessary dependency between the entities
  • 220.
    Second Normal Form. 1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key. Every non-key attribute must be defined by the entire key, not by only part of the key. No partial functional dependencies
  • 221.
    Figure 5-27 Functional dependency diagram for INVOICE. Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address. Customer_ID → Customer_Name, Customer_Address. Product_ID → Product_Description, Product_Finish, Unit_Price. Order_ID, Product_ID → Order_Quantity. Order_ID and Product_ID each determine attributes on their own (partial dependencies on the composite key); therefore, NOT in 2nd Normal Form
  • 222.
    Getting it into Second Normal Form. Figure 5-28 Removing partial dependencies. Partial dependencies are removed, but there are still transitive dependencies
  • 223.
    Third Normal Form. 2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes). Note: this is called transitive because the primary key is a determinant for another attribute, which in turn is a determinant for a third. Solution: a non-key determinant with transitive dependencies goes into a new table; the non-key determinant becomes the primary key in the new table and stays as a foreign key in the old table
  • 224.
    Getting it into Third Normal Form. Figure 5-28 Removing partial dependencies. Transitive dependencies are removed
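The end result of the normalization steps above can be sketched as a set of tables in which each theme is stored once. This is an illustrative sketch only, run through Python's built-in sqlite3 module; the table names, column types, and sample rows are assumptions, not the textbook's figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per theme: customer, order, product, and the order/product pairing.
CREATE TABLE CUSTOMER (Customer_ID INTEGER PRIMARY KEY,
                       Customer_Name TEXT, Customer_Address TEXT);
CREATE TABLE ORDERS (Order_ID INTEGER PRIMARY KEY, Order_Date TEXT,
                     Customer_ID INTEGER REFERENCES CUSTOMER(Customer_ID));
CREATE TABLE PRODUCT (Product_ID INTEGER PRIMARY KEY, Product_Description TEXT,
                      Product_Finish TEXT, Unit_Price REAL);
CREATE TABLE ORDER_LINE (Order_ID INTEGER REFERENCES ORDERS(Order_ID),
                         Product_ID INTEGER REFERENCES PRODUCT(Product_ID),
                         Order_Quantity INTEGER,
                         PRIMARY KEY (Order_ID, Product_ID));
-- Made-up sample data.
INSERT INTO CUSTOMER VALUES (2, 'Value Furniture', 'Plano, TX');
INSERT INTO ORDERS VALUES (1006, '2006-10-24', 2);
INSERT INTO PRODUCT VALUES (5, 'Writer''s Desk', 'Cherry', 325),
                           (7, 'Dining Table', 'Natural Ash', 800);
INSERT INTO ORDER_LINE VALUES (1006, 5, 2), (1006, 7, 2);
""")

# The original INVOICE view of the data is rebuilt with joins when needed.
invoice = conn.execute("""
    SELECT o.Order_ID, c.Customer_Name, p.Product_Description, l.Order_Quantity
    FROM ORDER_LINE l
    JOIN ORDERS o ON o.Order_ID = l.Order_ID
    JOIN CUSTOMER c ON c.Customer_ID = o.Customer_ID
    JOIN PRODUCT p ON p.Product_ID = l.Product_ID
    ORDER BY p.Product_ID
""").fetchall()

# A price change now touches exactly one row, so the update anomaly is gone.
changed = conn.execute(
    "UPDATE PRODUCT SET Unit_Price = 850 WHERE Product_ID = 7").rowcount
print(invoice, changed)
```

The customer name appears in two invoice rows but is stored only once, in CUSTOMER.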
  • 225.
    Merging Relations. View Integration: combining entities from multiple ER models into common relations. Issues to watch out for when merging entities from different ER models: Synonyms–two or more attributes with different names but the same meaning. Homonyms–attributes with the same name but different meanings. Transitive dependencies–even if relations are in 3NF prior to merging, they may not be after merging. Supertype/subtype relationships–may be hidden prior to merging
  • 226.
    Enterprise Keys. Primary keys that are unique in the whole database, not just within a single relation. Corresponds with the concept of an object ID in object-oriented systems
  • 227.
    Figure 5-31 Enterprise keys. a) Relations with enterprise key. b) Sample data with enterprise key
  • 228.
    Chapter 6: Physical Database Design and Performance. Modern Database Management, 8th Edition. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. © 2007 by Prentice Hall
  • 229.
    Objectives: Definition of terms. Describe the physical database design process. Choose storage formats for attributes. Select appropriate file organizations. Describe three types of file organization. Describe indexes and their appropriate use. Translate a database model into efficient structures. Know when and how to use denormalization
  • 230.
    Physical Database Design. Purpose–translate the logical description of data into the technical specifications for storing and retrieving data. Goal–create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
  • 231.
    Physical Design Process. Inputs: normalized relations; volume estimates; attribute definitions; response time expectations; data security needs; backup/recovery needs; integrity expectations; DBMS technology used. Leads to decisions: attribute data types; physical record descriptions (these don't always match the logical design); file organizations; indexes and database architectures; query optimization
  • 232.
    Figure 6-1 Composite usage map (Pine Valley Furniture Company)
  • 233.
    Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Data volumes
  • 234.
    Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Access frequencies (per hour)
  • 235.
    Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 140 purchased parts accessed per hour → 80 quotations accessed from these 140 purchased part accesses → 70 suppliers accessed from these 80 quotation accesses
  • 236.
    Figure 6-1 Composite usage map (Pine Valley Furniture Company) (cont.) Usage analysis: 75 suppliers accessed per hour → 40 quotations accessed from these 75 supplier accesses → 40 purchased parts accessed from these 40 quotation accesses
  • 237.
    Designing Fields. Field: smallest unit of data in a database. Field design: choosing data type; coding, compression, encryption; controlling data integrity
  • 238.
    Choosing Data Types. CHAR–fixed-length character. VARCHAR2–variable-length character (memo). LONG–variable-length character data of up to 2 GB (Oracle). NUMBER–positive/negative number. INTEGER–positive/negative whole number. DATE–actual date. BLOB–binary large object (good for graphics, sound clips, etc.)
  • 239.
    Figure 6-2 Example code look-up table (Pine Valley Furniture Company) Code saves space, but costs an additional lookup to obtain actual value
  • 240.
    Field Data Integrity. Default value–assumed value if no explicit value is given. Range control–allowable value limitations (constraints or validation rules). Null value control–allowing or prohibiting empty fields. Referential integrity–range control (and null value allowances) for foreign-key to primary-key match-ups. The Sarbanes-Oxley Act (SOX) legislates the importance of financial data integrity
  • 241.
    Handling Missing Data. Substitute an estimate of the missing value (e.g., using a formula). Construct a report listing missing values. In programs, ignore missing data unless the value is significant (sensitivity testing). Triggers can be used to perform these operations
  • 242.
    Physical Records. Physical Record: a group of fields stored in adjacent memory locations and retrieved together as a unit. Page: the amount of data read or written in one I/O operation. Blocking Factor: the number of physical records per page
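The blocking factor follows directly from the page and record sizes. A small arithmetic sketch, assuming fixed-length records that are not allowed to span pages and ignoring page headers (both are simplifying assumptions, and the function names are hypothetical):

```python
def blocking_factor(page_size_bytes, record_size_bytes):
    """Whole physical records that fit in one page (no spanning)."""
    return page_size_bytes // record_size_bytes

def pages_needed(num_records, page_size_bytes, record_size_bytes):
    """Pages required to store num_records, rounding up to a whole page."""
    bf = blocking_factor(page_size_bytes, record_size_bytes)
    return -(-num_records // bf)  # ceiling division

bf = blocking_factor(4096, 300)        # 13 records per 4 KB page
pages = pages_needed(1000, 4096, 300)  # 77 pages for 1000 records
print(bf, pages)
```

With 300-byte records in a 4096-byte page, 196 bytes per page go unused, which is the kind of trade-off physical record design tries to minimize.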
  • 243.
    Denormalization. Transforming normalized relations into unnormalized physical record specifications. Benefits: can improve performance (speed) by reducing the number of table lookups (i.e., reducing the number of necessary join queries). Costs (due to data duplication): wasted storage space; data integrity/consistency threats. Common denormalization opportunities: one-to-one relationship (Fig. 6-3); many-to-many relationship with attributes (Fig. 6-4); reference data (1:N relationship where the 1-side has data not used in any other relationship) (Fig. 6-5)
  • 244.
    Figure 6-3 A possible denormalization situation: two entities with one-to-one relationship
  • 245.
    Figure 6-4 A possible denormalization situation: a many-to-many relationship with nonkey attributes Extra table access required Null description possible
  • 246.
    Figure 6-5 A possible denormalization situation: reference data. Extra table access required. Data duplication
  • 247.
    Partitioning. Horizontal Partitioning: distributing the rows of a table into several separate files. Useful for situations where different users need access to different rows. Three types: Key Range Partitioning, Hash Partitioning, or Composite Partitioning. Vertical Partitioning: distributing the columns of a table into several separate relations. Useful for situations where different users need access to different columns. The primary key must be repeated in each file. Combinations of horizontal and vertical partitions often correspond with user schemas (user views)
  • 248.
    Partitioning (cont.) Advantages of Partitioning. Efficiency: records used together are grouped together. Local optimization: each partition can be optimized for performance. Security, recovery. Load balancing: partitions stored on different disks, which reduces contention. Take advantage of parallel processing capability. Disadvantages of Partitioning. Inconsistent access speed: slow retrievals across partitions. Complexity: non-transparent partitioning. Extra space or update time: duplicate data; access from multiple partitions
  • 249.
    Data Replication. Purposely storing the same data in multiple locations of the database. Improves performance by allowing multiple users to access the same data at the same time with minimum contention. Sacrifices data integrity due to data duplication. Best for data that is not updated often
  • 250.
    Designing Physical Files. Physical File: a named portion of secondary memory allocated for the purpose of storing physical records. Tablespace–named set of disk storage elements in which physical files for database tables can be stored. Extent–contiguous section of disk space. Constructs to link two pieces of data: sequential storage; pointers–fields of data that can be used to locate related fields or records
  • 251.
    Figure 6-4 Physical file terminology in an Oracle environment
  • 252.
    File Organizations. Technique for physically arranging records of a file on secondary storage. Factors for selecting file organization: fast data retrieval and throughput; efficient storage space utilization; protection from failure and data loss; minimizing need for reorganization; accommodating growth; security from unauthorized use. Types of file organizations: sequential, indexed, hashed
  • 253.
    Figure 6-7a Sequential file organization. Records of the file (1, 2, ..., n) are stored in sequence by the primary key field values. If not sorted: average time to find desired record = n/2. If sorted: every insert or delete requires a re-sort
  • 254.
    Indexed File Organizations. Index–a separate table that contains organization of records for quick retrieval. Primary keys are automatically indexed. Oracle has a CREATE INDEX operation, and MS ACCESS allows indexes to be created for most field types. Indexing approaches: B-tree index (Fig. 6-7b), bitmap index (Fig. 6-8), hash index (Fig. 6-7c), join index (Fig. 6-9)
  • 255.
    Figure 6-7b B-tree index. Uses a tree search. Average time to find desired record = depth of the tree. Leaves of the tree are all at the same level → consistent access time
  • 256.
    Figure 6-7c Hashed file or index organization. Hash algorithm: usually uses division-remainder to determine record position. Records with the same position are grouped in lists
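The division-remainder idea can be sketched in a few lines. This is an illustrative sketch only; the bucket count of 7 and the key values are arbitrary assumptions, and real DBMSs use more elaborate schemes for overflow handling.

```python
def bucket_for(key, num_buckets):
    """Division-remainder hash: the record position is key mod bucket count."""
    return key % num_buckets

def build_hash_file(keys, num_buckets):
    """Place each key in its bucket; keys with the same position share a list."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bucket_for(key, num_buckets)].append(key)
    return buckets

# 1001, 1008 and 1015 are all multiples of 7, so they collide in bucket 0.
buckets = build_hash_file([1001, 1008, 1015, 1003], 7)
print(buckets)
```

A lookup recomputes the same remainder and then searches only the short list stored at that position.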
  • 257.
    Figure 6-8 Bitmap index organization. Bitmap saves on space requirements. Rows: possible values of the attribute. Columns: table rows. Each bit indicates whether the attribute of a row has the given value
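A bitmap index can be sketched as one bit list per distinct attribute value. The sketch below is hypothetical (helper names and sample finishes are made up) and uses plain Python lists rather than packed bits for readability.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bit list with one bit per table row."""
    index = {}
    for position, value in enumerate(values):
        index.setdefault(value, [0] * len(values))[position] = 1
    return index

def rows_matching(index, value):
    """Row positions whose bit is set for the given value."""
    return [i for i, bit in enumerate(index.get(value, [])) if bit]

# One entry per table row: the product finish stored in that row (made-up data).
finishes = ["Oak", "Cherry", "Oak", "Walnut"]
index = build_bitmap_index(finishes)
oak_rows = rows_matching(index, "Oak")
print(index["Oak"], oak_rows)
```

Because each value needs only one bit per row, the structure stays compact when the attribute has few distinct values.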
  • 258.
    Figure 6-9 Join Indexes–speeds up join operations
  • 260.
    Clustering Files. In some relational DBMSs, related records from different tables can be stored together in the same disk area. Useful for improving performance of join operations. Primary key records of the main table are stored adjacent to associated foreign key records of the dependent table. E.g., Oracle has a CREATE CLUSTER command
  • 261.
    Rules for Using Indexes. Use on larger tables. Index the primary key of each table. Index search fields (fields frequently in the WHERE clause). Index fields used in SQL ORDER BY and GROUP BY commands. Index an attribute when it has more than 100 distinct values, but not when it has fewer than 30
  • 262.
    Rules for Using Indexes (cont.) Avoid use of indexes for fields with long values; perhaps compress values first. The DBMS may have a limit on the number of indexes per table and the number of bytes per indexed field(s). Null values will not be referenced from an index. Use indexes heavily for non-volatile databases; limit the use of indexes for volatile databases. Why? Because modifications (e.g., inserts, deletes) require updates to occur in index files
  • 263.
    RAID: Redundant Array of Inexpensive Disks. A set of disk drives that appear to the user to be a single disk drive. Allows parallel access to data (improves access speed). Pages are arranged in stripes
  • 264.
    Figure 6-10 RAID with four disks and striping. Here, pages 1-4 can be read/written simultaneously
  • 265.
    RAID Types (Figure 6-10). RAID 0: maximized parallelism; no redundancy; no error correction; no fault tolerance. RAID 1: redundant data, fault tolerant; most common form. RAID 2: no redundancy; one record spans across data disks; error correction in multiple disks, to reconstruct damaged data. RAID 3: error correction in one disk; record spans multiple data disks (more than RAID 2); not good for multi-user environments. RAID 4: error correction in one disk; multiple records per stripe; parallelism, but slow updates due to error correction contention. RAID 5: rotating parity array; error correction takes place in the same disks as data storage; parallelism, better performance than RAID 4
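The "error correction" in the single-parity levels can be illustrated with XOR parity. This is a simplified sketch under stated assumptions: equal-sized blocks, one parity block per stripe, and exactly one lost block; the function names are hypothetical and real controllers work at the device level.

```python
from functools import reduce

def parity(blocks):
    """XOR the blocks byte by byte to produce the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Recover the single missing block: XOR of the survivors and the parity."""
    return parity(surviving_blocks + [parity_block])

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks (made up)
p = parity(stripe)                      # stored on a fourth disk
recovered = rebuild([stripe[0], stripe[2]], p)  # the disk holding b"BBBB" failed
print(recovered)
```

Every write must also update the parity block, which is why a single dedicated parity disk becomes a point of contention and why rotating the parity across disks helps.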
  • 266.
    Database Architectures (Figure 6-11) Legacy Systems Current Technology Data Warehouses
  • 267.
    Chapter 7: Introduction to SQL. Modern Database Management, 8th Edition. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. © 2007 by Prentice Hall
  • 268.
    Objectives: Definition of terms. Interpret history and role of SQL. Define a database using SQL data definition language. Write single table queries using SQL. Establish referential integrity using SQL. Discuss SQL:1999 and SQL:2003 standards
  • 269.
    SQL Overview. Structured Query Language. The standard for relational database management systems (RDBMS). RDBMS: a database management system that manages data as a collection of tables in which all relationships are represented by common values in related tables
  • 270.
    History of SQL. 1970–E. Codd develops relational database concept. 1974-1979–System R with Sequel (later SQL) created at IBM Research Lab. 1979–Oracle markets first relational DB with SQL. 1986–ANSI SQL standard released. 1989, 1992, 1999, 2003–major ANSI standard updates. Current–SQL is supported by most major database vendors
  • 271.
    Purpose of SQL Standard. Specify syntax/semantics for data definition and manipulation. Define data structures. Enable portability. Specify minimal (level 1) and complete (level 2) standards. Allow for later growth/enhancement to the standard
  • 272.
    Benefits of a Standardized Relational Language. Reduced training costs. Productivity. Application portability. Application longevity. Reduced dependence on a single vendor. Cross-system communication
  • 273.
    SQL Environment. Catalog: a set of schemas that constitute the description of a database. Schema: the structure that contains descriptions of objects created by a user (base tables, views, constraints). Data Definition Language (DDL): commands that define a database, including creating, altering, and dropping tables and establishing constraints. Data Manipulation Language (DML): commands that maintain and query a database. Data Control Language (DCL): commands that control a database, including administering privileges and committing data
  • 274.
    Figure 7-1 A simplified schematic of a typical SQL environment, as described by the SQL-2003 standard
  • 275.
  • 276.
    Figure 7-4 DDL, DML, DCL, and the database development process
  • 277.
    SQL Database Definition. Data Definition Language (DDL). Major CREATE statements: CREATE SCHEMA–defines a portion of the database owned by a particular user. CREATE TABLE–defines a table and its columns. CREATE VIEW–defines a logical table from one or more tables or views. Other CREATE statements: CHARACTER SET, COLLATION, TRANSLATION, ASSERTION, DOMAIN
  • 278.
    Table Creation. Figure 7-5 General syntax for CREATE TABLE. Steps in table creation: Identify data types for attributes. Identify columns that can and cannot be null. Identify columns that must be unique (candidate keys). Identify primary key–foreign key mates. Determine default values. Identify constraints on columns (domain specifications). Create the table and associated indexes
  • 279.
    The following slides create tables for this enterprise data model
  • 280.
    Figure 7-6 SQL database definition commands for Pine Valley Furniture. Overall table definitions
  • 281.
    Defining attributes and their data types
  • 282.
    Non-nullable specification. Identifying primary key. Primary keys can never have NULL values
  • 283.
    Non-nullable specifications. Primary key. Some primary keys are composite, i.e., composed of multiple attributes
  • 284.
    Controlling the values in attributes. Default value. Domain constraint
  • 285.
    Identifying foreign keys and establishing relationships. Primary key of parent table. Foreign key of dependent table
  • 286.
    Data Integrity Controls. Referential integrity–constraint that ensures that foreign key values of a table must match primary key values of a related table in 1:M relationships. Restricting: deletes of primary records; updates of primary records; inserts of dependent records
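The restrictions listed above can be observed directly. The following is a minimal sketch using Python's built-in sqlite3 module, not the Oracle syntax of the slides; the two-column tables are cut-down stand-ins for CUSTOMER_T and ORDER_T, and the `try_sql` helper is hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is opt-in
conn.executescript("""
CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY,
                         CUSTOMER_NAME TEXT NOT NULL);
CREATE TABLE ORDER_T (ORDER_ID INTEGER PRIMARY KEY,
                      CUSTOMER_ID INTEGER NOT NULL
                          REFERENCES CUSTOMER_T(CUSTOMER_ID) ON DELETE RESTRICT);
INSERT INTO CUSTOMER_T VALUES (1, 'Contemporary Casuals');
INSERT INTO ORDER_T VALUES (1001, 1);
""")

def try_sql(sql):
    """Run one statement; report whether the DBMS accepted or rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Insert of a dependent record with no matching parent is refused.
insert_result = try_sql("INSERT INTO ORDER_T VALUES (1002, 99)")
# Delete of a primary record that still has dependents is refused.
delete_result = try_sql("DELETE FROM CUSTOMER_T WHERE CUSTOMER_ID = 1")
print(insert_result, delete_result)
```

Both statements are rejected by the DBMS itself, so no application program has to re-implement the check.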
  • 287.
    Figure 7-7 Ensuring data integrity through updates. Relational integrity is enforced via the primary-key to foreign-key match
  • 288.
    Changing and Removing Tables. The ALTER TABLE statement allows you to change column specifications: ALTER TABLE CUSTOMER_T ADD (TYPE VARCHAR(2)); The DROP TABLE statement allows you to remove tables from your schema: DROP TABLE CUSTOMER_T;
  • 289.
    Schema Definition. Control processing/storage efficiency: choice of indexes; file organizations for base tables; file organizations for indexes; data clustering; statistics maintenance. Creating indexes: speed up random/sequential access to base table data. Example: CREATE INDEX NAME_IDX ON CUSTOMER_T(CUSTOMER_NAME); This makes an index for the CUSTOMER_NAME field of the CUSTOMER_T table
  • 290.
    Insert Statement. Adds data to a table. Inserting into a table: INSERT INTO CUSTOMER_T VALUES (001, 'Contemporary Casuals', '1355 S. Himes Blvd.', 'Gainesville', 'FL', 32601); Inserting a record that has some null attributes requires identifying the fields that actually get data: INSERT INTO PRODUCT_T (PRODUCT_ID, PRODUCT_DESCRIPTION, PRODUCT_FINISH, STANDARD_PRICE, PRODUCT_ON_HAND) VALUES (1, 'End Table', 'Cherry', 175, 8); Inserting from another table: INSERT INTO CA_CUSTOMER_T SELECT * FROM CUSTOMER_T WHERE STATE = 'CA';
  • 291.
    Creating Tables with Identity Columns. New with SQL:2003. Inserting into a table does not require explicit customer ID entry or field list: INSERT INTO CUSTOMER_T VALUES ('Contemporary Casuals', '1355 S. Himes Blvd.', 'Gainesville', 'FL', 32601);
  • 292.
    Delete Statement. Removes rows from a table. Delete certain rows: DELETE FROM CUSTOMER_T WHERE STATE = 'HI'; Delete all rows: DELETE FROM CUSTOMER_T;
  • 293.
    Update Statement. Modifies data in existing rows: UPDATE PRODUCT_T SET UNIT_PRICE = 775 WHERE PRODUCT_ID = 7;
  • 294.
    Merge Statement. Makes it easier to update a table: allows a combination of Insert and Update in one statement. Useful for updating master tables with new data
  • 295.
    SELECT Statement. Used for queries on single or multiple tables. Clauses of the SELECT statement: SELECT–list the columns (and expressions) that should be returned from the query. FROM–indicate the table(s) or view(s) from which data will be obtained. WHERE–indicate the conditions under which a row will be included in the result. GROUP BY–indicate categorization of results. HAVING–indicate the conditions under which a category (group) will be included. ORDER BY–sorts the result according to specified criteria
  • 296.
    Figure 7-10 SQL statement processing order (adapted from van der Lans, p.100)
  • 297.
    SELECT Example. Find products with standard price less than $275: SELECT PRODUCT_NAME, STANDARD_PRICE FROM PRODUCT_V WHERE STANDARD_PRICE < 275; Table 7-3: Comparison Operators in SQL
  • 298.
    SELECT Example Using Alias. An alias is an alternative column or table name: SELECT CUST.CUSTOMER_NAME AS NAME, CUST.CUSTOMER_ADDRESS FROM CUSTOMER_V CUST WHERE NAME = 'Home Furnishings';
  • 299.
    SELECT Example Using a Function Using the COUNT aggregate function to find totals SELECT COUNT(*) FROM ORDER_LINE_V WHERE ORDER_ID = 1004; Note: with aggregate functions you can’t have single-valued columns included in the SELECT clause
  • 300.
    SELECT Example–Boolean Operators. AND, OR, and NOT operators for customizing conditions in the WHERE clause: SELECT PRODUCT_DESCRIPTION, PRODUCT_FINISH, STANDARD_PRICE FROM PRODUCT_V WHERE (PRODUCT_DESCRIPTION LIKE '%Desk' OR PRODUCT_DESCRIPTION LIKE '%Table') AND STANDARD_PRICE > 300; Note: the LIKE operator allows you to compare strings using wildcards. For example, the % wildcard in '%Desk' indicates that all strings that have any number of characters preceding the word "Desk" will be allowed
  • 301.
    Venn Diagram from Previous Query
  • 302.
    SELECT Example–Sorting Results with the ORDER BY Clause. Sort the results first by STATE, and within a state by CUSTOMER_NAME: SELECT CUSTOMER_NAME, CITY, STATE FROM CUSTOMER_V WHERE STATE IN ('FL', 'TX', 'CA', 'HI') ORDER BY STATE, CUSTOMER_NAME; Note: the IN operator in this example allows you to include rows whose STATE value is either FL, TX, CA, or HI. It is more efficient than separate OR conditions
  • 303.
    SELECT Example–Categorizing Results Using the GROUP BY Clause. For use with aggregate functions. Scalar aggregate: single value returned from SQL query with aggregate function. Vector aggregate: multiple values returned from SQL query with aggregate function (via GROUP BY). SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE) FROM CUSTOMER_V GROUP BY CUSTOMER_STATE; Note: you can use single-value fields with aggregate functions if they are included in the GROUP BY clause
  • 304.
    SELECT Example– Qualifying Results by Categories Using the HAVING Clause For use with GROUP BY SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE) FROM CUSTOMER_V GROUP BY CUSTOMER_STATE HAVING COUNT(CUSTOMER_STATE) > 1; Like a WHERE clause, but it operates on groups (categories), not on individual rows. Here, only those groups with total numbers greater than 1 will be included in final result
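The difference between GROUP BY alone and GROUP BY with HAVING can be seen by running both queries on a small table. This is a sketch using Python's built-in sqlite3 module; the five customer rows are made up, the query runs against a base table rather than the CUSTOMER_V view, and ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY,
                         CUSTOMER_NAME TEXT, CUSTOMER_STATE TEXT);
INSERT INTO CUSTOMER_T VALUES (1, 'A', 'FL'), (2, 'B', 'FL'), (3, 'C', 'TX'),
                              (4, 'D', 'CA'), (5, 'E', 'CA');
""")

# Vector aggregate: one count per state.
grouped = conn.execute("""
    SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)
    FROM CUSTOMER_T
    GROUP BY CUSTOMER_STATE
    ORDER BY CUSTOMER_STATE
""").fetchall()

# HAVING filters whole groups after aggregation, unlike WHERE, which filters rows.
filtered = conn.execute("""
    SELECT CUSTOMER_STATE, COUNT(CUSTOMER_STATE)
    FROM CUSTOMER_T
    GROUP BY CUSTOMER_STATE
    HAVING COUNT(CUSTOMER_STATE) > 1
    ORDER BY CUSTOMER_STATE
""").fetchall()
print(grouped, filtered)
```

TX has a single customer, so its group appears in the first result but is dropped by the HAVING condition in the second.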
  • 305.
    Using and Defining Views. Views provide users controlled access to tables. Base Table–table containing the raw data. Dynamic View: a "virtual table" created dynamically upon request by a user. No data actually stored; instead, data from the base table is made available to the user. Based on an SQL SELECT statement on base tables or other views. Materialized View: copy or replication of data. Data actually stored. Must be refreshed periodically to match the corresponding base tables
  • 306.
    Sample CREATE VIEW. CREATE VIEW EXPENSIVE_STUFF_V AS SELECT PRODUCT_ID, PRODUCT_NAME, UNIT_PRICE FROM PRODUCT_T WHERE UNIT_PRICE > 300 WITH CHECK OPTION; View has a name. View is based on a SELECT statement. CHECK OPTION works only for updateable views and prevents updates that would create rows not included in the view
  • 307.
    Advantages of Views. Simplify query commands. Assist with data security (but don't rely on views for security; there are more important security measures). Enhance programming productivity. Contain most current base table data. Use little storage space. Provide customized view for user. Establish physical data independence
  • 308.
    Disadvantages of Views. Use processing time each time the view is referenced. May or may not be directly updateable
  • 309.
    Chapter 8: Advanced SQL. Modern Database Management, 8th Edition. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. © 2007 by Prentice Hall
  • 310.
    Objectives: Definition of terms. Write multiple table SQL queries. Define and use three types of joins. Write correlated and noncorrelated subqueries. Establish referential integrity in SQL. Understand triggers and stored procedures. Discuss SQL:1999 standard and its extension of SQL-92
  • 311.
    Processing Multiple Tables–Joins. Join–a relational operation that causes two or more tables with a common domain to be combined into a single table or view. Equi-join–a join in which the joining condition is based on equality between values in the common columns; common columns appear redundantly in the result table. Natural join–an equi-join in which one of the duplicate columns is eliminated in the result table. Outer join–a join in which rows that do not have matching values in common columns are nonetheless included in the result table (as opposed to inner join, in which rows must have matching values in order to appear in the result table). Union join–includes all columns from each table in the join, and an instance for each row of each table. The common columns in joined tables are usually the primary key of the dominant table and the foreign key of the dependent table in 1:M relationships
  • 312.
    The following slides create tables for this enterprise data model
  • 313.
    Figure 8-1 Pine Valley Furniture Company Customer and Order tables with pointers from customers to their orders. These tables are used in the queries that follow
  • 314.
    Natural Join Example. For each customer who placed an order, what is the customer's name and order number? SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID FROM CUSTOMER_T INNER JOIN ORDER_T ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID; The query is written as an inner join with ON because, in standard SQL, NATURAL JOIN does not take an ON clause. Join involves multiple tables in the FROM clause. The ON clause performs the equality check for common columns of the two tables. Note: from Fig. 8-1, you see that only 10 customers have links with orders, so only 10 rows will be returned from this INNER join
  • 315.
    Outer Join Example (Microsoft Syntax). List the customer name, ID number, and order number for all customers. Include customer information even for customers that do not have an order: SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID FROM CUSTOMER_T LEFT OUTER JOIN ORDER_T ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID; Unlike INNER join, this will include customer rows with no matching order rows. LEFT OUTER JOIN syntax with ON causes customer data to appear even if there is no corresponding order data
  • 316.
    Results. Unlike INNER join, this will include customer rows with no matching order rows
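The contrast between the inner and outer join can be reproduced on a tiny data set. This is a sketch using Python's built-in sqlite3 module; the three customers and three orders are made up and do not reproduce Figure 8-1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER_T (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT);
CREATE TABLE ORDER_T (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER);
INSERT INTO CUSTOMER_T VALUES (1, 'Contemporary Casuals'), (2, 'Value Furniture'),
                              (3, 'Home Furnishings');
INSERT INTO ORDER_T VALUES (1001, 1), (1002, 1), (1003, 2);
""")

inner_rows = conn.execute("""
    SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
    FROM CUSTOMER_T INNER JOIN ORDER_T
         ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID
    ORDER BY CUSTOMER_T.CUSTOMER_ID, ORDER_ID
""").fetchall()

outer_rows = conn.execute("""
    SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, ORDER_ID
    FROM CUSTOMER_T LEFT OUTER JOIN ORDER_T
         ON CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID
    ORDER BY CUSTOMER_T.CUSTOMER_ID, ORDER_ID
""").fetchall()
print(len(inner_rows), len(outer_rows), outer_rows[-1])
```

Customer 3 has no orders, so it is missing from the inner join and appears in the outer join with a NULL (Python `None`) order number.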
  • 317.
    Multiple Table Join Example. Assemble all information necessary to create an invoice for order number 1006: SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_ADDRESS, CITY, STATE, POSTAL_CODE, ORDER_T.ORDER_ID, ORDER_DATE, QUANTITY, PRODUCT_DESCRIPTION, STANDARD_PRICE, (QUANTITY * STANDARD_PRICE) FROM CUSTOMER_T, ORDER_T, ORDER_LINE_T, PRODUCT_T WHERE CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID AND ORDER_T.ORDER_ID = ORDER_LINE_T.ORDER_ID AND ORDER_LINE_T.PRODUCT_ID = PRODUCT_T.PRODUCT_ID AND ORDER_T.ORDER_ID = 1006; Four tables are involved in this join. Each pair of tables requires an equality-check condition in the WHERE clause, matching primary keys against foreign keys
  • 318.
    Figure 8-2 Results from a four-table join. From CUSTOMER_T table. From ORDER_T table. From PRODUCT_T table
  • 319.
    Processing Multiple Tables Using Subqueries Subquery–placing an inner query (SELECT statement) inside an outer query Options: In a condition of the WHERE clause As a “table” of the FROM clause Within the HAVING clause Subqueries can be: Noncorrelated–executed once for the entire outer query Correlated–executed once for each row returned by the outer query
  • 320.
    Subquery Example. Show all customers who have placed an order: SELECT CUSTOMER_NAME FROM CUSTOMER_T WHERE CUSTOMER_ID IN (SELECT DISTINCT CUSTOMER_ID FROM ORDER_T); The subquery is embedded in parentheses. In this case it returns a list that will be used in the WHERE clause of the outer query. The IN operator will test to see if the CUSTOMER_ID value of a row is included in the list returned from the subquery
  • 321.
    Correlated vs. Noncorrelated Subqueries. Noncorrelated subqueries: do not depend on data from the outer query; execute once for the entire outer query. Correlated subqueries: make use of data from the outer query; execute once for each row of the outer query; can use the EXISTS operator
  • 322.
    Figure 8-3a Processing a noncorrelated subquery. No reference to data in the outer query, so the subquery executes once only. The subquery executes and returns the customer IDs from the ORDER_T table. The outer query executes on the results of the subquery. These are the only customers that have IDs in the ORDER_T table
  • 323.
    Correlated Subquery Example. Show all orders that include furniture finished in natural ash: SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T WHERE EXISTS (SELECT * FROM PRODUCT_T WHERE PRODUCT_ID = ORDER_LINE_T.PRODUCT_ID AND PRODUCT_FINISH = 'Natural ash'); The subquery is testing for a value that comes from the outer query. The EXISTS operator will return a TRUE value if the subquery resulted in a non-empty set; otherwise it returns FALSE
  • 324.
    Figure 8-3b Processing a correlated subquery. The subquery refers to outer-query data, so it executes once for each row of the outer query. Note: only the orders that involve products with Natural Ash will be included in the final results
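The natural-ash question can be answered with either style of subquery, which makes the two forms easy to compare. This is a sketch using Python's built-in sqlite3 module; the product and order-line rows are made up, and the noncorrelated IN version is an added variant, not a query from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, PRODUCT_FINISH TEXT);
CREATE TABLE ORDER_LINE_T (ORDER_ID INTEGER, PRODUCT_ID INTEGER);
INSERT INTO PRODUCT_T VALUES (1, 'Cherry'), (2, 'Natural ash'), (3, 'Natural ash');
INSERT INTO ORDER_LINE_T VALUES (1001, 1), (1001, 2), (1002, 1), (1003, 3);
""")

# Noncorrelated: the inner SELECT mentions no outer column, so it can run once.
noncorrelated = [row[0] for row in conn.execute("""
    SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T
    WHERE PRODUCT_ID IN (SELECT PRODUCT_ID FROM PRODUCT_T
                         WHERE PRODUCT_FINISH = 'Natural ash')
    ORDER BY ORDER_ID
""")]

# Correlated: the inner SELECT refers to ORDER_LINE_T.PRODUCT_ID of the outer row.
correlated = [row[0] for row in conn.execute("""
    SELECT DISTINCT ORDER_ID FROM ORDER_LINE_T
    WHERE EXISTS (SELECT * FROM PRODUCT_T
                  WHERE PRODUCT_ID = ORDER_LINE_T.PRODUCT_ID
                    AND PRODUCT_FINISH = 'Natural ash')
    ORDER BY ORDER_ID
""")]
print(noncorrelated, correlated)
```

Both forms return the same orders; which one performs better depends on the DBMS optimizer and the data.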
  • 325.
    Another Subquery Example. Show all products whose standard price is higher than the average price: SELECT PRODUCT_DESCRIPTION, STANDARD_PRICE, AVGPRICE FROM (SELECT AVG(STANDARD_PRICE) AVGPRICE FROM PRODUCT_T), PRODUCT_T WHERE STANDARD_PRICE > AVGPRICE; The subquery forms the derived table used in the FROM clause of the outer query. One column of the subquery is an aggregate function that has an alias name. That alias can then be referred to in the outer query. The WHERE clause normally cannot include aggregate functions, but because the aggregate is performed in the subquery, its result can be used in the outer query's WHERE clause
  • 326.
    Union Queries. Combine the output (union of multiple queries) together into a single result table. First query. Second query. Combine
  • 327.
    Conditional Expressions Using CASE Syntax. This is available with newer versions of SQL; previously it was not part of the standard
  • 328.
    Ensuring Transaction Integrity. Transaction = a discrete unit of work that must be completely processed or not processed at all. May involve multiple updates. If any update fails, then all other updates must be cancelled. SQL commands for transactions: BEGIN TRANSACTION/END TRANSACTION–marks boundaries of a transaction. COMMIT–makes all updates permanent. ROLLBACK–cancels updates since the last COMMIT
  • 329.
    Figure 8-5 An SQL Transaction sequence (in pseudocode)
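The all-or-nothing behaviour can be demonstrated with a two-update transaction in which the second update fails. This is a sketch using Python's built-in sqlite3 module, not the pseudocode of Figure 8-5; the ACCOUNT_T table, its CHECK constraint, and the `transfer` and `balances` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACCOUNT_T (ACCOUNT_ID INTEGER PRIMARY KEY,
                        BALANCE INTEGER NOT NULL CHECK (BALANCE >= 0));
INSERT INTO ACCOUNT_T VALUES (1, 100), (2, 50);
""")

def transfer(amount, source, target):
    """Move amount between accounts; both updates succeed or neither does."""
    try:
        with conn:  # COMMIT if the block completes, ROLLBACK if it raises
            conn.execute("UPDATE ACCOUNT_T SET BALANCE = BALANCE + ? "
                         "WHERE ACCOUNT_ID = ?", (amount, target))
            conn.execute("UPDATE ACCOUNT_T SET BALANCE = BALANCE - ? "
                         "WHERE ACCOUNT_ID = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT ACCOUNT_ID, BALANCE FROM ACCOUNT_T"))

ok = transfer(30, 1, 2)        # both updates applied and committed
failed = transfer(500, 1, 2)   # debit violates the CHECK, so the credit is undone
print(ok, failed, balances())
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected; the rollback cancels it, so the balances are the same as before the attempt.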
  • 330.
    Data Dictionary Facilities. System tables that store metadata. Users usually can view some of these tables. Users are restricted from updating them. Some examples in Oracle 10g: DBA_TABLES–descriptions of tables. DBA_CONSTRAINTS–description of constraints. DBA_USERS–information about the users of the system. Examples in Microsoft SQL Server 2000: SYSCOLUMNS–table and column definitions. SYSDEPENDS–object dependencies based on foreign keys. SYSPERMISSIONS–access permissions granted to users
  • 331.
    SQL:1999 and SQL:2003 Enhancements/Extensions. User-defined data types (UDT): subclasses of standard types or an object type. Analytical functions (for OLAP): CEILING, FLOOR, SQRT, RANK, DENSE_RANK. WINDOW–improved numerical analysis capabilities. New data types: BIGINT, MULTISET (collection), XML. CREATE TABLE LIKE–create a new table similar to an existing one. MERGE
  • 332.
    SQL:1999 and SQL:2003 Enhancements/Extensions (cont.) Persistent Stored Modules (SQL/PSM): capability to create and drop code modules. New statements: CASE, IF, LOOP, FOR, WHILE, etc. Makes SQL into a procedural language. Oracle has a proprietary version called PL/SQL, and Microsoft SQL Server has Transact-SQL
  • 333.
    Routines and Triggers. Routines: program modules that execute on demand. Functions–routines that return values and take input parameters. Procedures–routines that do not return values and can take input or output parameters. Triggers: routines that execute in response to a database event (INSERT, UPDATE, or DELETE)
  • 334.
    Figure 8-6 Triggers contrasted with stored procedures. Procedures are called explicitly. Triggers are event-driven. Source: adapted from Mullins, 1995.
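The event-driven nature of a trigger can be shown with a small price-audit example. This is a sketch using Python's built-in sqlite3 module and SQLite's trigger dialect, which differs in detail from the SQL:2003 syntax; the PRICE_LOG_T table and the LOG_PRICE_CHANGE trigger are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT_T (PRODUCT_ID INTEGER PRIMARY KEY, STANDARD_PRICE REAL);
CREATE TABLE PRICE_LOG_T (PRODUCT_ID INTEGER, OLD_PRICE REAL, NEW_PRICE REAL);

-- Fires automatically after every price update; nobody calls it explicitly.
CREATE TRIGGER LOG_PRICE_CHANGE
AFTER UPDATE OF STANDARD_PRICE ON PRODUCT_T
BEGIN
    INSERT INTO PRICE_LOG_T
    VALUES (OLD.PRODUCT_ID, OLD.STANDARD_PRICE, NEW.STANDARD_PRICE);
END;

INSERT INTO PRODUCT_T VALUES (7, 800);
""")

# The application issues only the UPDATE; the log row is written by the trigger.
conn.execute("UPDATE PRODUCT_T SET STANDARD_PRICE = 775 WHERE PRODUCT_ID = 7")
log = conn.execute("SELECT * FROM PRICE_LOG_T").fetchall()
print(log)
```

A stored procedure doing the same logging would have to be called by every program that changes a price, whereas the trigger runs no matter which program issues the UPDATE.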
  • 335.
    Figure 8-7 Simplified trigger syntax, SQL:2003. Figure 8-8 Create routine syntax, SQL:2003
  • 336.
    Embedded and Dynamic SQL. Embedded SQL: including hard-coded SQL statements in a program written in another language such as C or Java. Dynamic SQL: ability for an application program to generate SQL code on the fly, as the application is running
  • 337.
    Chapter 9: The Client/Server Database Environment. Modern Database Management, 8th Edition. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden. © 2007 by Prentice Hall
  • 338.
    Objectives: Definition of terms. List advantages of client/server architecture. Explain three application components: presentation, processing, and storage. Suggest partitioning possibilities. Distinguish between file server, database server, 3-tier, and n-tier approaches. Describe and discuss middleware. Explain database linking via ODBC and JDBC
  • 339.
    Client/Server Systems Networked computing model Processes distributed between clients and servers Client–Workstation (usually a PC) that requests and uses a service Server–Computer (PC/mini/mainframe) that provides a service For DBMS, server is a database server
  • 340.
    Application Logic in C/S Systems GUI Interface Procedures, functions, programs DBMS activities Processing Logic I/O processing Business rules Data management Storage Logic Data storage/retrieval Presentation Logic Input–keyboard/mouse Output–monitor/printer
  • 341.
    Client/Server Architectures File Server Architecture Database Server Architecture Three-tier Architecture Client does extensive processing Client does little processing
  • 342.
    File Server Architecture All processing is done at the PC that requested the data Entire files are transferred from the server to the client for processing Problems: Huge amount of data transfer on the network Each client must contain full DBMS Heavy resource demand on clients Client DBMSs must recognize shared locks, integrity checks, etc. FAT CLIENT
  • 343.
    Figure 9-2 File Server Architecture FAT CLIENT
  • 344.
    Two-Tier Database Server Architectures Client is responsible for I/O processing logic Some business rules logic Server performs all data storage and access processing → DBMS is only on server
  • 345.
    Advantages of Two-Tier Approach Clients do not have to be as powerful Greatly reduces data traffic on the network Improved data integrity since it is all processed centrally Stored procedures → DBMS code that performs some business rules done on server
  • 346.
    Advantages of Stored Procedures Compiled SQL statements Reduced network traffic Improved security Improved data integrity Thinner clients
  • 347.
    Figure 9-3 Two-tier database server architecture Thinner clients DBMS only on server
  • 348.
    Three-Tier Architectures Thin Client PC just for user interface and a little application processing. Limited or no data storage (sometimes no hard drive) GUI interface (I/O processing) Browser Business rules Web Server Data storage DBMS Client Application server Database server
  • 349.
    Figure 9-4 Three-tier architecture Thinnest clients Business rules on separate server DBMS only on DB server
  • 350.
    Advantages of Three-Tier Architectures Scalability Technological flexibility Long-term cost reduction Better match of systems to business needs Improved customer service Competitive advantage Reduced risk
  • 351.
    Application Partitioning Placing portions of the application code in different locations (client vs. server) AFTER it is written Advantages Improved performance Improved interoperability Balanced workloads
  • 352.
    Common Logic Distributions Figure 9-5a Two-tier client-server environment Figure 9-5b n-tier client-server environment Processing logic could be at client, server, or both Processing logic will be at application server or Web server
  • 353.
    Role of the Mainframe Mission-critical legacy systems have tended to remain on mainframes Distributed client/server systems tend to be used for smaller, workgroup systems Difficulties in moving mission-critical systems from mainframe to distributed Determining which code belongs on server vs. client Identifying potential conflicts with code from other applications Ensuring sufficient resources exist for anticipated load Rule of thumb Mainframe for centralized data that does not need to be moved Client for data requiring frequent user access, complex graphics, and user interface
  • 354.
    Middleware Software that allows an application to interoperate with other software No need for programmer/user to understand internal processing Accomplished via Application Program Interface (API) The “glue” that holds client/server applications together
  • 355.
    Types of Middleware Remote Procedure Calls (RPC) client makes calls to procedures running on remote computers synchronous and asynchronous Message-Oriented Middleware (MOM) asynchronous calls between the client and server via message queues Publish/Subscribe push technology → server sends information to client when available Object Request Broker (ORB) object-oriented management of communications between clients and servers SQL-oriented Data Access middleware between applications and database servers
  • 356.
    Database Middleware ODBC–Open Database Connectivity Most DB vendors support this OLE-DB Microsoft enhancement of ODBC JDBC –Java Database Connectivity Special Java classes that allow Java applications/applets to connect to databases
  • 357.
    Client/Server Security Network environment → complex security issues Security levels: System-level password security for allowing access to the system Database-level password security for determining access privileges to tables; read/update/insert/delete privileges Secure client/server communication via encryption
  • 358.
    Keys to Successful Client-Server Implementation Accurate business problem analysis Detailed architecture analysis Architecture analysis before choosing tools Appropriate scalability Appropriate placement of services Network analysis Awareness of hidden costs Establish client/server security
  • 359.
    Benefits of Moving to Client/Server Architecture Staged delivery of functionality speeds deployment GUI interfaces ease application use Flexibility and scalability facilitate business process reengineering Reduced network traffic due to increased processing at data source Facilitation of Web-enabled applications
  • 360.
    Using ODBC to Link External Databases Stored on a Database Server Open Database Connectivity (ODBC) API provides a common language for application programs to access and process SQL databases independent of the particular RDBMS that is accessed Required parameters: ODBC driver Back-end server name Database name User id and password Additional information: Data source name (DSN) Windows client computer name Client application program’s executable name Java Database Connectivity (JDBC) is similar to ODBC–built specifically for Java applications
  • 361.
    ODBC Architecture (Figure 9-6) Each DBMS has its own ODBC-compliant driver Client does not need to know anything about the DBMS Application Program Interface (API) provides common interface to all DBMSs
  • 362.
    Chapter 10: The Internet Database Environment Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 363.
    Objectives Definition of terms Explain the importance of attaching a database to a Web page Describe necessary environment for Internet and Intranet database connectivity Use Internet terminology appropriately Explain the purpose of WWW Consortium Explain the purpose of server-side extensions Describe Web services Compare Web server interfaces (CGI, API, Java servlets) Describe Web load balancing methods Explain plug-ins Explain the purpose of XML as a standard
  • 364.
    Web Characteristics that Support Web-Based Database Applications Web browsers are simple to use Information transfer can take place across different platforms Development time and cost have been reduced Sites can be static (no database) or dynamic/interactive (with database) Potential e-business advantages (improved customer service, faster time to market, better supply chain management)
  • 365.
    Figure 10-1 Database-enabled intranet/internet environment
  • 366.
    Internet and Intranet Services Web server Database-enabled services Directory, security, authentication E-mail File Transfer Protocol (FTP) Firewalls and proxy servers News or discussion groups Document search Load balancing and caching
  • 367.
    World Wide Web Consortium (W3C) An international consortium of companies working to develop open standards that foster the development of Web conventions so that Web documents can be consistently displayed on all platforms See www.w3c.org
  • 368.
    Web-Related Terms World Wide Web (WWW) The total set of interlinked hypertext documents residing on Web servers worldwide Browser Software that displays HTML documents and allows users to access files and software related to HTML documents Web Server Software that responds to requests from browsers and transmits HTML documents to browsers Web pages–HTML documents Static Web pages–content established at development time Dynamic Web pages–content dynamically generated, usually by obtaining data from database
  • 369.
    Communications Technology IP Address Four numbers that identify a node on the Internet e.g. 131.247.152.18 Hypertext Transfer Protocol (HTTP) Communication protocol used to transfer pages from Web server to browser HTTPS is a more secure version Uniform Resource Locator (URL) Mnemonic Web address corresponding with IP address Also includes folder location and HTML file name Typical URL
  • 370.
    Internet-Related Languages Hypertext Markup Language (HTML) Markup language specifically for Web pages Standard Generalized Markup Language (SGML) Markup language standard Extensible Markup Language (XML) Markup language allowing customized tags XHTML XML-compliant extension of HTML Java Object-oriented programming language for applets JavaScript/VBScript Scripting languages that enable interactivity in HTML documents Cascading Style Sheets (CSS) Control appearance of Web elements in an HTML document XSL and XSLT XML style sheet and transformation to HTML Standards and Web conventions established by World Wide Web Consortium (W3C)
  • 371.
    XML Overview Becoming the standard for E-Commerce data exchange A markup language (like HTML) Uses elements, tags, attributes Includes document type declarations (DTDs), XML schemas, comments, and entity references XML Schema (XSD) replacing DTDs RELAX NG–ISO standard XML database definition Document Structure Description (DSD)–expressive, easy to use XML database definition
  • 372.
    Sample XML Schema Schema is a record definition, analogous to the SQL CREATE statement, and therefore provides metadata
  • 373.
    Sample XML Document Data XML data involves elements and attributes defined in the schema, and is analogous to inserting a record into a database.
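The analogy between an XML document and an inserted record can be sketched with Python's standard library. The `salesperson` element, its fields, and the matching table are hypothetical stand-ins for the figures on these slides, and `xml.etree.ElementTree` does not validate against a schema, so that step is omitted here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document: one element with an attribute and two child elements.
XML_DOC = """
<salesperson id="7">
  <name>Pat Lee</name>
  <territory>Southwest</territory>
</salesperson>
"""

root = ET.fromstring(XML_DOC)
record = (
    int(root.get("id")),          # attribute value
    root.findtext("name"),        # child element text
    root.findtext("territory"),
)

# The table definition plays the role of the schema (metadata);
# the document supplies the data for one row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salesperson (id INTEGER PRIMARY KEY, name TEXT, territory TEXT)"
)
conn.execute("INSERT INTO salesperson VALUES (?, ?, ?)", record)
stored = conn.execute("SELECT * FROM salesperson").fetchone()
```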
  • 374.
    Server-Side Extensions Programs that interact directly with Web servers to handle requests e.g. database-request handling middleware Figure 10-2 Web-to-database middleware
  • 375.
    Web Server Interfaces Common Gateway Interface (CGI) Specifies transfer of information between Web server and CGI program Performance not very good Security risks Application Program Interface (API) More efficient than CGI Shared as dynamic link libraries (DLLs) Java Servlets Like applets, but stored at server Cross-platform compatible More efficient than CGI
  • 376.
    Web Servers Provide HTTP service Passing plain text via TCP connection Serve many clients at once Therefore, multithreaded and multiprocessed Load balancing approaches: Domain Name Server (DNS) balancing One DNS name = multiple IP addresses Software/hardware balancing Request at one IP address is distributed to multiple servers Reverse proxy Intercept client request and cache response
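DNS-style balancing can be sketched as one name that hands out several addresses in turn. The round-robin policy and the addresses below are assumptions for illustration; the slide does not say which distribution policy is used.

```python
from itertools import cycle

# Hypothetical pool: one site name backed by three server addresses.
SERVER_POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def make_balancer(addresses):
    """Return a function that yields the next address in round-robin order."""
    rotation = cycle(addresses)

    def next_server():
        return next(rotation)

    return next_server

pick_server = make_balancer(SERVER_POOL)
assigned = [pick_server() for _ in range(5)]  # five incoming requests
print(assigned)
```

The same rotation idea applies to software/hardware balancing, where the requests arrive at a single IP address and a dispatcher forwards each one to a server in the pool.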
  • 377.
    Client-Side Extensions Add functionality to the browser Plug-ins Hardware/software modules that extend browser capabilities by adding features (e.g. encryption, animation, wireless access) ActiveX Microsoft COM/OLE components that allow data manipulation inside the browser Cookies Block of data stored at client by Web server for later use
  • 378.
    Components for Dynamic Web Sites DBMS–Oracle, Microsoft SQL Server, Informix, Sybase, DB2, Microsoft Access, MySQL Web server–Apache, Microsoft IIS Programming languages/development technologies–ASP.NET, PHP, ColdFusion, Coral Web Builder, Macromedia’s Dreamweaver Web browser–Microsoft Internet Explorer, Netscape Navigator, Mozilla Firefox, Apple’s Safari, Opera Text editor–Notepad, BBEdit, vi, or an IDE FTP capabilities–SmartFTP, WS_FTP
  • 379.
    Figure 10-3 Dynamic Web development environment
  • 380.
    Figure 10-4 Sample PHP script that accepts user registration input a) PHP script initiation and input validation (Ullman, PHP and MySQL for Dynamic Web Sites, 2003, Script 6.6)
  • 381.
  • 382.
    Figure 10-4 Sample PHP script that accepts user registration input b) Adding user information to the database
  • 383.
    Figure 10-4 Sample PHP script that accepts user registration input c) Close PHP script and display HTML form
  • 384.
    Web Services XML-based standards that define protocols for automatic communication between applications over the Web. Web Service Components: Universal Description, Discovery, and Integration (UDDI) Technical specification for distributed registries of Web services and businesses open to communication on these services Web Services Description Language (WSDL) XML-based grammar for describing Web services and providing public interfaces for these services Simple Object Access Protocol (SOAP) XML-based communication protocol for sending messages between applications via the Internet Challenges for Web Services Lack of mature standards Lack of security
  • 385.
    Figure 10-5 A typical order entry system that uses Web services (adapted from Newcomer 2002, Figure 1-3) Figure 10-6 Web services protocol stack
  • 386.
    Figure 10-7 Web services deployment (adapted from Newcomer, 2002)
  • 387.
    Service Oriented Architectures Collection of services that communicate with each other by passing data Web services, CORBA, Java, XML, SOAP, WSDL Loosely coupled Interoperable Using SOA results in increased software development efficiency (up to 40%)
  • 388.
    Semantic Web W3C project using Web metadata to automate collection of knowledge and storing in easily understood format Structuring based on: XML Resource Description Framework (RDF) Web Ontology Language (OWL)
  • 389.
    Rapidly Accelerating Internet Changes Integrated database environments Use of cell phones and PDAs Changes in organizational relationships Globalization Challenges to IT personnel require: Business and technology infrastructure understanding Leadership and communication skills Upward influence techniques Employee management techniques
  • 390.
    Chapter 11: Data Warehousing Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 391.
    Objectives Definition of terms Reasons for information gap between information needs and availability Reasons for need of data warehousing Describe three levels of data warehouse architectures List four steps of data reconciliation Describe two components of star schema Estimate fact table size Design a data mart
  • 392.
    Definition Data Warehouse: A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes Subject-oriented: e.g. customers, patients, students, products Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources Time-variant: Can study trends and changes Non-updatable: Read-only, periodically refreshed Data Mart: A data warehouse that is limited in scope
  • 393.
    Need for Data Warehousing Integrated, company-wide view of high-quality information (from disparate databases) Separation of operational and informational systems and data (for improved performance)
  • 394.
    Source: adapted from Strange (1997).
  • 395.
    Data Warehouse Architectures Generic Two-Level Architecture Independent Data Mart Dependent Data Mart and Operational Data Store Logical Data Mart and Real-Time Data Warehouse Three-Layer architecture All involve some form of extraction, transformation and loading (ETL)
  • 396.
    Figure 11-2: Generic two-level data warehousing architecture ETL One, company-wide warehouse Periodic extraction → data is not completely current in warehouse
  • 397.
    Figure 11-3 Independent data mart data warehousing architecture Data marts: Mini-warehouses, limited in scope ETL Separate ETL for each independent data mart Data access complexity due to multiple data marts
  • 398.
    Figure 11-4 Dependent data mart with operational data store: a three-level architecture ETL Single ETL for enterprise data warehouse (EDW) Simpler data access ODS provides option for obtaining current data Dependent data marts loaded from EDW
  • 399.
    Figure 11-5 Logical data mart and real-time warehouse architecture ETL Near real-time ETL for Data Warehouse ODS and data warehouse are one and the same Data marts are NOT separate databases, but logical views of the data warehouse → Easier to create new data marts
  • 400.
    Figure 11-6 Three-layer data architecture for a data warehouse
  • 401.
    Data Characteristics Status vs. Event Data Event = a database action (create/update/delete) that results from a transaction Figure 11-7 Example of DBMS log entry
  • 402.
    Data Characteristics Transient vs. Periodic Data With transient data, changes to existing records are written over previous records, thus destroying the previous data content Figure 11-8 Transient operational data
  • 403.
    Periodic data are never physically altered or deleted once they have been added to the store Data Characteristics Transient vs. Periodic Data Figure 11-9: Periodic warehouse data
  • 404.
    Other Data Warehouse Changes New descriptive attributes New business activity attributes New classes of descriptive attributes Descriptive attributes become more refined Descriptive data are related to one another New source of data
  • 405.
    The Reconciled Data Layer Typical operational data is: Transient–not historical Not normalized (perhaps due to denormalization for performance) Restricted in scope–not comprehensive Sometimes poor quality–inconsistencies and errors After ETL, data should be: Detailed–not summarized yet Historical–periodic Normalized–3rd normal form or higher Comprehensive–enterprise-wide perspective Timely–data should be current enough to assist decision-making Quality controlled–accurate with full integrity
  • 406.
    The ETL Process Capture/Extract Scrub or data cleansing Transform Load and Index ETL = Extract, transform, and load
  • 407.
    Static extract = capturing a snapshot of the source data at a point in time Incremental extract = capturing changes that have occurred since the last static extract Capture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Figure 11-10: Steps in data reconciliation
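The two capture modes can be sketched as follows. The source rows and the `updated` timestamp column are assumptions made for illustration; an actual ETL tool might detect changes from transaction logs instead of a timestamp.

```python
from datetime import datetime

# Hypothetical operational rows, each with a last-modified timestamp.
SOURCE_ROWS = [
    {"id": 1, "amount": 100, "updated": datetime(2007, 1, 1)},
    {"id": 2, "amount": 250, "updated": datetime(2007, 1, 5)},
    {"id": 3, "amount": 75, "updated": datetime(2007, 1, 9)},
]

def static_extract(rows):
    """Snapshot of the chosen source data at one point in time."""
    return [dict(row) for row in rows]

def incremental_extract(rows, last_extract_time):
    """Only the rows that changed since the previous extract."""
    return [dict(row) for row in rows if row["updated"] > last_extract_time]

snapshot = static_extract(SOURCE_ROWS)
changes = incremental_extract(SOURCE_ROWS, datetime(2007, 1, 4))
changed_ids = [row["id"] for row in changes]
```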
  • 408.
    Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data Figure 11-10: Steps in data reconciliation (cont.)
  • 409.
    Transform = convert data from format of operational system to format of data warehouse Record-level: Selection–data partitioning Joining–data combining Aggregation–data summarization Field-level: single-field–from one field to one field multi-field–from many fields to one, or one field to many Figure 11-10: Steps in data reconciliation (cont.)
  • 410.
    Load/Index = place transformed data into the warehouse and create indexes Refresh mode: bulk rewriting of target data at periodic intervals Update mode: only changes in source data are written to data warehouse Figure 11-10: Steps in data reconciliation (cont.)
  • 411.
    Figure 11-11: Single-field transformation In general–some transformation function translates data from old form to new form Algorithmic transformation uses a formula or logical expression Table lookup–another approach, uses a separate table keyed by source record code
  • 412.
    Figure 11-12: Multifield transformation M:1–from many source fields to one target field 1:M–from one source field to many target fields
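The single-field and multifield transformations above can be sketched as small functions. The temperature formula, the state lookup table, and the name and location formats are hypothetical examples chosen for illustration, not the contents of the figures.

```python
# Single-field, algorithmic: a formula maps the old value to the new one.
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32) * 5 / 9

# Single-field, table lookup: a separate table keyed by the source code.
STATE_NAMES = {"FL": "Florida", "TX": "Texas"}  # hypothetical lookup table

def lookup_state_name(code):
    return STATE_NAMES.get(code, "UNKNOWN")

# Multifield M:1: many source fields combine into one target field.
def build_full_name(first_name, last_name):
    return f"{last_name}, {first_name}"

# Multifield 1:M: one source field splits into many target fields.
def split_city_state(location):
    city, state = [part.strip() for part in location.split(",", 1)]
    return city, state

print(fahrenheit_to_celsius(212), lookup_state_name("TX"))
print(build_full_name("Pat", "Lee"), split_city_state("Tampa, FL"))
```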
  • 413.
    Derived Data Objectives Ease of use for decision support applications Fast response to predefined user queries Customized data for particular target audiences Ad-hoc query support Data mining capabilities Characteristics Detailed (mostly periodic) data Aggregate (for summary) Distributed (to departmental servers) Most common data model = star schema (also called “dimensional model”)
  • 414.
    Figure 11-13 Components of a star schema Fact tables contain factual or quantitative data Dimension tables contain descriptions about the subjects of the business 1:N relationship between dimension tables and fact tables Excellent for ad-hoc queries, but bad for online transaction processing Dimension tables are denormalized to maximize performance
  • 415.
    Figure 11-14 Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions
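A star schema of this shape (one sales fact table with product, period, and store dimensions) can be sketched in SQLite. The table names, column names, and sample rows are assumptions for illustration and are not copied from Figure 11-14.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptions, keyed by surrogate keys.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: quantitative data, one foreign key per dimension (1:N).
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim,
    period_key   INTEGER REFERENCES period_dim,
    store_key    INTEGER REFERENCES store_dim,
    units_sold   INTEGER,
    dollars_sold REAL
);

INSERT INTO product_dim VALUES (1, 'Desk'), (2, 'Chair');
INSERT INTO period_dim  VALUES (1, 2007, 1), (2, 2007, 2);
INSERT INTO store_dim   VALUES (1, 'Tampa'), (2, 'Austin');
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 3, 900.0),
    (2, 1, 1, 10, 500.0),
    (1, 2, 2, 1, 300.0),
    (2, 2, 1, 4, 200.0);
""")

# Ad-hoc query: total dollars sold per store city.
totals = conn.execute("""
    SELECT s.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(totals)
```

Swapping `store_dim` for `product_dim` or `period_dim` in the join gives the same breakdown along a different dimension, which is what makes the layout convenient for ad-hoc queries.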
  • 416.
    Figure 11-15 Star schema with sample data
  • 417.
    Issues Regarding Star Schema Dimension table keys must be surrogate (non-intelligent and non-business related), because: Keys may change over time Length/format consistency Granularity of Fact Table–what level of detail do you want? Transactional grain–finest level Aggregated grain–more summarized Finer grains → better market basket analysis capability Finer grain → more dimension tables, more rows in fact table Duration of the database–how much history should be kept? Natural duration–13 months or 5 quarters Financial institutions may need longer duration Older data is more difficult to source and cleanse
  • 418.
    Figure 11-16: Modeling dates Fact tables contain time-period data → Date dimensions are important
  • 419.
    The User Interface Metadata (data catalog) Identify subjects of the data mart Identify dimensions and facts Indicate how data is derived from enterprise data warehouses, including derivation rules Indicate how data is derived from operational data store, including derivation rules Identify available reports and predefined queries Identify data analysis techniques (e.g. drill-down) Identify responsible people
  • 420.
    On-Line Analytical Processing (OLAP) Tools The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques Relational OLAP (ROLAP) Traditional relational representation Multidimensional OLAP (MOLAP) Cube structure OLAP Operations Cube slicing–come up with 2-D view of data Drill-down–going from summary to more detailed views
  • 421.
  • 422.
    Figure 11-24 Example of drill-down Summary report Drill-down with color added Starting with summary data, users can obtain details for particular cells
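Drill-down can be sketched as re-aggregating the same rows at a finer level for one chosen cell. The sales rows and the product/package levels below are hypothetical and only loosely modeled on the idea of the figure.

```python
from collections import defaultdict

# Hypothetical detail rows: (product, package size, units sold).
SALES = [
    ("Coffee", "1 lb", 40),
    ("Coffee", "2 lb", 60),
    ("Tea", "box", 25),
    ("Tea", "tin", 15),
]

def summarize(rows, level):
    """Total units by product, or by (product, package) for finer detail."""
    totals = defaultdict(int)
    for product, package, units in rows:
        key = product if level == "product" else (product, package)
        totals[key] += units
    return dict(totals)

# Summary report: one total per product.
summary = summarize(SALES, "product")

# Drill-down: the user picks the Coffee cell and asks for package detail.
coffee_rows = [row for row in SALES if row[0] == "Coffee"]
coffee_detail = summarize(coffee_rows, "package")
```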
  • 423.
    Data Mining and Visualization Knowledge discovery using a blend of statistical, AI, and computer graphics techniques Goals: Explain observed events or conditions Confirm hypotheses Explore data for new or unexpected relationships Techniques Statistical regression Decision tree induction Clustering and signal processing Affinity Sequence association Case-based reasoning Rule discovery Neural nets Fractals Data visualization–representing data in graphical/multimedia formats for analysis
  • 424.
    Chapter 12: Data and Database Administration Modern Database Management 8th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden © 2007 by Prentice Hall
  • 425.
    Objectives Definition of terms List functions and roles of data/database administration Describe role of data dictionaries and information repositories Compare optimistic and pessimistic concurrency control Describe problems and techniques for data security Describe problems and techniques for data recovery Describe database tuning issues and list areas where changes can be done to tune the database Describe importance and measures of data quality Describe importance and measures of data availability
  • 426.
    Traditional Administration Definitions Data Administration: A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards Database Administration: A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery
  • 427.
    Traditional Data Administration Functions Data policies, procedures, standards Planning Data conflict (ownership) resolution Managing the information repository Internal marketing of DA concepts
  • 428.
    Traditional Database Administration Functions Selection of DBMS and software tools Installing/upgrading DBMS Tuning database performance Improving query processing performance Managing data security, privacy, and integrity Data backup and recovery
  • 429.
    Evolving Approaches to Data Administration Blend data and database administration into one role Fast-track development–monitoring development process (analysis, design, implementation, maintenance) Procedural DBAs–managing quality of triggers and stored procedures eDBA–managing Internet-enabled database applications PDA DBA–data synchronization and personal database management Data warehouse administration
  • 430.
    Data Warehouse Administration New role, coming with the growth in data warehouses Similar to DA/DBA roles Emphasis on integration and coordination of metadata/data across many data sources Specific roles: Support DSS applications Manage data warehouse growth Establish service level agreements regarding data warehouses and data marts
  • 431.
    Open Source DBMSs An alternative to proprietary packages such as Oracle, Microsoft SQL Server, or Microsoft Access MySQL is an example of an open-source DBMS Less expensive than proprietary packages Source code available for modification
  • 432.
    Figure 12-2 Data modeling responsibilities
  • 433.
    Database Security Database Security: Protection of the data against accidental or intentional loss, destruction, or misuse Increased difficulty due to Internet access and client/server technologies
  • 434.
    Figure 12-3 Possible locations of data security threats
  • 435.
    Threats to Data Security Accidental losses attributable to: Human error Software failure Hardware failure Theft and fraud Improper data access: Loss of privacy (personal data) Loss of confidentiality (corporate data) Loss of data integrity Loss of availability (through, e.g. sabotage)
  • 436.
    Figure 12-4 Establishing Internet Security
  • 437.
    Web Security Static HTML files are easy to secure Standard database access controls Place Web files in protected directories on server Dynamic pages are harder Control of CGI scripts User authentication Session security SSL for encryption Restrict number of users and open ports Remove unnecessary programs
  • 438.
    W3C Web Privacy Standard Platform for Privacy Preferences (P3P) Addresses the following: Who collects data What data is collected and for what purpose Who is data shared with Can users control access to their data How are disputes resolved Policies for retaining data Where are policies kept and how can they be accessed
  • 439.
    Database Software Security Features Views or subschemas Integrity controls Authorization rules User-defined procedures Encryption Authentication schemes Backup, journalizing, and checkpointing
  • 440.
    Views and Integrity Controls Views Subset of the database that is presented to one or more users User can be given access privilege to view without allowing access privilege to underlying tables Integrity Controls Protect data from unauthorized use Domains–set allowable values Assertions–enforce database conditions
  • 441.
    Authorization Rules Controls incorporated in the data management system → Restrict: access to data actions that people can take on data → Authorization matrix for: Subjects Objects Actions Constraints
  • 442.
  • 443.
    Some DBMSs also provide capabilities for user-defined procedures to customize the authorization process Figure 12-6a Authorization table for subjects (salespeople) Figure 12-6b Authorization table for objects (orders) Figure 12-7 Oracle privileges Implementing authorization rules
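An authorization matrix of subjects, objects, actions, and constraints can be sketched as a lookup table. The subjects, objects, and the credit-limit constraint below are hypothetical examples; a real DBMS enforces such rules through its own privilege system (for example GRANT statements), not application code like this.

```python
# Each entry: (subject, object, action) -> constraint checked against a context.
# All names and the 5000 credit-limit threshold are illustrative assumptions.
AUTHORIZATION_MATRIX = {
    ("sales_dept", "customer_record", "insert"):
        lambda ctx: ctx.get("credit_limit", 0) <= 5000,
    ("order_entry", "customer_record", "read"):
        lambda ctx: True,  # no constraint
    ("accounting", "order_record", "delete"):
        lambda ctx: True,
}

def is_authorized(subject, obj, action, context=None):
    """Allow the action only if a matrix entry exists and its constraint holds."""
    rule = AUTHORIZATION_MATRIX.get((subject, obj, action))
    return bool(rule and rule(context or {}))

print(is_authorized("sales_dept", "customer_record", "insert", {"credit_limit": 3000}))
print(is_authorized("sales_dept", "customer_record", "delete"))
```

Anything not listed in the matrix is denied by default, which mirrors the restrictive intent of authorization rules.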
  • 444.
    Encryption –the coding or scrambling of data so that humans cannot read them Secure Sockets Layer (SSL) is a popular encryption scheme for TCP/IP connections Figure 12-8 Basic two-key encryption
  • 445.
    Authentication Schemes Goal– obtain a positive identification of the user Passwords: First line of defense Should be at least 8 characters long Should combine alphabetic and numeric data Should not be complete words or personal information Should be changed frequently
  • 446.
    Authentication Schemes (cont.) Strong Authentication Passwords are flawed: Users share them with each other They get written down, could be copied Automatic logon scripts remove need to explicitly type them in Unencrypted passwords travel the Internet Possible solutions: Two factor–e.g. smart card plus PIN Three factor–e.g. smart card, biometric, PIN Biometric devices–use of fingerprints, retinal scans, etc. for positive ID Third-party mediated authentication–using secret keys, digital certificates
  • 447.
    Security Policies and Procedures Personnel controls Hiring practices, employee monitoring, security training Physical access controls Equipment locking, check-out procedures, screen placement Maintenance controls Maintenance agreements, access to source code, quality and availability standards Data privacy controls Adherence to privacy legislation, access rules
  • 448.
    Database Recovery Mechanism for restoring a database quickly and accurately after loss or damage Recovery facilities: Backup Facilities Journalizing Facilities Checkpoint Facility Recovery Manager
  • 449.
    Back-up Facilities Automatic dump facility that produces backup copy of the entire database Periodic backup (e.g. nightly, weekly) Cold backup–database is shut down during backup Hot backup–selected portion is shut down and backed up at a given time Backups stored in secure, off-site location
  • 450.
    Journalizing Facilities Audit trail of transactions and database updates Transaction log–record of essential data for each transaction processed against the database Database change log–images of updated data Before-image–copy before modification After-image–copy after modification Produces an audit trail
  • 451.
    Figure 12-9 Database audit trail From the backup and logs, databases can be restored in case of damage or loss
  • 452.
    Checkpoint Facilities DBMS periodically refuses to accept new transactions → system is in a quiet state Database and transaction logs are synchronized This allows recovery manager to resume processing from a short period, instead of repeating the entire day
  • 453.
    Recovery and Restart Procedures Disk Mirroring–switch between identical copies of databases Restore/Rerun–reprocess transactions against the backup Transaction Integrity–commit or abort all transaction changes Backward Recovery (Rollback)–apply before images Forward Recovery (Roll Forward)–apply after images (preferable to restore/rerun)
  • 454.
    Transaction ACID Properties Atomic Transaction cannot be subdivided Consistent Constraints don’t change from before transaction to after transaction Isolated Database changes not revealed to users until after transaction has completed Durable Database changes are permanent
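Atomicity and consistency can be sketched with a funds transfer in SQLite. The `account` table, the non-negative balance constraint, and the `transfer` helper are hypothetical; the sketch relies on the `sqlite3` connection acting as a context manager that commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # constraint that must always hold
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds as one unit of work: both updates apply, or neither does."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

failed = transfer(conn, "A", "B", 500)   # would overdraw A, so it is rolled back
after_failure = balances(conn)
succeeded = transfer(conn, "A", "B", 30)
after_success = balances(conn)
```

In the failed transfer the credit to B had already been applied when the debit violated the constraint; the rollback undoes it, so no partial transaction is visible.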
  • 455.
    Figure 12-10 Basic recovery techniques a) Rollback
  • 456.
    Figure 12-10 Basic recovery techniques (cont.) b) Rollforward
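Rollback and rollforward can be sketched with a database modeled as a dictionary and a change log of before- and after-images. The account keys and values are hypothetical, and a real recovery manager works on pages and log records rather than Python objects.

```python
def rollback(database, change_log):
    """Backward recovery: apply before-images, newest change first."""
    restored = dict(database)
    for entry in reversed(change_log):
        restored[entry["key"]] = entry["before"]
    return restored

def rollforward(backup, change_log):
    """Forward recovery: start from the backup and apply after-images in order."""
    restored = dict(backup)
    for entry in change_log:
        restored[entry["key"]] = entry["after"]
    return restored

# Hypothetical state: a backup copy, the changes logged since then,
# and the current (possibly damaged) database.
backup = {"acct_A": 500, "acct_B": 200}
change_log = [
    {"key": "acct_A", "before": 500, "after": 400},
    {"key": "acct_B", "before": 200, "after": 300},
]
current = {"acct_A": 400, "acct_B": 300}

undone = rollback(current, change_log)      # back to the state before the changes
redone = rollforward(backup, change_log)    # backup brought up to date
```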
  • 457.
    Database Failure Responses Aborted transactions Preferred recovery: rollback Alternative: Rollforward to state just prior to abort Incorrect data Preferred recovery: rollback Alternative 1: rerun transactions not including inaccurate data updates Alternative 2: compensating transactions System failure (database intact) Preferred recovery: switch to duplicate database Alternative 1: rollback Alternative 2: restart from checkpoint Database destruction Preferred recovery: switch to duplicate database Alternative 1: rollforward Alternative 2: reprocess transactions
  • 458.
    Concurrency Control. Problem: in a multiuser environment, simultaneous access to data can result in interference and data loss. Solution: concurrency control, the process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multiuser environment.
  • 459.
    Figure 12-11 Lost update (no concurrency control in effect). Simultaneous access causes updates to cancel each other. A similar problem is the inconsistent read problem.
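The lost update problem can be reproduced deterministically by interleaving two read-modify-write sequences by hand. The starting balance and withdrawal amounts below are illustrative values chosen for this sketch, and the function names are hypothetical.

```python
def interleaved_withdrawals(balance, first_amount, second_amount):
    """Both users read before either writes, so the second write
    overwrites the first (no concurrency control)."""
    first_read = balance                   # user 1 reads the balance
    second_read = balance                  # user 2 reads the same, now stale, balance
    balance = first_read - first_amount    # user 1 writes
    balance = second_read - second_amount  # user 2 overwrites user 1's update
    return balance


def serial_withdrawals(balance, first_amount, second_amount):
    """One withdrawal finishes completely before the other starts."""
    balance = balance - first_amount
    balance = balance - second_amount
    return balance


lost_update_result = interleaved_withdrawals(1000, 200, 300)
serial_result = serial_withdrawals(1000, 200, 300)
print(lost_update_result, serial_result)
```

The interleaved run ends at 700 although 500 has been withdrawn in total; the first withdrawal of 200 has effectively vanished from the balance.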
  • 460.
    Concurrency Control Techniques. Serializability: finish one transaction before starting another. Locking mechanisms: the most common way of achieving serialization. Data that is retrieved for the purpose of updating is locked for the updater. No other user can perform an update until the data is unlocked.
  • 461.
    Figure 12-12 Updates with locking (concurrency control). This prevents the lost update problem.
  • 462.
    Locking Mechanisms. Locking levels: database (used during database updates); table (used for bulk updates); block or page (very commonly used); record (only the requested row; fairly commonly used); field (requires significant overhead; impractical). Types of locks: shared lock (read but no update permitted; used when just reading, to prevent another user from placing an exclusive lock on the record); exclusive lock (no access by other users permitted; used when preparing to update).
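The compatibility rules for shared and exclusive locks can be sketched with a small record-level lock table. `LockManager` and its methods are hypothetical names for this illustration; the sketch returns `False` where a real lock manager would queue the requester and make it wait.

```python
class LockManager:
    """Record-level lock table with shared ('S') and exclusive ('X') modes."""

    def __init__(self):
        self._locks = {}  # record_id -> (mode, set of holding transactions)

    def acquire(self, txn, record_id, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self._locks.get(record_id)
        if held is None:
            self._locks[record_id] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # any number of readers may share a record
            return True
        if holders == {txn}:
            # The sole holder may re-request the lock or upgrade S to X.
            if mode == "X":
                self._locks[record_id] = ("X", holders)
            return True
        return False  # conflicts with a lock held by another transaction

    def release(self, txn, record_id):
        held = self._locks.get(record_id)
        if held is None:
            return
        _mode, holders = held
        holders.discard(txn)
        if not holders:
            del self._locks[record_id]


locks = LockManager()
results = [
    locks.acquire("T1", "row1", "S"),  # first reader
    locks.acquire("T2", "row1", "S"),  # second reader shares the lock
    locks.acquire("T2", "row1", "X"),  # blocked: T1 still holds a shared lock
]
locks.release("T1", "row1")
results.append(locks.acquire("T2", "row1", "X"))  # upgrade now succeeds
results.append(locks.acquire("T1", "row1", "S"))  # blocked by T2's exclusive lock
print(results)
```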
  • 463.
    Deadlock. An impasse that results when two or more transactions have locked common resources, and each waits for the other to unlock their resources. Figure 12-13 The problem of deadlock: John and Marsha will wait forever for each other to release their locked resources!
  • 464.
    Managing Deadlock. Deadlock prevention: lock all records required at the beginning of a transaction; two-phase locking protocol (growing phase, shrinking phase); may be difficult to determine all needed resources in advance. Deadlock resolution: allow deadlocks to occur; mechanisms for detecting and breaking them; resource usage matrix.
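One way to detect a deadlock is to look for a cycle among waiting transactions. The sketch below uses a wait-for graph, a common alternative representation to the resource usage matrix named above; it is not the matrix technique itself. The function `find_deadlock` and the dictionary layout are assumptions for this illustration.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a wait cycle, or None.

    waits_for maps each transaction to the set of transactions whose
    locks it is waiting on.
    """
    visiting, done = set(), set()

    def visit(txn, path):
        if txn in done:
            return None
        if txn in visiting:
            # txn is already on the current path, so the path loops back to it.
            return path[path.index(txn):]
        visiting.add(txn)
        path.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other, path)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(txn)
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None


deadlocked = find_deadlock({"John": {"Marsha"}, "Marsha": {"John"}})
no_deadlock = find_deadlock({"John": {"Marsha"}, "Marsha": set()})
print(deadlocked, no_deadlock)
```

Once a cycle is found, the system can break it by choosing one transaction in the cycle as a victim and rolling it back so the others can proceed.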
  • 465.
    Versioning. Optimistic approach to concurrency control, used instead of locking. Assumption is that simultaneous updates will be infrequent. Each transaction can attempt an update as it wishes. The system will reject an update when it senses a conflict. Rollback and commit are used for this.
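The optimistic approach can be sketched with a version counter on each record: a transaction remembers the version it read, and its update is accepted only if that version is still current. The class `VersionedRecord` and the use of a simple integer counter are assumptions for this illustration; real systems may use timestamps or other version markers.

```python
class VersionedRecord:
    """A record whose updates are checked against the version that was read."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def try_commit(self, new_value, read_version):
        """Accept the update only if nobody else committed since the read."""
        if read_version != self.version:
            return False  # conflict sensed: caller must roll back and retry
        self.value = new_value
        self.version += 1
        return True


record = VersionedRecord(1000)

john_value, john_version = record.read()
marsha_value, marsha_version = record.read()

john_ok = record.try_commit(john_value - 200, john_version)
marsha_ok = record.try_commit(marsha_value - 300, marsha_version)  # stale read

# The rejected transaction restarts from a fresh read.
marsha_value, marsha_version = record.read()
retry_ok = record.try_commit(marsha_value - 300, marsha_version)
print(john_ok, marsha_ok, retry_ok, record.value)
```

No transaction ever waits on a lock; the cost is paid only when a conflict actually occurs, which fits the assumption that simultaneous updates are infrequent.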
  • 466.
    Figure 12-15 The use of versioning. Better performance than locking.
  • 467.
    Managing Data Quality. Causes of poor data quality: external data sources; redundant data storage; lack of organizational commitment. Data quality improvement: perform data quality audit; establish data stewardship program (data steward is a liaison between IT and business units); apply total quality management (TQM) practices; overcome organizational barriers; apply modern DBMS technology; estimate return on investment.
  • 468.
    Data Dictionaries and Repositories. Data dictionary: documents data elements of a database. System catalog: system-created database that describes all database objects. Information repository: stores metadata describing data and data processing resources. Information Repository Dictionary System (IRDS): software tool managing/controlling access to the information repository.
  • 469.
    Figure 12-16 Three components of the repository system architecture: a schema of the repository information; software that manages the repository objects; where repository objects are stored. Source: adapted from Bernstein, 1996.
  • 470.
    Database Performance Tuning. DBMS installation: setting installation parameters. Memory usage: set cache levels; choose background processes. Input/output (I/O) contention: use striping; distribution of heavily accessed files. CPU usage: monitor CPU load. Application tuning: modification of SQL code in applications.
  • 471.
    Data Availability. Downtime is expensive. How to ensure availability: hardware failures (provide redundancy for fault tolerance); loss of data (database mirroring); maintenance downtime (automated and nondisruptive maintenance utilities); network problems (careful traffic monitoring, firewalls, and routers).