Database Modeling


Published in: Education, Technology

  1. CHAPTER 3: DATABASE MODELING
     Prof. Erwin M. Globio, MSIT

     Chapter Objectives
     At the end of this chapter, you should be able to:
       • distinguish between unary, binary and ternary relationships;
       • draw an entity-relationship (E-R) diagram to represent common business situations;
       • model ISA relationships in an E-R diagram;
       • draw an object-oriented data model (OODM) to represent common business situations;
       • know the limitations of, and concerns regarding, OODBMS;
       • transform an E-R model to a relational model.

     Essential Readings
       • Modern Database Management (4th Edition), Fred R. McFadden and Jeffrey A. Hoffer (1994), Benjamin/Cummings. [Chapters 4, 5, 14 and 15]

     Summary of the database development process
       Phase                               Resulting model
       Planning                            Enterprise data model
       Analysis                            Conceptual data model
       Design: logical database design     Logical data model
       Design: physical database design    Technology model
       Implementation                      Databases and repositories
  2. DB212 CHAPTER 3: DATABASE MODELING

     3.1 Entity-relationship Model
     The E-R model is used to construct a conceptual data model, which is a representation of the structure of a database that is independent of the software that will be used to implement the database.

     3.1.1 Entities
       • An entity can be a person, place, object, event or concept in the user environment about which the organization wishes to maintain data.
         Examples:
           Person:  EMPLOYEE, STUDENT, PATIENT
           Place:   STATE, REGION, COUNTRY
           Object:  MACHINE, BUILDING
           Event:   SALE, REGISTRATION
           Concept: ACCOUNT, COURSE
       • There is a difference between an entity type and an entity instance.
       • An entity type is a collection of entities that share common properties or characteristics.
       • An entity instance is a single occurrence of an entity type.
         Example:
           Entity type: EMPLOYEE
           Attributes:  EMP NO.   NAME        ADDRESS                           YR HIRED
           Instances:   100       Roy Lim     100 Chalet Lane S(0211)           1989
                        101       Mary Wong   Blk 321 Toa Payoh Lor 1 S(1231)   1990

     3.1.2 Attributes
       • An attribute is a property or characteristic of an entity that is of interest to the organization. Both entities and relationships may have attributes.
           STUDENT:  STUDENT NO., NAME, ADDRESS
           EMPLOYEE: EMPLOYEE NO., NAME, SKILL
       • Every entity type must have an attribute or set of attributes that uniquely identifies each instance and distinguishes that instance from the other instances of the same entity type.
       • A candidate key is an attribute (or combination of attributes) that uniquely identifies each instance of an entity type.
  3.   • A primary key is a candidate key that has been selected as the identifier for an entity type.
         Think about: What are the criteria for selecting a primary key?
       • Multivalued attribute
         This refers to an attribute that can have more than one value for each entity instance. For example, an EMPLOYEE may possess a number of skills, so SKILL is a multivalued attribute.
         [Figure: entity EMPLOYEE with attributes Employee No. and Name, and the multivalued attribute Skill name]
         Note: During the first pass of conceptual design, it is common to use a double-line ellipse to highlight multivalued attributes. Subsequently, we normalize the entity data by removing the multivalued attribute, as shown:
         [Figure: entity EMPLOYEE (Employee No., Name) linked by the relationship HAS to entity SKILL (Skill name)]

     3.1.3 Relationship
     Relationships are what hold the various entities together.
       • Degree of a relationship
         The degree of a relationship is the number of entity types that participate in that relationship.
       • Unary relationship (recursive relationship)
         It is a relationship between the instances of one entity type.
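The removal of the multivalued SKILL attribute described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table and column names (employee, skill, has_skill, emp_no, skill_name) and the skill assignments are hypothetical, and only the employee names come from the earlier EMPLOYEE example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY,   -- primary key selected from the candidate keys
        name   TEXT NOT NULL
    );
    CREATE TABLE skill (
        skill_name TEXT PRIMARY KEY
    );
    -- HAS_SKILL replaces the multivalued attribute: one row per (employee, skill) pair
    CREATE TABLE has_skill (
        emp_no     INTEGER NOT NULL REFERENCES employee(emp_no),
        skill_name TEXT    NOT NULL REFERENCES skill(skill_name),
        PRIMARY KEY (emp_no, skill_name)
    );
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(100, "Roy Lim"), (101, "Mary Wong")])
conn.executemany("INSERT INTO skill VALUES (?)",
                 [("Accounting",), ("Programming",)])
# Hypothetical assignments, for illustration only
conn.executemany("INSERT INTO has_skill VALUES (?, ?)",
                 [(100, "Accounting"), (100, "Programming"), (101, "Programming")])


def skills_of(emp_no):
    """Return the skill names recorded for one employee, in alphabetical order."""
    rows = conn.execute(
        "SELECT skill_name FROM has_skill WHERE emp_no = ? ORDER BY skill_name",
        (emp_no,),
    )
    return [row[0] for row in rows]
```

Each employee row now holds only single-valued attributes, and an employee with several skills simply has several rows in has_skill.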
  4.     Examples:
           One-to-one (1:1):    PERSON "Is married to" PERSON
           One-to-many (1:M):   EMPLOYEE "Manages" EMPLOYEE
           Many-to-many (M:N):  ITEM "Has components" ITEM
       • Binary relationship
         It is a relationship between instances of two entity types.
         [Figure: three binary relationships, "Is assigned" (one-to-one), "Contains" (one-to-many) and "Registers for" (many-to-many); every entity box in the source figure is labelled PRODUCT LINE]
       • Ternary relationship
         It is a simultaneous relationship among instances of three entity types.
  5.     Example:
         [Figure: ternary relationship "Ships" among the entities PART, VENDOR and WAREHOUSE]
       • Existence dependency
         An instance of one entity cannot exist without the existence of an instance of some other entity.
       • Weak entity
         An entity type that has an existence dependency.
       • Identifying relationship
         A relationship in which the primary key of the parent entity is used as part of the primary key of the dependent entity. Weak entities usually do not have a natural identifier. Instead, the primary key of the parent entity is often used as part of the primary key of the dependent (child) entity.
         Example:
         [Figure: STUDENT (Student No., Student Name) "Has" PARENT (Student No., Parent Name)]
         Data integrity is therefore enforced, as the weak entity cannot exist unless the parent entity exists.

     3.1.4 Generalization
     Business entities are often best modeled using the concepts of generalization and categorization.
       • Generalization is the concept that some things (entities) are subtypes of other, more general things. For example, Accountant and Programmer Analyst are subtypes of the more general type called STAFF.
       • Categorization is when an entity comes in various subtypes. For example, there are different subtypes of employee: hourly employee, salaried employee and part-time employee.
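The identifying relationship between STUDENT and PARENT described on this slide can be sketched as follows, using Python's built-in sqlite3 module. The table and column names and the sample names (Alice, Bob, Carol, Dave) are hypothetical. The sketch assumes the weak entity PARENT is keyed on the student's key plus the parent's name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (
        student_no   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL
    );
    -- Weak entity: the parent's key includes the key of the owning student
    CREATE TABLE parent (
        student_no  INTEGER NOT NULL REFERENCES student(student_no),
        parent_name TEXT    NOT NULL,
        PRIMARY KEY (student_no, parent_name)
    );
""")
conn.execute("INSERT INTO student VALUES (1, 'Alice')")   # hypothetical data
conn.execute("INSERT INTO parent VALUES (1, 'Bob')")


def can_add_parent(student_no, parent_name):
    """Try to insert a PARENT row; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO parent VALUES (?, ?)", (student_no, parent_name))
        return True
    except sqlite3.IntegrityError:
        # Raised when the referenced student does not exist (or the key is a duplicate)
        return False
```

A parent row for an existing student is accepted, while a parent row that refers to a student number with no STUDENT row is rejected, which is the existence dependency in action.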
  6.   • Supertypes
         A generic entity type that is subdivided into subtypes. For example, in Figure 3-1, EMPLOYEE is a supertype.
       • Subtypes
         A subset of a supertype that shares common attributes or relationships distinct from other subsets. The relationship between each subtype and the supertype is called an ISA relationship. Usually, the subtypes are mutually exclusive, and exactly one is required for each instance of the supertype. For example, in Figure 3-1, each employee must be an hourly employee, a salaried employee or a part-time employee. Attributes that are peculiar to a subtype are included with that subtype only (e.g. HOURLY RATE is peculiar to HOURLY EMPLOYEE). This exclusive relationship is represented with a curved line (as shown in Figure 3-1).
       • Inheritance
         Inheritance is the property that, when entity types or object classes are arranged in a hierarchy, each entity type or object class assumes the attributes and methods of its ancestors. For example, in Figure 3-1, the attributes NAME, ADDRESS and DATE HIRED are inherited by the three employee subtypes. Only attributes that are unique to a subtype are associated with that subtype.
  7. [Figure 3-1: supertype EMPLOYEE (attributes Name, Address, Date Hired) linked by ISA relationships to three subtypes, labelled HOURLY EMPLOYEE, EMPLOYEE and CONSULTANT in the source; each subtype shows the attributes Employee no. and Daily Rate]

     3.2 Logical Database Design
     Logical database design is the process of transforming the conceptual data model into a logical database model. There are four types of logical database models in use today: object-oriented, hierarchical, network and relational.

     3.2.1 Object-Oriented Data Model
     Most future database management systems will be based on objects, or will incorporate object-oriented functionality. This enables users to create generic, all-purpose components that can be reused in multiple applications.
       • Core concepts
  8.   • Objects
         Objects are abstractions of real-world entities that exhibit states and behaviours. The state of an object is expressed as the values of its attributes. The behaviour of an object is expressed by a set of methods that operate on its attributes.
       • Attributes
         These are the properties of objects that are of interest to the organization.
       • Methods
         Methods define the behaviour of an object. Methods can only process data within the object class in which they are defined (the concept of encapsulation). However, they can receive requests from methods in another object class. There are a number of method categories:
           • Occur methods: instance add, instance change, instance delete.
           • Calculate methods: perform calculations on the data values encapsulated in the same object class.
           • Monitor methods: produce signals when predetermined limits are exceeded in a system.
       • Encapsulation
         This is the property that the attributes and the methods of an object are hidden from the outside world, and their internal details do not have to be known in order to access its data values or use its methods. Each object has an interface that is known to the outside world. An outside agent (which can be another object) may request that a method be performed by sending a message to the object.
       • Object classes and instances
         An object class is a logical grouping of objects that have the same attributes and behaviour. An object instance is one occurrence of an object class. When we use the term object by itself, we are referring to an object class. For example, VEHICLE is an example of an object class, and LORRY, CAR and MOTORCYCLE are examples of object instances.
         [Figure: object class VEHICLE with attributes Number, Year and Model, and method Add Vehicle]
       • Identifying and describing objects
           • Top-down approach
             This begins with a high-level description of the environment and proceeds from the general to the specific. In the object-oriented data model, methods are activated by sending messages from a sending object to a receiving object. For example, studying written material and talking with users are ways to locate nouns, which provide information about potential objects.
  9.       • Bottom-up approach
             The bottom-up approach begins with system detail: example reports, video forms and other detailed documents and displays. The analyst identifies the candidate objects and their properties.
           In reality, both the top-down and the bottom-up approaches should be used to identify and describe candidate objects.
       • Generalization
         Object-oriented data models show the generalization and specialization of real-world entities. To express generalization relationships, objects are arranged into a hierarchy. For example, an organization has three basic types of employees: hourly employees, salaried employees and contract consultants. The following notation is used to represent generalization.
       • Inheritance
         Inheritance is an important principle of the object-oriented model. Inheritance means that all properties of an object class become the properties of its subclasses. For example, the attributes of EMPLOYEE apply to all three subclasses, and the method CalculateAge applies to all employees.

           Class        Attributes                                                Methods
           EMPLOYEE     Employee No., Name, Address, Date hired, Birthdate        CalculateAge
           HOURLY       Hourly rate                                               Calculate MthlyWage
           SALARIED     Annual salary, Stock option                               Calculate StockBenefit
           CONSULTANT   Contact no., Date hired                                   Allocate To Contact

           (HOURLY, SALARIED and CONSULTANT are subclasses of EMPLOYEE.)

     The object-oriented approach can be a basis for several database management systems.
       • Advantages
           • Reusability
             Objects can be defined for a variety of functions and then reused in numerous applications.
           • Complex data types
             An object-oriented database can store and manage complex data such as documents, graphics, images, voice messages and video sequences.
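The EMPLOYEE hierarchy on this slide can be sketched with Python dataclasses to show how subclasses inherit attributes and the CalculateAge method. This is a minimal sketch, not the notation used in the chapter: the Python names are assumptions, the subclass-specific methods are omitted, and the age rule (whole years elapsed since the birthdate) is one plausible reading of CalculateAge.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    employee_no: int
    name: str
    address: str
    date_hired: date
    birthdate: date

    def calculate_age(self, today: date) -> int:
        """Whole years between birthdate and today (assumed meaning of CalculateAge)."""
        # Subtract one year if this year's birthday has not been reached yet
        before_birthday = (today.month, today.day) < (self.birthdate.month, self.birthdate.day)
        return today.year - self.birthdate.year - int(before_birthday)


@dataclass
class Hourly(Employee):
    hourly_rate: float      # attribute peculiar to this subclass


@dataclass
class Salaried(Employee):
    annual_salary: float
    stock_option: bool


# Hypothetical instance; the dates and salary are made up for illustration
mary = Salaried(101, "Mary Wong", "Blk 321 Toa Payoh Lor 1",
                date(1990, 1, 1), date(1965, 6, 15), 48000.0, True)
```

Here mary is a Salaried object, yet it carries every EMPLOYEE attribute and can call calculate_age, which is defined only once in the superclass.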
  10.      • Distributed databases
             Because objects communicate by sending messages that activate methods in other objects, object-oriented databases can support the distribution of data across a network more easily than other data models.
       • Illustration
         Build an E-R diagram and an object-oriented diagram for the following scenario:
         Customer orders arrive daily. If a customer order is from a new customer, the salesperson enters information to add that customer to the database. Customer information includes customer no., name, address, city, credit limit and total owed. The system recomputes the total owed at each update. The user then updates the order files accordingly. For each order, there is an order no. and an order date. For each line item on the order, the system prompts the user to enter the product number and quantity ordered. As each product is added to the order, the system consults the quantity on hand for that item and computes the quantity that can be shipped. The product number, description and quantity shipped are then added to a shipping notice for the order (the quantity shipped is calculated by the system). The system also computes the extended amount (quantity shipped times unit price) for each item on the order and adds it to the total owed for that customer. If the total owed exceeds the customer's credit limit, a message is sent to the user.
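The computations in the order-entry scenario on this slide can be sketched in Python as follows. All class, field and method names are hypothetical, and the sketch assumes that the quantity that can be shipped is the smaller of the quantity ordered and the quantity on hand, which the scenario does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    product_no: str
    description: str
    unit_price: float
    qty_on_hand: int


@dataclass
class Customer:
    customer_no: str
    name: str
    credit_limit: float
    total_owed: float = 0.0


@dataclass
class Order:
    order_no: str
    customer: Customer
    shipping_notice: list = field(default_factory=list)
    messages: list = field(default_factory=list)

    def add_line_item(self, product: Product, qty_ordered: int) -> int:
        # Assumption: ship as much of the order as current stock allows
        qty_shipped = min(qty_ordered, product.qty_on_hand)
        self.shipping_notice.append(
            (product.product_no, product.description, qty_shipped))
        # Extended amount = quantity shipped times unit price
        extended_amount = qty_shipped * product.unit_price
        self.customer.total_owed += extended_amount
        if self.customer.total_owed > self.customer.credit_limit:
            self.messages.append(
                f"Customer {self.customer.customer_no} exceeds credit limit")
        return qty_shipped
```

The methods sit with the data they operate on, which mirrors how the object-oriented diagram attaches a calculation such as the quantity shipped to the ORDER class.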
  11. [Figure 3-2: E-R diagram. CUSTOMER (Customer No., Name, Address, City, Credit limit, Total owed) "Places" ORDER (Order no., Order date); ORDER "Consists" of PRODUCT (Product no., Description, Unit price, Qty on hand); the relationship "Consists" carries the attributes Qty-ordered and Qty-shipped]
  12. Figure 3-3: Object-oriented diagram

           Class      Attributes                                                            Methods
           ORDER      Order no., Order date, Product no., Quantity ordered, Quantity shipped   Calculate QtyShipped
           CUSTOMER   Customer no., Name, Address, City, Credit limit, Total owed              Calculate MthlyWage
           PRODUCT    Product no., Description, Unit price, Quantity on hand                   Calculate QtyOnHand

     3.2.2 Hierarchical Data Model
     The hierarchical database model was the first important logical database model. Today it is implemented primarily on mainframes. In this model, records are arranged in a top-down structure that resembles an upside-down tree. The terms parent and child are often used in describing the hierarchical model. An important characteristic is that a child is related to only one parent. The leading hierarchical DBMS in use today is IBM's Information Management System (IMS).
       • IMS Physical Databases
         The physical database record is a basic building block in IMS. A physical database record (PDBR) consists of a set of related fields. A PDBR consists of a root segment and its subordinate segments, called child segments.
         Example: IMS physical database record

           DEPARTMENT (DEPTNO, DNAME, LOCATION)
             EQUIPMENT (IDENT, COST, NUMBER)
             EMPLOYEE (EMPNO, ENAME, YEARS)
               DEPENDENT (DNAME, AGE)
               SKILL (CODE, SNAME, NOYEARS)
  13.  • PDBR Occurrences

           DEPARTMENT: 0001 Accounting A; 0002 Personnel B; 0003 Engineering C
             EQUIPMENT: IBM PC, 3500, 3; Apple II, 2500, 2
             EMPLOYEE:  100 Mary 3; 102 Thomas 2
               First DEPENDENT/SKILL group:   DEPENDENT Theresa 34, Paul 8, Mike 5; SKILL 02 Programming 2
               Second DEPENDENT/SKILL group:  DEPENDENT John 10, David 14; SKILL 01 Accounting 3

         In this case, the Engineering DEPARTMENT occurrence consists of a set of EQUIPMENT occurrences and a set of EMPLOYEE occurrences. Under each EMPLOYEE occurrence, there is a set of DEPENDENT occurrences and a set of SKILL occurrences.
       • IMS Logical Database
         External views of individual users in IMS are reflected in logical database records (LDBRs). Each LDBR type is a subset of a corresponding PDBR type. Any segment type (except the root segment) of a PDBR may be omitted in the corresponding LDBR.
         Examples of logical database records:

           (a) Equipment LDBR
               DEPARTMENT (DEPTNO, DNAME, LOCATION)
                 EQUIPMENT (IDENT, COST, NUMBER)

           (b) Personnel LDBR
               DEPARTMENT (DEPTNO, DNAME, LOCATION)
                 EMPLOYEE (EMPNO, ENAME, YEARS)
                   SKILL (CODE, SNAME, NOYEARS)
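The relationship between a PDBR and its LDBRs described on this slide can be sketched in Python by treating the segment hierarchy as a nested dictionary. This is an illustration only, not IMS syntax: the function names are hypothetical, and the sketch assumes that omitting a segment type also omits every segment beneath it, since a child is reached only through its parent.

```python
# Segment hierarchy of the PDBR: each segment type maps to its child segment types
PDBR = {
    "DEPARTMENT": {
        "EQUIPMENT": {},
        "EMPLOYEE": {"DEPENDENT": {}, "SKILL": {}},
    }
}


def parent_of(tree, target, parent=None):
    """Return the single parent of a segment type (None for the root or if absent)."""
    for name, children in tree.items():
        if name == target:
            return parent
        found = parent_of(children, target, name)
        if found is not None:
            return found
    return None


def project(tree, keep):
    """Keep only the listed segment types; a dropped segment takes its subtree with it."""
    return {name: project(children, keep)
            for name, children in tree.items() if name in keep}


def make_ldbr(pdbr, keep):
    """Build an LDBR as a subset of the PDBR; the root segment may not be omitted."""
    root = next(iter(pdbr))
    if root not in keep:
        raise ValueError("the root segment cannot be omitted from an LDBR")
    return project(pdbr, keep)


equipment_ldbr = make_ldbr(PDBR, {"DEPARTMENT", "EQUIPMENT"})
personnel_ldbr = make_ldbr(PDBR, {"DEPARTMENT", "EMPLOYEE", "SKILL"})
```

The two results correspond to the Equipment and Personnel LDBRs: both keep the DEPARTMENT root, and each exposes only the segments its user needs.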
  14. (Notice that each LDBR type contains the root segment, DEPARTMENT. In IMS, a program communication block (PCB) is a series of statements that defines a logical database record. Each of these LDBR types represents the view of a different user.)

     3.2.3 Network Data Model
     In the network database model, there is no distinction between parent and child record types. A record type may be associated with several other record types, for example with both the EMPLOYEE and PROJECT record types.
     [Figure: five record-type boxes, each labelled DEPARTMENT in the source]
     Note: The Conference on Data Systems Languages (CODASYL), through its Data Base Task Group (DBTG), is a standards organization that has developed and issued descriptions of languages for defining and processing data in network DBMSs. IDMS (Integrated Database Management System) is the leading DBTG DBMS on IBM computers.
     In the network data model, a set is the usual means employed in a DBTG database to represent a relationship. A set is the definition of a directed relationship from an owner record type to one or more member record types. In Figure 3-4, DEPT-EMP, DEPT-PROJ and PROJ-EMP are examples of sets. Each set defines a 1:M or 1:1 relationship. Generally, we can assume that a set is implemented as a ring data structure, with the owner at the head of the chain and the last member pointing back to the owner.

           Record types
             DEPARTMENT (DEPTNO, DNAME, LOCATION)
             EMPLOYEE (EMPNO, ENAME, YEARS)
             PROJECT (PROJNO, DESCRIPTION)
           Sets (owner to member)
             DEPT-EMP:  DEPARTMENT to EMPLOYEE
             DEPT-PROJ: DEPARTMENT to PROJECT
             PROJ-EMP:  PROJECT to EMPLOYEE

     Figure 3-4
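The ring implementation of a set mentioned on this slide can be sketched in Python as a circular linked list. The class and function names are hypothetical, and the sketch handles a single set occurrence only; a real DBTG system keeps one such pointer per set in which a record participates.

```python
class Record:
    def __init__(self, name, is_owner=False):
        self.name = name
        self.is_owner = is_owner
        self.next = self   # a lone owner points to itself: an empty set occurrence


def insert_member(owner, member):
    """Append a member at the end of the chain; the last member points back to the owner."""
    last = owner
    while last.next is not owner:
        last = last.next
    last.next = member
    member.next = owner


def members(owner):
    """Follow the ring from the owner and collect the member names in chain order."""
    result, node = [], owner.next
    while node is not owner:
        result.append(node.name)
        node = node.next
    return result


def owner_of(record):
    """From any member, keep following pointers until the owner is reached."""
    node = record.next
    while not node.is_owner:
        node = node.next
    return node


# One occurrence of the DEPT-EMP set, with hypothetical sample records
dept = Record("Engineering", is_owner=True)
mary, thomas = Record("Mary"), Record("Thomas")
insert_member(dept, mary)
insert_member(dept, thomas)
```

Because the chain closes on the owner, the same pointers answer both questions: which employees belong to a department, and which department a given employee belongs to.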
  15. 3.2.4 Relational Data Model
     A data model that represents data in the form of tables, or relations. The relational database model consists of the following three components:
       1. Data structure
          Data are organized in the form of tables, or relations.
       2. Data manipulation
          Powerful operations, expressed in languages such as SQL or Query-by-Example, are used to manipulate data stored in the database.
       3. Data integrity
          Business rules are specified to maintain the integrity of data when they are manipulated.
       • Physical Properties
         A relation consists of one or more columns and zero or more rows. In the relational model, a row is called a tuple. Each relation is given a unique name, and each column has a name that is unique within the relation. Each row contains an instance of the data associated with the relation. A relation with no rows is empty (contains no data), but still exists.
         [Figure 3-5: Diagrammatic representation of a relation, with column names a, b, c, d and rows x1, x2, x3, ..., xn]
       • Logical Properties
           • Ordering of columns
             Columns are unordered, left to right. This property is designed to preserve the independence of each column.
           • Ordering of rows
             Rows are unordered, top to bottom. This is designed to preserve the independence of each row.
           • Uniqueness
             No row may be duplicated in a given relation. Uniqueness in a relation is guaranteed by the designation of a primary key for each relation. A candidate key in a relation is an attribute (or combination of attributes) that uniquely identifies a row in that relation. A primary key is a candidate key that has been selected to be the unique identifier for each row. Primary key values cannot be null, since they would then not identify a row.
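The logical properties listed on this slide can be illustrated with a small Python sketch that treats a relation as a set of rows. The function names and the column names are hypothetical; the sample rows reuse the EMPLOYEE instances from the earlier example.

```python
def as_relation(rows):
    """Model a relation as a set of rows.

    Each row becomes a frozenset of (column, value) pairs, so column order is
    irrelevant; the relation is a set, so row order is irrelevant and exact
    duplicate rows collapse into one.
    """
    return {frozenset(row.items()) for row in rows}


def violates_primary_key(rows, key):
    """True if any key value is null or occurs in more than one row."""
    seen = set()
    for row in rows:
        value = tuple(row[col] for col in key)
        if None in value or value in seen:
            return True
        seen.add(value)
    return False


employees = [
    {"emp_no": 100, "name": "Roy Lim"},
    {"emp_no": 101, "name": "Mary Wong"},
]
# Same data with rows and columns written in a different order
reordered = [
    {"name": "Mary Wong", "emp_no": 101},
    {"name": "Roy Lim", "emp_no": 100},
]
```

Comparing as_relation(employees) with as_relation(reordered) yields equality, which is another way of saying that neither the column sequence nor the row sequence carries meaning.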
  16.      • The sequence of columns (left to right) is insignificant
             The columns of a relation can be interchanged without changing the meaning or use of the relation. There is no hidden meaning implied by the ordering.
           • The sequence of rows (top to bottom) is insignificant
             The rows of a relation may be interchanged or stored in any sequence. Thus it makes no difference whether a new row is inserted at the front or at the end of the table.

     3.3 Comparison of Data Representation Concepts

     3.3.1 Relational vs Network Models
     In the relational model, connections between two relations are represented by including two attributes with the same domain, one in each of the relations.
     Example:

           ITEM                                      SUPPLIER
           Item No   Description   Supplier-no.      Supplier-no.   Supplier name
           100       XXX           A123              A123           HUP
           200       YYY           A124              A124           CHONG

     Individual tuples that have the same value for those attributes are logically related, even though they are not physically connected together. In this case, supplier-no. is a foreign key in the ITEM table and the primary key in the SUPPLIER table. Thus a logical link is established.
     In the network model, 1:N connections between two record types are explicitly represented by the set type construct. The DBMS connects related records together in a set instance by some physical method. Records are physically connected together when they participate in the same set instance. Hence a set type physically represents a logical 1:N relationship type.
     Example:

           ITEM (member records)
           Item-no.   Description   Supplier-no.
           100        XXX           A123
           200        YYY           A124
           300        ZZZ           A123

           Set type: ITEM-SUPPLIER

           SUPPLIER (owner records)
           Supplier-no.   Supplier name
           A123           HUP
           A124           CHONG

     In this case, the SUPPLIER record A123 and the ITEM records that carry supplier-no. A123 are physically connected by the DBMS as participants in one set instance of the ITEM-SUPPLIER set type. In addition, we can keep a logical connection among the records by duplicating the key field of the owner record in the member records. This field's values can be used for automatic set selection or for validation checking.
     Therefore, the relational model is simpler.
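The logical link through a foreign key described on this slide can be sketched with Python's built-in sqlite3 module, using the ITEM and SUPPLIER rows from the examples. The table and column spellings and the helper function are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_no   TEXT PRIMARY KEY,
        supplier_name TEXT NOT NULL
    );
    CREATE TABLE item (
        item_no     INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        -- foreign key: same domain as the primary key of supplier
        supplier_no TEXT NOT NULL REFERENCES supplier(supplier_no)
    );
    INSERT INTO supplier VALUES ('A123', 'HUP'), ('A124', 'CHONG');
    INSERT INTO item VALUES (100, 'XXX', 'A123'), (200, 'YYY', 'A124'), (300, 'ZZZ', 'A123');
""")


def items_supplied_by(supplier_name):
    """Match rows purely on equal supplier_no values; no stored pointers are involved."""
    rows = conn.execute(
        """
        SELECT i.item_no
        FROM item AS i
        JOIN supplier AS s ON s.supplier_no = i.supplier_no
        WHERE s.supplier_name = ?
        ORDER BY i.item_no
        """,
        (supplier_name,),
    )
    return [row[0] for row in rows]
```

The join is computed from matching values at query time, whereas the network model would reach the same items by following the physical chain of the ITEM-SUPPLIER set instance.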
  17. 3.3.2 Hierarchical vs Network Models
     Both represent relationships explicitly. However, a record type in the network model can be a member in any number of set types. In the hierarchical model, a record type can have only one real parent. This creates problems when modeling M:N and n-ary relationship types. Thus, if a schema contains mainly 1:N relationship types in the same direction, it can be modeled naturally as a hierarchy. However, if many relationship types exist, we will have to duplicate records and pointers to design a hierarchical representation. Therefore, the hierarchical model is considered inferior to both the relational and network models as far as modeling capability is concerned.

     3.3.3 Object-oriented Models
     The object-oriented data model gives a closer representation of real-world problem domains and offers greater productivity in application development. It has the ability to model complex data types such as images and documents. However, ODBMS technology is still very young. Currently, its limitations and concerns are:
       • Lack of accepted standards
         There are no accepted standards at the national or international level yet.
       • Lack of development tools
         Tools such as CASE and 4GL are still under development, and hence are not widely available.
       • Performance
         The performance of ODBMS technology with large numbers of concurrent users and frequent transactions has yet to be tested or demonstrated.
       • Data management facilities
         Some of the products do not have adequate facilities for concurrency control, backup and recovery.
       • Query languages
         Users cannot retrieve data about one or more objects based on their own defined criteria.
  18. 3.4 Review Questions
       1. Draw an E-R diagram and an OO diagram for each of the following situations:
          a. A company has a number of employees. The attributes of EMPLOYEE include NAME, ADDRESS, BIRTHDATE and DATEHIRED. One method that is required of all employees is CalculateYearsOfService. The company also has several projects. Attributes of PROJECT include CODE, DESCRIPTION and START DATE. Each employee may be assigned to one or more projects, or may not be assigned to a project. A project must have at least one employee assigned, and may have several employees assigned. One method required of all projects is CalculateTotalCostToDate.
          b. In a vehicle-licensing application, there are three types of vehicle: passenger, truck, and trailer. Vehicle ID is an attribute of all vehicle types. Truck and trailer vehicles have an attribute named GROSS CAPACITY. The passenger and truck vehicle types required of all courses is ChangeCourseDescription.
       2. Compare and contrast the following:
          a. inheritance vs generalization hierarchy
          b. generalization vs specialization
          c. candidate key vs primary key
          d. subtype vs supertype
          e. physical database record vs logical database record
          f. encapsulation vs inheritance
          g. relational model vs network model
          h. hierarchical model vs network model
       3. What are the limitations of ODBMS technology?