Database Design E R 2009
Presentation Transcript

  • Database Design 1
  • What is a Database? A collection of data that is organised in a predictable, structured way. Any organised collection of data in one place can be considered a database. Examples: a filing cabinet, a library, a floppy disk. 2
  • What is Data? The heart of the DBMS. There are two kinds: (1) the collection of information that is stored in the database; (2) metadata, which is information about the database, also known as a data dictionary. An example of metadata is shown in Appendix A. 3
  • Relational Data Model  A relational database is perceived as a collection of tables.  Each table consists of a series of rows & columns.  Tables (or relations) are related to each other by sharing a common characteristic. (EG a customer or product table)  A table yields complete physical data independence. 4
  • Features of the relational data model  Logical and Physical separated  Simple to understand. Easy to use.  Powerful nonprocedural (what, not how) language to access data.  Uniform access to all data.  Rigorous database design principles.  Access paths by matching data values, not by following fixed links. 5
  • Terminology: Relation, Tuple, Attribute, Domain, Relation Schema, Integrity Constraint, Domain Constraint, Key Constraint, Key, Candidate Key, Simple Key, Composite Key, Primary Key, Null Value, Relational Database, Relational Database Schema, Referential Integrity Constraint, Foreign Key, Network Diagram, Update Operations, Join, Projection, Lossless Join. 6
  • Terminology: Relation. A 2-dimensional table of values with these properties: no duplicate rows; rows can be in any order; columns are uniquely named by Attributes; each cell contains only one value.
    Employee  Job        Manager
    Jack      Secretary  Jill
    Jill      Executive  Bozo
    Bozo      Director   (NULL)
    Lulu      Clerk      Jill
    The special value is NULL, which implies that there is no corresponding value for that cell. This may mean the value does not apply or that it is unavailable. Entire rows of NULLs are not allowed. 7
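The properties of a relation listed on this slide can be checked mechanically. The following is a minimal sketch, assuming rows are held as Python tuples and NULL is represented by `None`; the function name `relation_problems` is hypothetical, and the "one value per cell" property is not checked because plain strings cannot show it.

```python
from typing import Optional, Sequence

Row = tuple[Optional[str], ...]


def relation_problems(attributes: Sequence[str], rows: Sequence[Row]) -> list[str]:
    """Return the relation properties that are violated (empty list if none)."""
    problems = []
    if len(set(attributes)) != len(attributes):
        problems.append("attribute names are not unique")
    if any(len(row) != len(attributes) for row in rows):
        problems.append("a row has the wrong number of cells")
    if len(set(rows)) != len(rows):
        problems.append("duplicate rows")
    if any(all(cell is None for cell in row) for row in rows):
        problems.append("a row consists entirely of NULLs")
    return problems


attributes = ["Employee", "Job", "Manager"]
employee = [
    ("Jack", "Secretary", "Jill"),
    ("Jill", "Executive", "Bozo"),
    ("Bozo", "Director", None),  # None stands in for NULL
    ("Lulu", "Clerk", "Jill"),
]
print(relation_problems(attributes, employee))  # []
```

Row order is never inspected, which matches the property that rows can be in any order.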
  • Terminology: Tuple. Commonly referred to as a row in a relation, e.g. (Jack, Clerk, Jill). Attribute: a name given to a column in a relation. Each column must have a unique attribute name. Attributes are often referred to as fields, e.g. Employee, Job, Manager. 8
  • Terminology: Domain. A pool of atomic values from which the cells of a given column take their values. Each attribute has a domain. Attributes may share domains.
    Attribute  Domain       Example values
    Employee   Person Name  Tom, Mary, Bozo, Kali, ...
    Job        Job Name     Typist, Manager, Clerk, ...
    Manager    Person Name  (the same domain as Employee)
    An attribute value (a value in a column labelled by the attribute) must be from the corresponding domain or may be NULL. 9
  • Terminology: Relation Schema. A Relation Schema is a named set of attributes. It refers only to the structure of a relation. It is derived from traditional set notation: EMPLOYEE = { Employee, Job, Manager }. For database purposes this is usually written in the modified form EMPLOYEE( Employee, Job, Manager ), referring to the table EMPLOYEE with the columns Employee, Job and Manager. 10
  • Terminology: Integrity Constraint and Domain Constraint. An Integrity Constraint is a condition that prescribes what values are allowable in a relation. This permits the restriction of the type of value that can be placed in a particular cell, e.g. only numbers for telephone numbers. A Domain Constraint is a condition on the allowable values for an attribute, e.g. Salary < $60,000.
    Employee  Job        Manager  Salary
    Jack      Secretary  Jill     25,000
    Jill      Executive  Bozo     40,000
    Bozo      Director   (NULL)   50,000
    Lulu      Clerk      Jill     30,000
    This restricts the EMPLOYEE salary to be under a set value. 11
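One way to see a domain constraint enforced is a SQL `CHECK` clause. This is a sketch only, using Python's built-in `sqlite3` module with an in-memory database; the table layout follows the slide, while the helper `try_insert` and the rejected employee "Max" are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee TEXT PRIMARY KEY,
        job      TEXT NOT NULL,
        manager  TEXT,
        salary   INTEGER CHECK (salary < 60000)  -- the domain constraint
    )
    """
)
rows = [
    ("Jack", "Secretary", "Jill", 25000),
    ("Jill", "Executive", "Bozo", 40000),
    ("Bozo", "Director", None, 50000),
    ("Lulu", "Clerk", "Jill", 30000),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# "Max" is a hypothetical employee whose salary breaks Salary < 60,000.
print(try_insert(("Max", "Director", None, 75000)))  # False
```

The four rows from the slide are accepted because every salary is below the limit.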
  • Dealing with many keys A key is a device that helps define relationships. Its role is based on the concept of functional dependency which we deal with extensively. We will be referring to the following keys  Primary key  Foreign key  Simple key  Composite key  Concatenated key  Candidate key  Universal key 12
  • Terminology: Key Constraint. A condition that no value of an attribute or set of attributes be repeated in a relation, e.g. Employee (the attribute) has only unique values in EMPLOYEE (the relation). The following relation violates this constraint:
    EMPLOYEE
    Employee  Job        Manager  Salary
    Jack      Secretary  Bozo     25,000
    Jack      Secretary  Jill     25,000
    Jill      Executive  Bozo     40,000
    Bozo      Director   (NULL)   50,000
    Lulu      Clerk      Jill     30,000
    Jack appears twice. This violates the Key Constraint. 13
  • Terminology: Key Constraint. An attribute (or set of attributes) to which a key constraint applies is called a key (or candidate key). Every relation schema must have a key.
    EMPLOYEE
    Employee  Job        Manager  Salary
    Jack      Secretary  Bozo     25,000
    Kim       Secretary  Jill     25,000
    Jill      Executive  Bozo     40,000
    Bozo      Director   Bozo     50,000
    Lulu      Clerk      Jill     30,000
    Employee is a key (a Simple Key). Another possible key: the combination of Job and Manager is also unique (a Composite Key). If a key constraint applies to a set of attributes, it is called a Composite or Concatenated Key. Otherwise it is a Simple Key. 14
  • Terminology: Key Constraint. A key cannot have a NULL value. For example, if we change the table so that the Employee Bozo does not have a manager, then Job+Manager cannot be a key.
    Employee  Job        Manager  Salary
    Jack      Secretary  Bozo     25,000
    Kim       Secretary  Jill     25,000
    Jill      Executive  Bozo     40,000
    Bozo      Director   (NULL)   50,000
    Lulu      Clerk      Jill     30,000
    15
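The key rules on the last three slides (no repeated values, no NULLs) can be tested against sample rows. The sketch below is illustrative only: `is_key` is a hypothetical helper, rows are dictionaries, and `None` stands in for NULL. Sample data can show that a set of attributes is not a key, but it cannot prove that the constraint holds for all future data.

```python
def is_key(rows, attrs):
    """True if attrs holds no NULLs and no repeated value combination in rows."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if None in value or value in seen:
            return False
        seen.add(value)
    return True


employee = [
    {"Employee": "Jack", "Job": "Secretary", "Manager": "Bozo", "Salary": 25000},
    {"Employee": "Kim", "Job": "Secretary", "Manager": "Jill", "Salary": 25000},
    {"Employee": "Jill", "Job": "Executive", "Manager": "Bozo", "Salary": 40000},
    {"Employee": "Bozo", "Job": "Director", "Manager": None, "Salary": 50000},
    {"Employee": "Lulu", "Job": "Clerk", "Manager": "Jill", "Salary": 30000},
]

print(is_key(employee, ["Employee"]))        # True
print(is_key(employee, ["Job", "Manager"]))  # False: Bozo's Manager is NULL
print(is_key(employee, ["Job"]))             # False: Secretary is repeated
```

If Bozo's Manager is set back to "Bozo", as on the earlier slide, the Job+Manager combination passes the same check.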
  • Terminology: Key Constraint. A primary key is a special preassigned key that can always be used to uniquely identify tuples. We have to choose a Primary Key for every Relation. We must consider all of the Candidate Keys and choose between them. That Employee is the primary key for EMPLOYEE is usually written by underlining it in the schema: EMPLOYEE( Employee, Job, Manager, Salary ).
    Employee  Job        Manager  Salary
    Jack      Secretary  Bozo     25,000
    Kim       Secretary  Jill     25,000
    Jill      Executive  Bozo     40,000
    Bozo      Director   Bozo     50,000
    Lulu      Clerk      Jill     30,000
    Here we have chosen the Simple Key Employee over the concatenated option of both Job and Manager. 16
  • A Database is more than multiple tables: you must be able to “relate” them.
    Cus-code  Cus-Name  Area-Code  Phone     Agent-Code
    10010     Ramus     615        844-2573  502
    10011     Dunne     713        894-1238  501
    10012     Smith     615        894-2205  502
    10013     Olowaski  615        894-2180  502
    10014     Orlando   615        222-1672  501
    10015     O’Brian   713        442-3381  503
    10016     Brown     615        297-1226  502
    10017     Williams  615        290-2556  503
    10018     Farris    713        382-7185  501
    10019     Smith     615        297-3809  503
    The link is through the Agent-Code.
    Agent-Code  Agent-Name  Agent-AreaCode  Agent-Phone
    501         Alby        713             226-1249
    502         Hahn        615             882-1244
    503         Okon        615             123-5589
    17
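To make "the link is through the Agent-Code" concrete, here is a small sketch that pairs customers with their agents by matching Agent-Code values. It uses only a subset of the customer rows from the slide, and the function name `customers_with_agents` is an assumption made for this example.

```python
# Subset of the customer rows from the slide: (Cus-code, Cus-Name, Agent-Code)
customers = [
    (10010, "Ramus", 502),
    (10011, "Dunne", 501),
    (10012, "Smith", 502),
    (10015, "O'Brian", 503),
]

# Agent rows keyed by Agent-Code: (Agent-Name, Agent-AreaCode, Agent-Phone)
agents = {
    501: ("Alby", "713", "226-1249"),
    502: ("Hahn", "615", "882-1244"),
    503: ("Okon", "615", "123-5589"),
}


def customers_with_agents(customers, agents):
    """Pair each customer with the name of the agent whose Agent-Code matches."""
    return [
        (cus_code, cus_name, agents[agent_code][0])
        for cus_code, cus_name, agent_code in customers
    ]


for row in customers_with_agents(customers, agents):
    print(row)
```

The lookup works purely by matching data values, which is the access-path idea described earlier in the deck; no fixed link between the two tables is stored.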
  • Terminology: Relational Database. A Relational Database is just a set of Relations. For example:
    EMPLOYEE
    Employee  Job        Manager  Salary
    Jack      Secretary  Bozo     25,000
    Kim       Secretary  Jill     25,000
    Jill      Executive  Bozo     40,000
    Bozo      Director   (NULL)   50,000
    Lulu      Clerk      Jill     30,000
    JOB
    Job        Salary
    Secretary  25,000
    Secretary  25,000
    Executive  40,000
    Director   50,000
    Clerk      30,000
    Which Attribute do you think relates these two tables together? 18
  • Terminology: Relational Database Schema. A Relational Database Schema is a set of Relation Schemas, together with a set of Integrity Constraints. For example, the Relations that you have been looking at, with the headings EMPLOYEE (Employee, Job, Manager, Salary) and JOB (Job, Salary), are usually written as EMPLOYEE(Employee, Job, Manager) and JOB(Job, Salary). Notice how the Primary Keys (Employee in EMPLOYEE, Job in JOB) are underlined. 19
  • Terminology: Referential Integrity Constraint. This constraint says that all the values in one column should also appear in another column. Look at the tables below. Every entry in the Job column of the EMPLOYEE table must appear in the Job column of the JOB table.
    EMPLOYEE                                   JOB
    Employee (PK)  Job (FK)   Manager (FK)     Job (PK)   Salary
    Jack           Secretary  Bozo             Secretary  25,000
    Kim            Secretary  Jill             Secretary  25,000
    Jill           Executive  Bozo             Executive  40,000
    Bozo           Director   (NULL)           Director   50,000
    Lulu           Clerk      Jill             Clerk      30,000
    20
  • Referential Integrity Constraint. Why does the following relational database violate the referential integrity constraints?
    EMPLOYEE                                   JOB
    Employee (PK)  Job (FK)   Manager (FK)     Job (PK)   Salary
    Jack           Secretary  Bozo             Director   50,000
    Kim            Secretary  Jill             Clerk      30,000
    Bozo           Director   (NULL)
    Lulu           Clerk      Jill
    In other words, why can’t Employee(Job) be a Foreign Key to Job(Job), or Employee(Manager) be a Foreign Key to Employee(Employee)? 21
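The question on this slide can be explored with a short check that lists foreign key values with no matching primary key value. This is a sketch under stated assumptions: the rows reflect one reading of the flattened tables in this transcript, `None` stands for NULL, and `dangling_values` is a hypothetical helper name. NULL foreign key values are skipped, since a NULL does not reference anything.

```python
def dangling_values(child_rows, fk, parent_rows, pk):
    """Return the sorted foreign key values that match no primary key value."""
    parent_values = {row[pk] for row in parent_rows}
    return sorted(
        {row[fk] for row in child_rows
         if row[fk] is not None and row[fk] not in parent_values}
    )


employee = [
    {"Employee": "Jack", "Job": "Secretary", "Manager": "Bozo"},
    {"Employee": "Kim", "Job": "Secretary", "Manager": "Jill"},
    {"Employee": "Bozo", "Job": "Director", "Manager": None},
    {"Employee": "Lulu", "Job": "Clerk", "Manager": "Jill"},
]
job = [
    {"Job": "Director", "Salary": 50000},
    {"Job": "Clerk", "Salary": 30000},
]

print(dangling_values(employee, "Job", job, "Job"))                # ['Secretary']
print(dangling_values(employee, "Manager", employee, "Employee"))  # ['Jill']
```

Under this reading, Secretary has no row in JOB and Jill has no row in EMPLOYEE, so neither foreign key is valid.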
  • Why Use Relational Databases? Their major advantage is that they minimise the need to store the same data in a number of places. Storing the same data in several places is referred to as data redundancy. 22
  • Example of Data Redundancy (1) 23
  • Example of Data Redundancy (2)  The names and addresses of all students are being maintained in three places  If Owen Money moves house, his address needs to be updated in three separate places  Consider what might happen if he forgot to let library administration know 24
  • Example of Data Redundancy (3) 25
  • Example of Data Redundancy (4)  Data redundancy results in:  wastage of storage space by recording duplicate information  difficulty in updating information  inaccurate, out-of-date data being maintained 26
  • Other Advantages of Relational Databases  Flexibility  relationships (links) are not implicitly defined by the data  Data structures are easily modified  Data can be added, deleted, modified or queried easily 27
  • Summary of Some Common Relational Terms  Entity - an object (person, place or thing) that we wish to store data about  Relationship - an association between two entities  Relation - a table of data  Tuple - a row of data in a table  Attribute - a column of data in a table  Primary Key - an attribute (or group of attributes) that uniquely identify individual records in a table  Foreign Key - an attribute appearing within a table that is a primary key in another table 28
  • Network Diagrams 29
  • Terminology: Network Diagram. Referential Integrity constraints can easily be represented by arrows, FK → PK. The arrow points from the Foreign Key to the matching Primary Key, here for the schema EMPLOYEE(Employee, Job, Manager), JOB(Job, Salary). A relational database schema with referential integrity constraints can also be represented by a network diagram. A Referential Integrity Constraint is notated as an arrow labelled by the foreign key. You must always write the label of the Foreign Key on the arrow. Sometimes the same attribute has different titles in different tables.
    Network Diagram:
    EMPLOYEE --Job--> JOB
    EMPLOYEE --Manager--> EMPLOYEE
    Notice here, the label is Manager and not Employee. 30
  • Personnel Database: Consider the following Tables
    PRIOR_JOB
    E_NUMBER  PRIOR_TITLE
    1001      Junior consultant
    1001      Research analyst
    1002      Junior consultant
    1002      Research analyst
    1003      Junior consultant
    1004      Summer intern
    EXPERTISE
    E_NUMBER  SKILL
    1001      Stock market
    1001      Investments
    1002      Stock market
    1003      Stock market
    1003      Investments
    1004      Taxation
    1005      Management
    ASSIGNMENT
    E_NUMBER  P_NUMBER
    1001      26713
    1002      26713
    1003      23760
    1003      26511
    1004      26511
    1004      28765
    1005      23760
    SKILL
    AREA
    Stock Market
    Taxation
    Investments
    Management
    PROJECT
    NAME                  P_NUMBER  MANAGER  ACTUAL_COST  EXPECTED_COST
    New billing system    23760     Yates    1000         10000
    Common stock issue    28765     Baker    3000         4000
    Resolve bad debts     26713     Kanter   2000         1500
    New office lease      26511     Yates    5000         5000
    Revise documentation  34054     Kanter   100          3000
    Entertain new client  87108     Yates    5000         2000
    New TV commercial     85005     Baker    10000        8000
    EMPLOYEE
    NAME    E_NUMBER  DEPARTMENT
    Kanter  1111      Finance
    Yates   1112      Accounting
    Adams   1001      Finance
    Baker   1002      Finance
    Clarke  1003      Accounting
    Dexter  1004      Finance
    Early   1005      Accounting
    TITLE
    E_NUMBER  CURRENT_TITLE
    1001      Senior consultant
    1002      Senior consultant
    1003      Senior consultant
    1004      Junior consultant
    1005      Junior consultant
    31
  • Personnel Database Schema. What are the connecting Foreign Keys to Primary Keys? (One attribute is marked “Not FK, we will look at this later”.)
    PROJECT (NAME, P_NUMBER, MANAGER, ACTUAL_COST, EXPECTED_COST)
    ASSIGNMENT (E_NUMBER, P_NUMBER)
    SKILL (AREA)
    PRIOR_JOB (E_NUMBER, PRIOR_TITLE)
    EXPERTISE (E_NUMBER, SKILL)
    TITLE (E_NUMBER, CURRENT_TITLE)
    EMPLOYEE (NAME, E_NUMBER, DEPARTMENT)
    32
  • Personnel Database Network Diagram. Once you have produced your Schema and identified the Primary and Foreign Keys, you can create the Network Diagram. The Network Diagram shows each of the tables with their links. Each of the Tables (Relations) is represented by a rectangle. The rectangles are then connected by arrows that show the FKs pointing to the PKs. The arrowhead points towards the PK, while the FK name written on the arrow is the same as the attribute of the table that has the FK in it. Boxes in the diagram: SKILL, EMPLOYEE, PROJECT, EXPERTISE, PRIOR_JOB, TITLE, ASSIGNMENT. 33
  • Personnel Database Network Diagram SKILL EMPLOYEE PROJECT EXPERTISE PRIOR_JOB TITLE ASSIGNMENT 34
  • Summary: Questions  What is a Relational Database?  What actually is a relation?  What are Constraints?  What is a Schema?  What is a Network Diagram and why is it used? 35
  • Summary: Answers  A relational database is based on the relational data model. It is one or more Relations(Tables) that are Related to each other  A relation is a table composed of rows (tuples) and columns, satisfying 5 properties • No duplicate rows • Rows can be in any order • Columns are uniquely named by Attributes • Each cell contains only one value • No null rows.  Constraints are central to the correct modeling of business information. Here we have seen them limit the set up of your tables: Referential Constraint  The Network Diagram is used to navigate complex database structures. It is a compact way to show the relationships between Relations (Tables) 36
  • Activities. Consider the following relational database schemas:
    Suppliers(suppId, name, street, city, state)
    Part(partId, partName, weight, length, composition)
    Products(prodId, prodName, department)
    Supplies(partId, suppId)
    Uses(partId, prodId)
    Make reasonable assumptions about the meaning of the attributes and relations, identify the primary and foreign keys, and draw a network diagram showing the relations and foreign keys. 37
  • Answer Supplier Part Product Supplies Uses 38
  • Show the foreign keys on the network diagrams
    Orders
    Ordnum  ordDate  custNumb
    12489   2/9/91   124
    Customer
    custNumb  custName  Address    Balance  credLim  sksnumb
    124       Adams     48 oak st  418.68   500      3
    SalesRep
    Slsnumber  Name  address  totCom  commRate
    3          Mary  12 Way   2150    .05
    Part
    Part  Desc  onHand  IT  wehsNumb  unitPrice
    AX12  Iron  1.4     HW  3         17.95
    39
  • OrLine ordNum Part ordNum quotePrice 40
  • Answer SalesRep Part SlsNumber Part Customer OrLine CustNumb orLine Orders 41
  •  Obtain tutorial 1 from your tutor 42
  • Functional Dependence FDD 43
  • Functional Dependency Diagrams Data Analysis In this Unit we look at the following: Data Element, Attribute, Functional Dependency (FD), Redundant FD, Pseudotransitive FD, Intersecting Attribute 44
  • Functional Dependency Diagrams. A FUNCTIONAL DEPENDENCY DIAGRAM is a way of representing the structure of information needed to support a business or organization. It can easily be converted into a design for a relational database to support the operations of the business. 45
  • Functional Dependency Diagrams There are a number of methods for us to develop our database design from here. We could use the method of developing a large table with all attributes and breaking it down into smaller tables using what we refer to as Normalization by Decomposition (we look at this in detail later), or we could use Functional Dependency Diagrams to create a pictorial model of our database. 46
  • Data Analysis and Database Design Using Functional Dependency Diagrams
    1. The steps of Data Analysis in FDD are:
    1.1 Look for Data Elements
    1.2 Look for Functional Dependencies
    1.3 Represent Functional Dependencies in a diagram
    1.4 Eliminate Redundant Functional Dependencies
    2. Data Design, after we have our final version of the FDD:
    2.1 Apply the Synthesis Algorithm
    47
  • Starting points for drawing functional dependency diagrams To start the process of constructing our FDD we do the following:  We must Understand the data  We Examine forms, reports,data entry and output screens etc…  We Examine sample data  We consider Enterprise (business) rules  We examine narrative descriptions and conduct interviews.  We apply our Experiences/Practice and that of others 48
  • Enterprise Rules. What are Enterprise Rules? An enterprise rule (in the context of data analysis) is a statement made by the enterprise (organisation, company, officer in charge etc.) which constrains data in some way. Functional dependencies are the most important type of constraint on data and are often expressed in the form of enterprise rules, e.g. “No two employees may have the same employee number.” “An order is made by only one customer.” “An employee can belong to only one department at a time.” 49
  • Drawing FDDs - Data Elements. We often refer to Data Elements during the FDD process. A data element is an elementary piece of recorded information. Every data element has a unique name. A data element is either a Label, e.g. PersonName, Address, BuildingCode, or a Measurement, e.g. Height, Age, Date. A data element must take values that can be written down. 50
  • Functional Dependency Diagrams. [Figure: two routes from the problem to a database design. Using the Method of Decomposition: given the problem and the sample data tables (0NF), eliminate repeating groups to reach 1NF (the Universal Relation), eliminate part-key dependencies to reach 2NF relations, then eliminate non-key dependencies to reach 3NF relations. Or, here is the same process using the FDD approach: attributes and functional dependencies, then the Functional Dependency Diagram, then the Method of Synthesis, giving 3NF relations.] Now we have the Database Design. 51
  • Data Element Examples Here are some examples  PersonName has values Jeff, Jill, Gio, Enid  Address has values 1 John St, 25 Rocky Road  Height has values 171cm, 195cm  Age has values 21,52,93,2  Date has values 20th May 1947, 2nd March 1997  JobName has values Manager, Secretary, Clerk  Manager might not be a data element, but ManagerName could be. It could be a value of another data element e.g. JobName 52
  • Drawing FDDs: Data Elements. Start drawing the Functional Dependency Diagram by representing the Data Elements. A Data Element is represented by its name placed in a box. Every data element must have a unique name in the functional dependency diagram. A data element cannot be composed of other data elements, i.e. it cannot be broken down into smaller components. A Data Element is also known as an ATTRIBUTE, because it generally describes a property of some thing which we will later call an ENTITY. 53
  • Drawing FDDs: Using Elements. A Functional Dependency is a relationship between Attributes. It is shown as an arrow, e.g. A → B. It means that for every value of A, there is only one value for B. It reads “A determines B”. A is called the determinant attribute. B is called the dependent attribute. 54
  • Data Element Examples. Here are some examples of finding the Data Elements on a typical form. A “Surname ..........” field on a form gives rise to the element Surname. A “CREDIT CARD: Bankcard / Mastercard / Visa / Other” choice on a form gives rise to the element CreditCardType. 55
  • Functional Dependency Examples: Students and their family names. “Each student (identified by student number) has only one family name”
    Students  FamilyName
    1         Smith
    2         Jones
    3         Smith
    4         Andrews
    Considering the rule stated above, we should be able to draw an FDD for this. What are the elements of interest? 56
  • FDDs Answer
    Students  FamilyName
    1         Smith
    2         Jones
    3         Smith
    4         Andrews
    Data elements of interest are Student# and FamilyName. Student# determines FamilyName (or FamilyName depends on Student#): Student# → FamilyName. Each student has exactly one family name, but the name could be the name of many students. So FamilyName does not determine Student#, e.g. “Smith” is the name of students 1 and 3. 57
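The reasoning on this slide can be replayed against the sample table. The sketch below assumes rows are dictionaries and uses a hypothetical helper, `determines`, which reports whether each left-hand value is paired with only one right-hand value in the given rows. Sample data can disprove a functional dependency, but a dependency that holds in the sample still has to be confirmed by the enterprise rules.

```python
def determines(rows, lhs, rhs):
    """True if, within rows, every value of lhs appears with only one value of rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True


students = [
    {"Student#": 1, "FamilyName": "Smith"},
    {"Student#": 2, "FamilyName": "Jones"},
    {"Student#": 3, "FamilyName": "Smith"},
    {"Student#": 4, "FamilyName": "Andrews"},
]

print(determines(students, ["Student#"], ["FamilyName"]))  # True
print(determines(students, ["FamilyName"], ["Student#"]))  # False: Smith is 1 and 3
```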
  • FDDs Examples: Employees and the departments they work for.
    Department Name: Accounting; Employee Numbers: 11, 2, 31
    Department Name: Sales; Employee Numbers: 45, 27
    Enterprise Rule: “Each employee works in only one department”. In this example the forms represent some interesting data of the business. We see that the Employees with the ID numbers 11, 2 and 31 all work in the Accounting Dept and that the Employees with the ID numbers 45 and 27 work in the Sales Dept. Do you think that you could draw an FDD to represent this? Have a go and then check your answers. 58
  • FDD Answers: Employees and the departments they work for.
    Department Name: Accounting; Employee Numbers: 11, 2, 31
    Department Name: Sales; Employee Numbers: 45, 27
    Data elements of interest are Employee# and DeptName: Employee# → DeptName. So we could make the following table:
    Employee#  DeptName
    11         Acc
    2          Acc
    45         Sales
    31         Acc
    27         Sales
    59
  • FDDs Examples: The quantity of parts held in a warehouse and their suppliers. “Parts are uniquely identified by part numbers” “Suppliers are uniquely identified by Supplier Names” “A part is supplied by only one supplier” “A part is held in only one quantity”
    Part#  SupplierName            QOH
    1      Wang Electronics        23
    2      Cumberland Enterprises  80
    3      Wang Electronics        4
    4      Roscoe Pty. Ltd         58
    Part# determines SupplierName and Part# determines QOH: Part# → SupplierName, Part# → QOH. Should QOH be a determinant? No, common sense tells us that is not a reliable choice. We could have had repeating values. 60
  • FDDs Examples: Students and their subjects enrolled. “Each student is given a unique student number” “A subject is uniquely identified by its name” “A student may choose several subjects”
    Student  SubjectName
    1        History
    1        Geography
    1        Mathematics
    1        History
    2        English
    2        English
    3        Mathematics
    3        English
    4        French
    4        Geography
    Data elements of interest are Student# and SubjectName. There is no functional dependency here. Student# does not determine SubjectName, nor does SubjectName determine Student#. 61
  • FDDs Examples Results obtained by each student for each subject. “Each student is given a unique student number” “A subject is uniquely identified by its name” “A student may choose several subjects” “A student is allocated a result for each subject” “Each student has only one name.” Data elements are Student#, StudentName, SubjectName and Grade 62
  • FDDs Examples: Results obtained by each student for each subject.
    Student#  StudentName  SubjectName  Grade
    1         Smith        History      A
    1         Smith        Geography    B
    1         Smith        Mathematics  A
    2         Jones        History      C
    2         Jones        English      C
    3         Smith        English      A
    3         Smith        Mathematics  A
    4         Andrews      English      D
    4         Andrews      French       C
    4         Andrews      Geography    C
    Try to construct an FDD for this table, considering the given Business Rules and the Data Elements. 63
  • FDDs Examples: Results obtained by each student for each subject. We can see that there is one and only one student name for each student number, even though there might be more than one student with the same name. So: Student# → StudentName. But the grade for any student cannot be determined by the subject name or the student number by itself. A student can have many grades, depending on the subject. How can we cater for this? 64
  • FDDs Answer: Results obtained by each student for each subject. We need to combine the two Elements to say that there is one and only one grade for a student doing a particular subject. Here then is the complete diagram: Student# → StudentName; (Student#, SubjectName) → Grade. The boxed pair (Student#, SubjectName) is called the Composite Determinant. 65
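The composite determinant can be checked against the results table from the earlier slide. This is a sketch only: rows are tuples in the column order given by `COLUMNS`, and `determines` is a hypothetical helper. Passing the check on sample data does not prove a dependency; it only shows the sample does not contradict the enterprise rules.

```python
COLUMNS = ("Student#", "StudentName", "SubjectName", "Grade")

results = [
    (1, "Smith", "History", "A"),
    (1, "Smith", "Geography", "B"),
    (1, "Smith", "Mathematics", "A"),
    (2, "Jones", "History", "C"),
    (2, "Jones", "English", "C"),
    (3, "Smith", "English", "A"),
    (3, "Smith", "Mathematics", "A"),
    (4, "Andrews", "English", "D"),
    (4, "Andrews", "French", "C"),
    (4, "Andrews", "Geography", "C"),
]


def determines(rows, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs column."""
    lhs_idx = [COLUMNS.index(name) for name in lhs]
    rhs_idx = COLUMNS.index(rhs)
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        if seen.setdefault(key, row[rhs_idx]) != row[rhs_idx]:
            return False
    return True


print(determines(results, ["Student#"], "StudentName"))           # True
print(determines(results, ["Student#"], "Grade"))                 # False
print(determines(results, ["SubjectName"], "Grade"))              # False
print(determines(results, ["Student#", "SubjectName"], "Grade"))  # True
```

Neither attribute determines Grade on its own, while the pair does, which is why the two boxes are grouped into one determinant.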
  • FDDs Examples: Customer Orders
    Order  Part#  CustomerName  Address
    454    12     David Smith   1 John St, Hawthorn
    454    23     David Smith   1 John St, Hawthorn
    455    32     Emily Jones   45 Grattan St, Parkville
    455    49     Emily Jones   45 Grattan St, Parkville
    455    54     Emily Jones   45 Grattan St, Parkville
    456    12     Mary Ho       44 Park St, Hawthorn
    456    54     Mary Ho       44 Park St, Hawthorn
    Validating functional dependencies: using simple data and populating the table, check that there is only one value of the dependent. 66
  • FDDs Examples: “An order is uniquely identified by its number” “Customers are uniquely identified by their names” “A customer has only one address” “An order belongs to only one customer” “A part may be ordered only once on each order”
    Order  Parts Ordered  CustomerName  Address
    454    23, 12         David Smith   1 John St, Hawthorn
    455    54, 49, 32     Emily Jones   45 Grattan St, Parkville
    456    54, 12         Mary Ho       44 Park St, Hawthorn
    FDD: Order → CustomerName → Address, with Part# as a separate box. 67
  • FDDs Examples: Employees and their tax file numbers. “Each employee has a unique employee number” “Each employee has a unique tax file number”
    Employee  TaxFile#
    1         1024-5321
    2         3456-3294
    3         8246-7106
    4         8861-6750
    5         1234-4765
    Employee# determines Taxfile#: Employee# → Taxfile#. Taxfile# determines Employee#: Taxfile# → Employee#. Taxfile# and Employee# are alternative keys. 68
  •  Obtain Tutorial 2 from your tutor. 69
  • Functional Dependency Diagrams Database Design Let’s look at the process of converting the FDD into a schema. We have a 12 step process to do so, that has an iterative component to it (loop). The 12 steps are outlined in the next series of slides. 70
  • Functional Dependency Diagram Preparation
    1. Represent each data element as a box.
    2. Represent each functional dependency by an arrow.
    3. Eliminate augmented dependencies.
    4. Eliminate transitive dependencies.
    5. Eliminate pseudo-transitive dependencies.
    By this stage, intersecting attributes should have been eliminated. 71
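Steps 3 to 5 can be approximated with the standard attribute-closure test: an arrow is redundant if its dependent can still be reached from its determinant using the remaining arrows. The sketch below is not the deck's own procedure but a common way to get the same effect; the function names and the extra arrows (Order → Address, and Order with Part# → CustomerName) are assumptions added to illustrate a transitive and an augmented dependency on the customer-orders example.

```python
def closure(attrs, fds):
    """All attributes reachable from attrs by repeatedly applying the arrows in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def redundant(fd, fds):
    """True if fd's dependent is still derivable once fd itself is removed."""
    lhs, rhs = fd
    others = [other for other in fds if other != fd]
    return rhs in closure(lhs, others)


def prune(fds):
    """Drop redundant arrows one at a time, re-checking against what is kept."""
    kept = list(fds)
    for fd in list(fds):
        if redundant(fd, kept):
            kept.remove(fd)
    return kept


fds = [
    (("Order",), "CustomerName"),
    (("CustomerName",), "Address"),
    (("Order",), "Address"),                # transitive (hypothetical extra arrow)
    (("Order", "Part#"), "CustomerName"),   # augmented (hypothetical extra arrow)
]
for lhs, rhs in prune(fds):
    print(", ".join(lhs), "->", rhs)
```

Arrows are removed one at a time because two arrows can each look redundant only in the presence of the other.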
  • Deriving 3NF Schema: Synthesis Algorithm
    6. Pick any (unmarked) arrow in the diagram.
    7. Follow it back to its source, and write down the name of the source: S.
    8. Follow all arrows from the source data item, and write down the names of their destinations. With arrows S → A, S → B and S → C this gives S, A, B, C.
    S is now the key of a 3NF relation (S, A, B, C). 72
  • Synthesis Algorithm: Deriving 3NF Schema
    9. Mark all the arrows just processed (here S → A, S → B and S → C).
    10. If there are any unmarked arrows in the diagram, go back to step 6.
    11. Finally, determine the Universal Key. Any attribute which is not determined by any other attribute (i.e. has no arrow going into it) is part of the Universal Key, e.g. U1, U2, U3.
    12. If the universal key is not already contained in any of the above relations, make it into a relation. The universal key is the key of the new relation. 73
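Steps 6 to 12 can be sketched in a few lines. This is an illustrative reading of the algorithm, not a definitive implementation: grouping arrows by their source replaces the pick-and-mark loop, the function name `synthesise` is made up, and the arrows are the ones described in the personnel worked example that follows in the deck.

```python
def synthesise(fds, attributes):
    """Return a list of (key, attributes) pairs, one per 3NF relation."""
    # Steps 6-10: gather every destination under its source (the determinant).
    grouped = {}
    for determinant, dependent in fds:
        grouped.setdefault(tuple(determinant), []).append(dependent)
    schemas = [(key, key + tuple(deps)) for key, deps in grouped.items()]

    # Step 11: the universal key is every attribute with no arrow going into it.
    determined = {dependent for _, dependent in fds}
    universal_key = tuple(a for a in attributes if a not in determined)

    # Step 12: add it as a relation unless an existing relation contains it.
    if universal_key and not any(set(universal_key) <= set(attrs) for _, attrs in schemas):
        schemas.append((universal_key, universal_key))
    return schemas


attributes = [
    "E_NUMBER", "P_NUMBER", "PRIOR_TITLE", "SKILL", "EMPLOYEE_NAME",
    "CURRENT_TITLE", "DEPARTMENT_NAME", "LOCATION", "PROJECT_NAME",
    "MANAGER_NUM", "ACTUAL_COST", "EXPECTED_COST", "TIME_SPENT",
]
fds = [
    (("P_NUMBER",), "PROJECT_NAME"),
    (("P_NUMBER",), "MANAGER_NUM"),
    (("P_NUMBER",), "ACTUAL_COST"),
    (("P_NUMBER",), "EXPECTED_COST"),
    (("E_NUMBER",), "EMPLOYEE_NAME"),
    (("E_NUMBER",), "CURRENT_TITLE"),
    (("E_NUMBER",), "DEPARTMENT_NAME"),
    (("DEPARTMENT_NAME",), "LOCATION"),
    (("E_NUMBER", "P_NUMBER"), "TIME_SPENT"),
]

for key, attrs in synthesise(fds, attributes):
    print("key:", key, "relation:", attrs)
```

For these arrows the sketch yields four relations keyed on P_NUMBER, E_NUMBER, DEPARTMENT_NAME and (E_NUMBER, P_NUMBER), plus a universal-key relation over E_NUMBER, P_NUMBER, PRIOR_TITLE and SKILL.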
  • A Fully Worked Example  We will now work from a given set of forms to produce an FDD then use the 12 steps to produce the Schema. The forms that follow show the time spent by a particular employee on a particular project. They contain details of the employee along with details of the project. In addition they also state the hours that the employee has spent on any one project to date. This is important to the FDD. Notice also that the employee can have many previous titles and have a number of skills. This also has to be dealt with in the FDD and then later after we have used the synthesis technique to create the Schema. Have a good look at the forms on the next 2 slides and try to develop the FDD yourself. 74
  • Personnel Database Forms 1
    EMPLOYEE
    NAME   E_NUMBER  DEPARTMENT  LOCATION   CURRENT TITLE      PRIOR_TITLES       SKILLS
    Adams  1001      Finance     9th Floor  Senior consultant  Junior consultant  Stock market
                                                               Research analyst   Investments
    PROJECTS
    NAME               TIME_SPENT  P_NUMBER  MANAGER  ACTUAL_COST  EXPECTED_COST
    Resolve bad debts  35          26713     Kanter   2000         1500
    We say that this table is in “zero normal form” (0NF). This is because the cells have multiple values, e.g. Prior titles and Skills. The next slide shows forms that demonstrate that an employee can work on many projects. 75
  • Personnel Database Forms 2
    EMPLOYEE
    NAME   E_NUMBER  DEPARTMENT  LOCATION   CURRENT TITLE      PRIOR_TITLES       SKILLS
    Baker  1002      Finance     9th Floor  Senior consultant  Junior consultant  Stock market
                                                               Research analyst
    PROJECTS
    NAME           TIME_SPENT  P_NUMBER  MANAGER_NUM  ACTUAL_COST  EXPECTED_COST
    Res bad debts  18          26713     Kanter       2000         1500

    EMPLOYEE
    NAME    E_NUMBER  DEPARTMENT  LOCATION   CURRENT TITLE      PRIOR_TITLES       SKILLS
    Clarke  1003      Accounting  8th Floor  Senior consultant  Junior consultant  Stock market
                                                                                   Investments
    PROJECTS
    NAME                TIME_SPENT  P_NUMBER  MANAGER_NUM  ACTUAL_COST  EXPECTED_COST
    New billing system  26          23760     Yates        1000         10000
    New office lease    10          26511     Yates        5000         5000
    76
  • Personnel Database FD Diagram. From the forms given we can produce the following FDD. [Diagram with one box per data element: EXPECTED_COST, PROJECT_NAME, ACTUAL_COST, TIME_SPENT, MANAGER_NUM, P_NUMBER, EMPLOYEE_NAME, PRIOR_TITLE, E_NUMBER, CURRENT_TITLE, SKILL, DEPARTMENT_NAME, LOCATION.] 77
  • Personnel Database FD Diagram - Synthesis. Let us just consider the section of the FDD that has the project number as the determinant: P_NUMBER → EXPECTED_COST, PROJECT_NAME, ACTUAL_COST, MANAGER_NUM. By using the synthesis method we can choose an arrow, trace it back to the source, and gather together all of the attributes that the source points to. Try this and see if you can create the schema for this table. 78
  • Personnel Database FD Diagram - Synthesis. Again, if we choose another arrow that has not been chosen before and follow it back to the determinant, we find DEPARTMENT_NAME is a determinant. Gathering all of the attributes that it points to, we only have the LOCATION attribute (DEPARTMENT_NAME → LOCATION). Hence this is a simple table consisting of DEPARTMENT_NAME as the Primary Key and LOCATION as the only other attribute. So the table DEPT(DEPARTMENT_NAME, LOCATION) is created. 79
  • Personnel Database FD Diagram - Synthesis E_NUMBER → EMPLOYEE_NAME, CURRENT_TITLE, DEPARTMENT_NAME. Likewise, the section of the FDD based around E_NUMBER gives the following table for the employee's details: EMPLOYEE (EMPLOYEE_NAME, E_NUMBER, DEPARTMENT, CURRENT TITLE) 80
  • Personnel Database FD Diagram - Synthesis Here we have a slightly more complicated one. The time spent on the project is dependent on both the project number and the employee number, as it is the time spent by a particular employee on a particular project. This is demonstrated by boxing both attributes together, with the box pointing to TIME_SPENT: (P_NUMBER, E_NUMBER) → TIME_SPENT. Try to create the Assignment table for this part of the FDD. When you think you have it, have a look at ours and see if you are right. 81
  • Personnel Database FD Diagram - Synthesis (P_NUMBER, E_NUMBER) → TIME_SPENT. The main difference here is that when we follow the arrow back to the determinant we find two attributes. This is OK; we just have to make sure that in the table both of them form the Primary Key. We have a Composite Primary Key consisting of P_NUMBER and E_NUMBER. When we then gather up all of the attributes that they jointly point to, we get TIME_SPENT. Hence the table is written as ASSIGNMENT (E_NUMBER, P_NUMBER, TIME_SPENT), with E_NUMBER and P_NUMBER as the composite primary key. 82
  • Personnel Database FD Diagram - Universal Key Now, the last part of the synthesis is often forgotten. We must collect up all of the attributes that do not have arrows pointing into them and place them in the one table called the Universal Key. Every attribute collected then becomes part of the composite Primary Key. In this case we have the following attributes inside the box below. Notice how Skill is there, as it sits by itself. Nothing is its determinant. P_NUMBER PRIOR_TITLE SKILL E_NUMBER UK (E_NUMBER, P_NUMBER, PRIOR_TITLE, SKILL) 83
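The synthesis steps on the preceding slides can be sketched in code. This is a minimal Python sketch, not the course's own tooling: the FD list is transcribed from the slides, while the function name `synthesise` and the tuple representation are assumptions made for illustration.

```python
from collections import defaultdict

# FDs read off the slides as (determinant, dependent) pairs.
FDS = [
    (("P_NUMBER",), "PROJECT_NAME"),
    (("P_NUMBER",), "MANAGER_NUM"),
    (("P_NUMBER",), "ACTUAL_COST"),
    (("P_NUMBER",), "EXPECTED_COST"),
    (("E_NUMBER",), "EMPLOYEE_NAME"),
    (("E_NUMBER",), "CURRENT_TITLE"),
    (("E_NUMBER",), "DEPARTMENT_NAME"),
    (("DEPARTMENT_NAME",), "LOCATION"),
    (("E_NUMBER", "P_NUMBER"), "TIME_SPENT"),
]

ATTRIBUTES = [
    "E_NUMBER", "P_NUMBER", "PRIOR_TITLE", "SKILL",
    "EMPLOYEE_NAME", "CURRENT_TITLE", "DEPARTMENT_NAME", "LOCATION",
    "PROJECT_NAME", "MANAGER_NUM", "ACTUAL_COST", "EXPECTED_COST",
    "TIME_SPENT",
]


def synthesise(fds, attributes):
    """Group dependents by determinant; each group becomes one relation."""
    grouped = defaultdict(list)
    for determinant, dependent in fds:
        grouped[determinant].append(dependent)
    relations = [(det, tuple(deps)) for det, deps in grouped.items()]
    # Universal key: every attribute that no arrow points into.
    pointed_at = {dependent for _, dependent in fds}
    universal_key = tuple(a for a in attributes if a not in pointed_at)
    return relations, universal_key


relations, universal_key = synthesise(FDS, ATTRIBUTES)
for key, others in relations:
    print("primary key:", key, "other attributes:", others)
print("UK:", universal_key)
```

Each printed relation corresponds to one table in the synthesised schema, with the determinant as its primary key.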
  • Foreign Keys In the Synthesis Algorithm, a foreign key will arise from any attribute that is: A. both a determinant and part of another determinant, OR B. both a determinant and a dependent. Example of A: E_NUMBER and P_NUMBER, which together determine TIME_SPENT in ASSIGNMENT (E_NUMBER, P_NUMBER, TIME_SPENT). Example of B: DEPARTMENT_NAME, which is a dependent in EMPLOYEE (E_NUMBER, DEPARTMENT_NAME) and the determinant of LOCATION in DEPT (DEPARTMENT_NAME, LOCATION). 84
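Rules A and B can be checked mechanically against a list of FDs. The sketch below is an illustration only: the helper name `foreign_key_candidates` is hypothetical, and it considers only single-attribute determinants as foreign key candidates, which is a simplifying assumption.

```python
# A subset of the slide's FDs as (determinant, dependent) pairs.
FDS = [
    (("E_NUMBER",), "DEPARTMENT_NAME"),
    (("DEPARTMENT_NAME",), "LOCATION"),
    (("P_NUMBER",), "PROJECT_NAME"),
    (("E_NUMBER", "P_NUMBER"), "TIME_SPENT"),
]


def foreign_key_candidates(fds):
    determinants = {det for det, _ in fds}
    dependents = {dep for _, dep in fds}
    simple = {det[0] for det in determinants if len(det) == 1}
    composite = [det for det in determinants if len(det) > 1]
    # Rule A: a determinant that is also part of another determinant.
    rule_a = {a for a in simple if any(a in det for det in composite)}
    # Rule B: a determinant that is also a dependent.
    rule_b = simple & dependents
    return rule_a, rule_b


rule_a, rule_b = foreign_key_candidates(FDS)
print("Rule A:", sorted(rule_a))
print("Rule B:", sorted(rule_b))
```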
  • ISA = Is A In the case of the manager we say that the set of manager numbers is contained within the set of employee numbers. Every MANAGER_NUM value is an E_NUMBER value: MANAGER_NUM ISA E_NUMBER. This gives rise to a new Foreign Key, MANAGER_NUM, linking PROJECT to EMPLOYEE. 85
  • Personnel Database Schema Generated by Synthesis
    PROJECT (NAME, P_NUMBER, MANAGER_NUM, ACTUAL_COST, EXPECTED_COST)
    ASSIGNMENT (E_NUMBER, P_NUMBER, TIME_SPENT)
    UK (E_NUMBER, P_NUMBER, PRIOR_TITLE, SKILL)
    EMPLOYEE (NAME, E_NUMBER, DEPARTMENT, CURRENT TITLE)
    DEPT (DEPARTMENT, LOCATION)
    The foreign key MANAGER_NUM in PROJECT is a result of MANAGER ISA E_NUMBER. 86
  • Personnel Database Network Diagram Generated by Synthesis DEPT DEPARTMENT_NAME MANAGER_NUM EMPLOYEE PROJECT E_NUMBER P_NUMBER ASSIGNMENT E_NUMBER + P_NUMBER UK 87
  • A Fully Worked Example We now have to take care of the multi-valued areas such as skills and prior titles. Our FDD synthesis takes care of everything up to that. It converts the FDD to what we call “Third Normal Form”. We know that an individual can have many skills and many Prior Titles. They can also work on many Projects. Knowing the Employee number will not tell us one and only one value of the Skills that they have. We show this on the extended FDD with a double arrow notation, where E_NUMBER is a determinant for many values of SKILL: E_NUMBER →→ SKILL. Consequently the representation shown on the next slide can be constructed, giving rise to the splitting of the UK to form three more relations. 88
  • Personnel Database Multivalued Dependency-Decomposition Employees are associated with Projects, Titles and Skills independently. There is no direct relationship between Projects, Titles and Skills. The multivalued dependencies (MVDs) are E_NUMBER →→ P_NUMBER, E_NUMBER →→ PRIOR_TITLE and E_NUMBER →→ SKILL. Hence we have the three new relations ASSIGN (E_NUMBER, P_NUMBER), PRIOR_JOB (E_NUMBER, PRIOR_TITLE) and EXPERTISE (E_NUMBER, SKILL). 89
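Why the UK can be split without losing information can be seen on a small population. The sketch below assumes one reading of Clarke's form (two projects, one prior title, two skills); the variable names are illustrative only.

```python
from itertools import product

# Assumed reading of Clarke's form (E_NUMBER 1003).
E_NUMBER = 1003
projects = [23760, 26511]
prior_titles = ["Junior consultant"]
skills = ["Stock market", "Investments"]

# Because projects, titles and skills are independent, the UK relation
# must hold every combination of them for the employee.
uk = {(E_NUMBER, p, t, s) for p, t, s in product(projects, prior_titles, skills)}

# Decompose on the MVDs by projecting onto each pair of columns.
assign = {(e, p) for e, p, _, _ in uk}
prior_job = {(e, t) for e, _, t, _ in uk}
expertise = {(e, s) for e, _, _, s in uk}

# Joining the three projections on E_NUMBER rebuilds the UK exactly,
# so the decomposition is lossless for this population.
rejoined = {
    (e, p, t, s)
    for e, p in assign
    for e2, t in prior_job if e2 == e
    for e3, s in expertise if e3 == e
}
print(len(uk), "UK rows;", rejoined == uk)
```

Each fact (a project, a title, a skill) is stored once in the decomposed relations, instead of being repeated in every combination as it is in the UK.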
  • Personnel Database FD Diagram with MVDs and Inclusion PROJECT_NAME EXPECTED_COST MANAGER_NUM ACTUAL_COST P_NUMBER TIME_SPENT MVD ISA EMPLOYEE_NAME E_NUMBER CURRENT_TITLE PRIOR_TITLE MVD SKILL DEPARTMENT_NAME LOCATION 90
  • Final Personnel Database Schema
    PROJECT (NAME, P_NUMBER, MANAGER, ACTUAL_COST, EXPECTED_COST)
    ASSIGNMENT (E_NUMBER, P_NUMBER, TIME_SPENT)
    PRIOR_JOB (E_NUMBER, PRIOR_TITLE) (decomposed from UK)
    EXPERTISE (E_NUMBER, SKILL) (decomposed from UK)
    EMPLOYEE (NAME, E_NUMBER, DEPARTMENT, CURRENT TITLE)
    DEPT (DEPARTMENT, LOCATION) 91
  • Final Personnel Database Network Diagram DEPT DEPARTMENT_NAME MANAGER_NUM EMPLOYEE PROJECT E_NUMBER E_NUMBER E_NUMBER P_NUMBER EXPERTISE PRIOR_JOB ASSIGNMENT 92
  • Personnel Database FD Diagram - Synthesis EXPECTED_COST PROJECT_NAME ACTUAL_COST MANAGER P_NUMBER Choosing any of the arrows and following it back leads you to the project number (P_Number). This is then the Primary Key. If you then gather all of the attributes that P_Number points to and place them in the brackets you get the table Project with P_Number as the primary Key. PROJECT (PROJECT_NAME,P_NUMBER, MANAGER, ACTUAL_COST, EXPECTED_COST ) 93
  • Role Splitting In Functional Dependency Diagrams In a Functional Dependency Diagram any group of attributes can be related in only one way. For example, a pair of attributes can be related by an FD or not. Sometimes data can be related in more than one way. For example, a department can have an employee as its head or as a member. The member relationship is represented in the FDD as E_NUMBER → DEPARTMENT_NAME, but the head relationship is represented in the FDD as DEPARTMENT_NAME → E_NUMBER. 94
  • Role Splitting In Functional Dependency Diagrams  We can choose to split the E_NUMBER attribute into E_NUMBER and HOD.  But the foreign key constraint that a Head of Department is an Employee is lost on the FDD. E_NUMBER DEPARTMENT_NAME FDD Synthesis HOD ISA NetworkD DEPARTMENT_NAME EMPLOYEE DEPT HOD 95
  • Role Splitting In FDDs  Alternatively, we can choose to split the DEPARTMENT_NAME attribute into EMPLOYING_DEPT and HEADED_DEPT.  But the foreign key constraint that an Employing Department must be a Headed Department is again lost on the FDD. E_NUMBER EMPLOYING_DEPT FDD Synthesis HEADED_DEPT ISA NetworkD EMPLOYING_DEPT EMPLOYEE DEPT E_NUMBER 96
  • Role Splitting Example Consider this example. We have the Employee with many Skills, Prior Titles, as before but we also have equipment that belongs to a particular employee, such as a computer and a fax. An employee can have many different pieces of equipment. It is worthwhile recognizing them on the diagram and then decomposing them into smaller relations as part of the schema 97
  • Suppose each item of equipment (identified by SERIAL#) belongs to an employee. SERIAL# DESCRIPTION PRIOR_TITLE MVDs EMPLOYEE_NAME SKILL E_NUMBER CURRENT_TITLE UK ISA HOD DEPARTMENT_NAME LOCATION • MVDs are not necessarily embodied in the UK. • It is better to decompose on MVDs first. • MVDs partition attributes into independent sets. 98
  •  Obtain Tutorial 3 from your tutor. 99
  • ENTITY RELATIONSHIP ANALYSIS In this area of the course we concentrate on another modelling technique called Entity Relationship Modelling (ERM or ER). The first stage of this process will look at the following: ER Data Model and Notation; Strong Entities; Discovering Entities and Attributes; Identifying Entities; Discovering Relationships. 100
  • Critique of FD Analysis We originally concentrated on the modelling technique called Functional Dependency Diagrams. They have the following disadvantages: an FDD does not represent real world objects, but only data; it cannot represent MVDs or specialization; it cannot represent multiple relationships without artificial splitting of attributes; entities are fragmented during analysis. 101
  • Conceptual Data Analysis By using the ER technique we have the following advantages:  Data Analysis from the User's Point of View  Models the Real World  Independent of Technology  Able to be validated in user terms 102
  • Entity Relationship Data Model Features The real value of using this type of modelling is that it considers the design in the context of the environment it comes from. We have Entities that have their own identifying attributes: real things and real people. They can be observed in the environment. ERM has the following features: Populations of Real World objects represented by Entities; Objects have Natural Identity; Entities have Attributes which have values; Entities related by Relationships; Constraints; Subtypes. 103
  • Occurrences versus Entities 56 Jack Ackov 28 Jill Hill Let’s consider these two instances. Here we have Jack and Jill, with customer numbers 56 and 28 respectively. By themselves they exist as people in their environment. In this case we consider them to be two customers. Jack and Jill are Entity Occurrences (also called Entity Instances or Objects). If we wish to model them and all of the possible customers that we have, we need to create an Entity Class for all possibilities. 104
  • Occurrences versus Entities 56 Jack Ackov and 28 Jill Hill are Entity Occurrences (Entity Instances, Objects). CUSTOMER, with attributes Customer# and CustName, is the Entity Class (Entity Type, Entity Set). The occurrences are the Tuples of the table below, and the entity class converts to the schema CUSTOMER(Customer#, CustName), with Customer# being the Primary Key.
    Customer#  CustName
    56         Jack Ackov
    28         Jill Hill
    105
  • 56 Jack Ackov 28 Jill Hill Here we have Jack and Jill placing orders for particular items of stock. They appear to order different amounts of each. For instance Jack orders 3 bikes. Each item being ordered also has a Stock#, Price and Description. These are individual instances of the process so we need to be able to represent any possibility of this in our model. See how we do this on the next page. Quantities ordered: 3, 12, 4, 1. Items (Stock#, Price, Description): 23, 50, Bike; 156, 1, Cup of Tea; 234, 25, Pussy Cat. 106
  • 56 Jack Ackov 28 Jill Hill Customer# CustName CUSTOMER 3 4 1 12 ORDERS Quantity ITEM Stock# Price Desc 23 50 Bike 156 1 Cup of Tea 234 25 Pussy Cat 107
  • Occurrences to Entities to Schemas

    CUSTOMER(Customer#, CustName)
    Customer#  CustName
    56         Jack Ackov
    28         Jill Hill

    ORDERS(Customer#, Stock#, Quantity)
    Customer#  Stock#  Quantity
    56         23      3
    56         156     12
    28         156     4
    28         234     1

    ITEM(Stock#, Price, Desc)
    Stock#  Price  Desc
    23      50     Bike
    156     1      Cup of Tea
    234     25     Pussy Cat
    108
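The three schemas and their tuples can be loaded into a relational engine to see them work together. The sketch below uses Python's built-in `sqlite3` module; the column names `CustomerNo`, `StockNo` and `Descr` are substitutes chosen here because `#` and `Desc` are awkward in SQL, and the composite primary key on ORDERS is an assumption rather than something the slide states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (
    CustomerNo INTEGER PRIMARY KEY,
    CustName   TEXT NOT NULL
);
CREATE TABLE ITEM (
    StockNo INTEGER PRIMARY KEY,
    Price   INTEGER NOT NULL,
    Descr   TEXT NOT NULL
);
CREATE TABLE ORDERS (
    CustomerNo INTEGER NOT NULL REFERENCES CUSTOMER(CustomerNo),
    StockNo    INTEGER NOT NULL REFERENCES ITEM(StockNo),
    Quantity   INTEGER NOT NULL,
    PRIMARY KEY (CustomerNo, StockNo)  -- assumed key
);
""")

# The entity occurrences from the slide become the tuples of each table.
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)",
                 [(56, "Jack Ackov"), (28, "Jill Hill")])
conn.executemany("INSERT INTO ITEM VALUES (?, ?, ?)",
                 [(23, 50, "Bike"), (156, 1, "Cup of Tea"), (234, 25, "Pussy Cat")])
conn.executemany("INSERT INTO ORDERS VALUES (?, ?, ?)",
                 [(56, 23, 3), (56, 156, 12), (28, 156, 4), (28, 234, 1)])

# What has Jack (Customer# 56) ordered?
rows = conn.execute("""
    SELECT c.CustName, i.Descr, o.Quantity
    FROM ORDERS o
    JOIN CUSTOMER c ON c.CustomerNo = o.CustomerNo
    JOIN ITEM i ON i.StockNo = o.StockNo
    WHERE c.CustomerNo = 56
    ORDER BY i.StockNo
""").fetchall()
print(rows)
```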
  • ENTITIES Entities are classes of objects about which we wish to store information. Examples are: People: Employees, Customers, Students, ...; Places: Offices, Cities, Routes, Warehouses, ...; Things: Equipment, Products, Vehicles, Parts, ...; Organizations: Suppliers, Teams, Agencies, Depts, ...; Concepts: Projects, Orders, Complaints, Accounts, ...; Events: Meetings, Appointments. The diagram marks the examples at the top of the list as STRONG and those at the bottom as WEAK. 109
  • STRONG ENTITIES  An entity is Existence Independent if an instance can exist in isolation.  For example, CUSTOMER is existence independent of ORDER, but ORDER is existence dependent on CUSTOMER. The ORDER is by a particular customer for a/many particular item(s)  An entity is identified if each instance can be uniquely distinguished by its attributes (or relationships).  For example, CUSTOMER is identified by Customer#, PERSON is identified by Name+Address+DoB, ORDER is identified by Customer#+Date+Time. 110
  • STRONG ENTITIES  An entity is STRONG if it can be identified by its (own) immediate attributes. Otherwise it is weak.  For example, CUSTOMER and PERSON are strong entities, but ORDER is weak because it requires an attribute of another entity to identify it. ORDER would be strong if it had an Order#.  Existence independent entities are always strong. 111
  • The Method: How to Develop the ERM  Step1: Search for Strong Entities and Attributes  Step2. Attach attributes and identify strong entities.  Step3. Search for relationships.  Step4. Determine constraints.  Step5. Attach remaining attributes to entities and relationships.  Step6. Expand multivalued attributes, and relationship attributes.  Represent attributed relationships and/or multivalued attributes in a Functional Dependency Diagram.  Step7. Identify weak entities.  Step8. Iterate steps 4,5,6,7,8 until no further expansion is possible.  Step9. Look for generalization and specialization; Analyze Cycles; Convert domain-sharing attributes to entities. 112
  • The Method [Flow diagram: Narrative and Forms feed step 1 (search for strong entities and attributes), which yields attributes and entities; step 2 identifies strong entities; step 3 searches for relationships; steps 4 & 5 determine constraints and attach attributes; step 6 expands attributed relationships and/or multivalued attributes into weak entities; step 6' represents attributed relationships and/or multivalued attributes as Functional Dependencies in Functional Dependency Diagrams; step 7 identifies weak entities; the output is the Entity-Relationship Diagram.] 113
  • Step1: Search for Strong Entities and Attributes  1 Entities  relevant nouns  many instances  have properties (attributes or relationships)  identifiable by properties  2 Strong Entities  independent existence  identifiable by own single-valued attributes •3 Attributes –printable names, measurements –domain of values –no properties –dependent existence 114
  • A worked example finding strong Entities Here we have a scenario: A customer is identified by a customer#. A customer has a name and an address. A customer may order quantities of many items. An item may be ordered by many customers. An item is identified by a stock#. An item has a description and a price. A stock item may have many colours. Any item ordered by a customer on the same day is part of the same order. First try to identify all of the strong entities, followed by all of the attributes. Can you also identify a weak entity? Are there any attributes that you have missed? 115
  • Worked Example Continued Let us mark the nouns in the narrative. These lead us to what we will consider to be the strong entities. If we then mark the items that we think would be the attributes, we can see if any of the identified Entities are strong. You will notice that the item has a description, price, colour and stock# and a customer has a customer number, name, and address. These are Existence Independent Entities, and hence they must be strong. The narrative again: A customer is identified by a customer#. A customer has a name and an address. A customer may order quantities of many items. An item may be ordered by many customers. An item is identified by a stock#. An item has a description and a price. A stock item may have many colours. Any item ordered by a customer on the same day is part of the same order. 116
  • Worked Example Continued We have our Entities and the attributes displayed before us. Customer and Item are strong entities as they are Existence Independent. What about Order? Order cannot be identified completely by any of its own attributes. It is dependent on the attributes of the other 2 entities to be identified. An order is made up of a customer ordering an item. We need the customer# and the stock# to identify the order. [Conceptual Schema diagram: entities CUSTOMER, ITEM and ORDER; attributes Customer#, Customer Name, Address, Stock#, Description, Price, Colour, Date, Quantity.] 117
  • Step2. Identify Strong Entities. We now attach the attributes that belong to each of the Strong Entities. Notice that there are some left that belong to neither Customer nor Item. We will look at this later. Both Customer and Item have what we call a Natural Identity. [Conceptual Schema diagram: CUSTOMER with Customer#, CustName, Address; ITEM with Stock#, Price, Desc, Colour; Qty and Date unattached.] 118
  • Another Example of the Difference Between Weak and Strong Entities Here is another example of a common occurrence that demonstrates the difference between a strong entity and a weak entity. A strong entity is identified by its own attributes. Bidders make purchases of goods at the auction. BIDDER and GOOD have independent existence, hence are strong, but PURCHASE requires attributes of BIDDER and GOOD. The Purchase is then identified by the Bidder's name and the Good's description. These are 2 attributes that belong to the Bidder and the Good respectively. 119
  • Additional Rules for Entities For an Entity to exist we have the following additional rules:  There must be more than one instance of an entity.  The company provides superannuation for its workers. Here there is only one instance of COMPANY so it is not a valid entity. We do not model anything that only has one instance  Each instance of an entity must be potentially distinguishable by its properties.  Members send five dollars to the association. A dollar does not normally have distinguishing attributes. 120
  • Step3. Search for Relationships. We can now identify Relationships, which have the following properties: they have associated entities; they are relevant (they must be worth recording); they can be "structural" verbs in the narrative (persistent, rather than transient, relationships); they can be "abstract" nouns in the narrative (nonmaterial connections, e.g. Enrolment); they can be verbalized in the narrative (e.g. Student EnrolledIn Unit); they have 2 (binary) or more associated entities (3 for ternary, up to n-ary for n associated entities). 121
  • Relationships:  A relationship must be relevant. It should indicate a structural, persistent (extending over time) association between entities. Students enrol in units selected from the handbook.  A relationship should not usually indicate a procedural event (one that occurs momentarily, then is forgotten.). Students read about units selected from the handbook. 122
  • Relationships and the Worked Example. We can now deal with the order. The order is a relationship between the Customer and the Item. It is for a set Quantity on a given Date. Conceptual Schema Customer# Stock# Price CUSTOMER ORDERS ITEM Desc Address Colour CustName Qty Date 123
  • Second Worked Example: The Agent Analyze the data kept by the agent. Identify the entities, attributes and the relationships. To start with look at the nouns. Customers may order products stocked by various suppliers through the agent. The agent maintains a catalogue of what products are available from suppliers. The price of a product may depend on the supplier. Some products come in a variety of colours independently of supplier. Suppliers ship directly to customers and notify the agent only of the date and total. Customers then pay each supplier through the agent. The agent keeps records of all orders and payments, but is not interested in maintaining detailed invoice lines. 124
  • Second Worked Example: The Agent The nouns are Customers may order products stocked by various suppliers through the agent. The agent maintains a catalogue of what products are available from suppliers. The price of a product may depend on the supplier. Some products come in a variety of colours independently of supplier. Suppliers ship directly to customers and notify the agent only of the date and total. Customers then pay each supplier through the agent. The agent keeps records of all orders and payments, but is not interested in maintaining detailed invoice lines. We have Customers, Products, Suppliers and an Agent. How many Agents are there. This is the Data for the Agent. There is only one instance. Hence we do not model it. 125
  • The Agent: Additional Information These forms can tell us more information about the way the business runs.

    Order form
    Customer#: 28    Date: Oct 3, 1996
    Customer Name: Jill Hill, 28 Fullview Lane, Glenvale
    Stock#  Description  Qty
    156     Cup of Tea   4
    234     Pussy Cat    2

    Catalogue
    Manufacturer: Hill Creat Industries
    Address: 23 Highhill Blvd, Sumpend
    Stock#  Description  Price
    156     Cup of Tea   1
    234     Pussy Cat    25

    Shipping docket
    Manufacturer: Hill Creat Industries
    Address: 23 Highhill Blvd, Sumpend
    Customer#: 28    Shipment Date: Oct 9, 1996
    Customer Name: Jill Hill
    Total: 54
    126
  • The Agent: Additional Information • Notice that the forms also tell us the following additional facts: • A Customer has a Cust#, Name and Address. The Supplier has a Name and Address and the stock has a Stock#, Description and Price. • An order is made on a Date and is for one Customer and many items. It also has the number of each item ordered. • The shipping docket has the Date of shipping, both the Customer's and the Supplier's details, and the total price of the goods delivered. • Try to represent this yourself in a diagram with the strong entities and the relationships between them. 127
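The Total on the shipping docket can be cross-checked against the order form and the catalogue. This small sketch assumes that the shipment covers the whole order and that the catalogue prices shown are this supplier's prices; the variable names are illustrative.

```python
# (Stock#, Qty) lines from Jill Hill's order form.
order_lines = [(156, 4), (234, 2)]

# Stock# -> Price from the supplier's catalogue.
catalogue_price = {156: 1, 234: 25}

# The docket total is a derived value: quantity times price, summed.
total = sum(qty * catalogue_price[stock] for stock, qty in order_lines)
print(total)
```

The result agrees with the Total of 54 on the docket, which is why the agent can record only the date and total of a shipment without keeping invoice lines.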
  • The Agent: Strong Entities Each of the Entities below is strong. They have a Natural Identity and are Existence Independent. They are completely identifiable by their attributes. [ER diagram: CUSTOMER with Cust#, Name, Address; PRODUCT with Stock#, {Colour}; SUPPLIER with Tradename, Address.] 128
  • The Agent: Relationships The Customer orders a Quantity of a particular product. All products are supplied from a Supplier at a price. Name Stock# Address Qty {Colour} Cust# CUSTOMER PRODUCT ORDERS ER Diagram AVAILABLE FROM Price SUPPLIER Tradename Address 129
  • The Agent: Final Solution The Product is shipped from the Supplier to the Customer on a Date with a total cost for the goods, and the Customer pays the Supplier on a Date an amount (which could be the amount for a number of shipments) Name Barcode Address Qty {Colour} CUSTOMER PRODUCT ORDERS Paydate ER PAID Diagram Amount AVAILABLE Date FROM RECEIVED FROM Total Price SUPPLIER Tradename 130
  • Entity Relationship Analysis 2 We will now concentrate on the following areas of good ERM  Cardinality and Participation Constraints  Expanding to Weak Entities  Identifying Weak Entities  Derived Attributes and Relationships  Ternary Relationships 131
  • These are Steps 4, 5 & 6 from the Original Diagram [Flow diagram: strong entities, relationships and unattached attributes feed steps 4 & 5 (determine constraints and attach attributes); step 6 expands attributed relationships, domain sharing and multivalued attributes into weak entities; step 7 takes the unidentified weak entities and identifies them; the output is the Entity-Relationship Diagram.] 132
  • Step4. Determine constraints: Cardinality (how many participate). To complete this we “fix a single instance at one end and ask how many (one or many) are involved at the other end”. Look at the relationship where the Customer Orders an Item. Consider a single Customer. Can they order many items at one time? Yes, we have seen this. So we position a crow's foot (<) at the point where the line touches the Entity Item. We then ask if an Item can be ordered by many Customers. Yes. So again we place a crow's foot at the Customer's end. [ER diagram: CUSTOMER ORDERS ITEM.] From left to right: a Customer can order many Items. From right to left: an Item can be ordered by many Customers. 133
  • Step4. Determine constraints: Cardinality. Again, to complete this task we “fix a single instance at one end and ask how many (one or many) are involved at the other end”. All of the Customers live in a City. A Customer can only live in one City (unless they are politicians). In this case we must place a single straight line (|) at the intersection of the LIVES IN relationship line and the Entity City. However, a City can have many Customers. We show this by placing a crow's foot (>) at the end near the Customer. [ER diagram: CUSTOMER LIVES IN CITY.] 134
  • Step4. The Resulting ER with the Cardinality Constraints in Place [ER diagram: CUSTOMER ORDERS ITEM, ITEM has {Colour}, CUSTOMER LIVES IN CITY.] Many CUSTOMERs can ORDER an ITEM. Many ITEMs can be ORDERed by a CUSTOMER. Many CUSTOMERs can LIVE IN a CITY. A CUSTOMER can LIVE IN only one CITY. An ITEM can have many Colours. 135
  • Step4. Determine constraints: Participation. Again, we “fix a single instance at one end and ask if any must (might or must) be involved at the other end”. We ask “Does the Customer have to order an Item?” Some would say that if they do not, they are not Customers! But we know that we must be able to recognise our Customers even though at present they do not have an order with us. So, in this case they do not have to place an order. This is then not mandatory, and we show it by placing the O beside the cardinality constraint. An Item does not have to be on an order either, so it also gets the O notation. [ER diagram: CUSTOMER ORDERS ITEM.] 136
  • Step4. Determine constraints: Participation. This is also the case for the Customer living in the City. Does the Customer have to live in a City? In this case Yes, as we class all areas as being within a City. Hence we place the “|” symbol beside the cardinality constraint next to the Entity City. The next one is difficult. Does a City have to have a Customer living in it? You might think No here, but are you prepared to record all of the cities in the world just to make sure? Common sense tells us that we have to make this mandatory so we only keep a record of the cities where our Customers live. [ER diagram: CUSTOMER LIVES IN CITY.] 137
  • Step4. The Resulting ER with the Participation Constraints in Place [ER diagram: CUSTOMER ORDERS ITEM, CUSTOMER LIVES IN CITY.] An ITEM might be ordered by a CUSTOMER. A CUSTOMER might order an ITEM. A CITY must have a CUSTOMER LIVing IN it. A CUSTOMER must LIVE IN a CITY. 138
  • Step4. Determine constraints: Validation by Population. An important method of evaluating the proposed model is to populate it with instances that demonstrate that the constraints that you have identified will work. [ER diagram: CUSTOMER (Cust#) ORDERS ITEM (Stock#, {Colour}); CUSTOMER LIVES IN CITY (CityName).] 139
  • Step4. Tables Created to Validate CUSTOMER ITEM ORDERS Cust# Stock# Cust# 12 77 {Colour} LIVES IN 23 77 Stock# CityName Cust# 12 88 Ayr 12 99 Ayr 23 13 Tully 13 Stock# Colour 77 Pink CITY CityName 77 Blue 140
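Validation by population can also be automated: populate the relationship and test each constraint against the instances. The sketch below checks the LIVES IN constraints from the earlier slides (a customer must live in exactly one city; a recorded city must have at least one customer). The population and the function name `check_lives_in` are hypothetical, loosely based on the values on the slide.

```python
from collections import Counter

customers = {12, 23, 13}
cities = {"Ayr", "Tully"}
lives_in = [("Ayr", 12), ("Ayr", 23), ("Tully", 13)]  # (CityName, Cust#)


def check_lives_in(customers, cities, lives_in):
    """Return a list of constraint violations found in the population."""
    problems = []
    per_customer = Counter(cust for _, cust in lives_in)
    for cust in sorted(customers):
        # Cardinality one and mandatory participation: exactly one city.
        if per_customer[cust] != 1:
            problems.append(f"customer {cust} lives in {per_customer[cust]} cities")
    populated = {city for city, _ in lives_in}
    for city in sorted(cities - populated):
        # Mandatory participation at the city end.
        problems.append(f"city {city} has no customers")
    return problems


print(check_lives_in(customers, cities, lives_in))
# Adding a customer without a city exposes a violation.
print(check_lives_in(customers | {99}, cities, lives_in))
```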
  • Step5. Attach remaining attributes to entities and relationships. In the previous lectures we looked at a worked problem with a Customer ordering an Item. Here we were able to identify Entities from the narration. Next we also listed the attributes which helped us identify the Strong Entities. We noticed that there were some Attributes, Qty and Date, left that could not be attached to any of the strong entities. They, in fact, belong to the Relationship that was associated with the two Entities. Customer# Stock# ORDERS Price CUSTOMER ITEM Desc Address Colour CustName Qty Date 141
  • Step5. Attach remaining attributes to entities and relationships. The quantity attribute cannot be attached to the Customer, as the Customer will order different quantities of various items at any time. It cannot also be attached to the Item. It must therefore be attached to the relationship between them, being the order. This is also the situation for the Date that the order was placed. 142
  • Step5. Attach remaining attributes to entities and relationships. Conceptual Schema Customer# Stock# Price CUSTOMER ORDERS ITEM Desc Address Qty Date {Colour} CustName 143
  • Step6.Expand multi-valued attributes, domain sharing attributes and binary relationship attributes. Once we have identified the Strong Entities, Relationships and attached all Attributes to either the Strong Entities or Relationships, we are required to expand the diagram as much as possible to permit us to complete the process. This requires us to move in 2 directions. We must first look at all of the binary relationships to see what the cardinality constraints are between them. If they are “many-to-many” they must be carefully considered and expanded where appropriate. We then must look at what we call Multi-valued Attributes and Domain Sharing Attributes. The process is shown on the following diagram. 144
  • Step6 Entity-Relationship Diagram Many-to-many Multi-valued Attributes Relationships with Attributes Domain Sharing Attributes Expand Expand Multi-valued and relationships domain sharing with attributes attributes Associative Entities Characteristic Entities Dependent Entities 145
  • Step6 In the worked example we have a Many-to-Many relationship with 2 attributes . When we have a Many-to-Many relationship with attached attributes we are required to create an Associative Entity that bridges the 2 Entities. Conceptual Schema Customer# Stock# Price CUSTOMER ORDERS ITEM Desc Address Qty Date {Colour} CustName 146
  • Step6 Between Customer and Item we create the Weak (Associative) Entity called Order. We have to redo the constraints. A customer can place many orders or none. An order can come from only one customer, and must be from a customer. An order is for many items and must be for at least one item, and an item can be on many orders but does not have to appear on an order. These have all been placed in the diagram shown below in their correct position. Associative Entity Stock# Customer# MAKES FOR Price CUSTOMER ORDER ITEM Desc Qty Address Date CustName 147
  • Step6 We have also noticed that an item can come in many colours. This is a multi-valued attribute. We can show this in our extended diagram by having a relationship between the Item and the Colour, where colour is the only attribute of the entity. In this case we are also saying that the colour of the item is optional (i.e. natural if requested) and that the only colours to be recorded are those that are used. [ER diagram: CUSTOMER (Customer#, CustName, Address) MAKES ORDER (Qty, Date), the Associative Entity; ORDER FOR ITEM (Stock#, Price, Desc); ITEM HAS COLOUR (Colour), the Characteristic Entity.] 148
  • Step6. Expand domain sharing attributes. Managers supervise Workers. All employees are residents of a City. Employees who live in different cities from their managers get a special allowance. City City SUPERVISES MANAGER WORKER Allowance Characteristic Entity CITY OF OF CityName SUPERVISES MANAGER WORKER Allowance 149
  • Step7. Identify weak entities. Clarify the notion of instance. Weak entities are often ambiguous and difficult to agree on. Attributes may be part of a key for a weak entity, but at least one (one-must) relationship for identification is required. So when we convert this into a table it will require one of the PKs from the strong entities as part of its own composite PK. Validation, not design. The purpose of identification is not to allocate a primary key, but to validate the concept. We have to be able to justify the concept of the relationship in the real world. Never invent keys. I know that it is tempting but you must reflect the business as it is. 150
  • Step7. Identify weak entities. An ORDER is uniquely identified by the CUSTOMER and the Date. [Conceptual Schema diagram: CUSTOMER MAKES ORDER, ORDER FOR ITEM, ITEM HAS COLOUR; attributes Customer#, CustName, Address, Date, Qty, Stock#, Price, Desc, Colour.] 151
  • Step7. Identify weak entities. Here we still have the relationship between Order and Item that is many-to-many with attributes. We must expand this. [Conceptual Schema diagram: CUSTOMER MAKES ORDER, ORDER FOR ITEM, ITEM HAS COLOUR; attributes Customer#, CustName, Address, Date, Qty, Stock#, Price, Desc, Colour.] 152
  • Step8. Iterate until no further expansion is possible. An intersection entity is one that is identified only by its relationships. We introduce the weak entity ORDERLINE, which is for one item. It is fully dependent on the attributes of Order and Item to be identified: an ORDERLINE is identified by an ITEM on an ORDER. [Conceptual Schema diagram: ORDER (Date) MADE BY CUSTOMER (Customer#, CustName, Address); ORDER HAS ORDERLINE (Qty); ORDERLINE FOR ITEM (Stock#, Price, Desc); ITEM HAS COLOUR (Colour).] 153
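How the weak entities borrow their identity can be shown with composite keys. This is a sketch using Python's built-in `sqlite3` module, not the course's prescribed conversion: the table name `CUST_ORDER` (ORDER is a reserved word in SQL) and the column names are assumptions, and the data comes from Jill Hill's order form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CustomerNo INTEGER PRIMARY KEY, CustName TEXT, Address TEXT);
CREATE TABLE ITEM (StockNo INTEGER PRIMARY KEY, Price INTEGER, Descr TEXT);
-- ORDER is weak: identified by its CUSTOMER plus the Date.
CREATE TABLE CUST_ORDER (
    CustomerNo INTEGER NOT NULL REFERENCES CUSTOMER(CustomerNo),
    OrderDate  TEXT NOT NULL,
    PRIMARY KEY (CustomerNo, OrderDate)
);
-- ORDERLINE is an intersection entity: identified by an ITEM on an ORDER.
CREATE TABLE ORDERLINE (
    CustomerNo INTEGER NOT NULL,
    OrderDate  TEXT NOT NULL,
    StockNo    INTEGER NOT NULL REFERENCES ITEM(StockNo),
    Qty        INTEGER NOT NULL,
    PRIMARY KEY (CustomerNo, OrderDate, StockNo),
    FOREIGN KEY (CustomerNo, OrderDate)
        REFERENCES CUST_ORDER(CustomerNo, OrderDate)
);
""")
conn.execute("INSERT INTO CUSTOMER VALUES (28, 'Jill Hill', '28 Fullview Lane, Glenvale')")
conn.executemany("INSERT INTO ITEM VALUES (?, ?, ?)",
                 [(156, 1, "Cup of Tea"), (234, 25, "Pussy Cat")])
conn.execute("INSERT INTO CUST_ORDER VALUES (28, '1996-10-03')")
conn.executemany("INSERT INTO ORDERLINE VALUES (?, ?, ?, ?)",
                 [(28, "1996-10-03", 156, 4), (28, "1996-10-03", 234, 2)])

# A second line for the same item on the same order breaks the identity rule.
try:
    conn.execute("INSERT INTO ORDERLINE VALUES (28, '1996-10-03', 156, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

line_count = conn.execute("SELECT COUNT(*) FROM ORDERLINE").fetchone()[0]
print(duplicate_rejected, line_count)
```

No key is invented here: every key column comes from a strong entity or from the Date attribute, in line with the "never invent keys" advice in Step 7.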
  • Step8. Iterate until no further expansion is possible.  Ultimately every attribute must be single valued and attached to an entity.  Different development paths are possible. Your model could be different than mine depending on your research and your interpretation of the business.  Retract intersection entities. Even though we just showed you how to expand them in actual fact as they are fully dependent on the attributes of the surrounding entities you just retract them or ignore them. The conversion from ERM to the Schema will take care of everything. 154
  • WARNING: Forms are not Entities The problem is that when people see forms they want to produce a table. This is not always the case. Many forms that you see in the workplace are reports. They have been derived by different pieces of information. That is part of the functionality of a good database management system. Remember that:  Forms contain attributes from many different entities.  Forms are part of an already existing Information System and are not necessarily part of the new system that is looking at the entities.  Forms are requirements documents, so can be analysed according to the Method.  Forms are often not identifiable and contain information about many weak entities. 155
  • Derived Attributes  Attributes can sometimes be derived from other attributes by calculation. Each product has a wholesale price and a retail price. There is always a 20% markup. Barcode PRODUCT Wholesale Price Retail Price * 156
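The derived attribute on this slide can be sketched in code. The following is a minimal illustration only, assuming a hypothetical `Product` class and sample barcode and price; the retail price is computed from the wholesale price using the 20% markup and is never stored on its own.

```python
from dataclasses import dataclass
from decimal import Decimal

MARKUP = Decimal("0.20")  # the fixed 20% markup stated on the slide


@dataclass
class Product:
    barcode: str              # hypothetical sample identifier
    wholesale_price: Decimal  # stored attribute

    @property
    def retail_price(self) -> Decimal:
        # Derived attribute: always calculated from wholesale_price,
        # so the two prices can never disagree.
        return (self.wholesale_price * (1 + MARKUP)).quantize(Decimal("0.01"))


p = Product(barcode="0001", wholesale_price=Decimal("10.00"))
print(p.retail_price)
```

Because the retail price is calculated on demand, changing the wholesale price changes the retail price automatically.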
  • Derived Relationships • Relationships can sometimes be logically derived from other relationships. Consider this situation A student is enrolled in a unit and each unit belongs to a course STUDENT COURSE STUDIES OFFERED IN UNIT 157
  • Derived Relationships • Now in addition place this in the picture. A student enrolled in a unit must be enrolled in the course offering the unit. • Retain derived relationships that bear constraints. This information needs to be kept and not taken out as repeating, due to its constraints ENROLLED IN * STUDENT COURSE STUDIES OFFERED IN UNIT 158
  • TERNARY RELATIONSHIPS In some situations the relationships that hold together entities are quite complex. In most cases they are binary and a simple bi-polar positioning will work. It is when we have to hold three or more entities together that things can get quite complicated. Let us look at a situation that requires a Ternary relationship to be used.  An Employee may be assigned to many projects. An employee may have many skills, but an employee may use only one skill on a particular project. A project may require several skills and several employees. 159
  • TERNARY RELATIONSHIPS: Example [Diagram: EMPLOYEE QUALIFIED IN SKILL; EMPLOYEE WORKS ON PROJECT; PROJECT REQUIRES SKILL.] Three binary relationships cannot represent the fact that a particular employee uses a particular skill on a particular project. 160
  • TERNARY RELATIONSHIPS: Cardinality Constraints [Diagram: ternary relationship between EMPLOYEE, SKILL and PROJECT.] An employee may use only one skill on a project. An employee may use a skill on many projects. Many employees may use a skill on a project. 161
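One possible way to hold these cardinality constraints in a table is sketched below, using Python's built-in `sqlite3` module. The table name `assignment`, the column names and the sample rows are assumptions made for illustration. Making (employee, project) the key enforces "only one skill on a project" while still allowing the other two cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assignment (
        employee TEXT NOT NULL,
        project  TEXT NOT NULL,
        skill    TEXT NOT NULL,
        PRIMARY KEY (employee, project)  -- one skill per employee per project
    )
""")

# An employee may use a skill on many projects.
conn.execute("INSERT INTO assignment VALUES ('Ann', 'P1', 'Welding')")
conn.execute("INSERT INTO assignment VALUES ('Ann', 'P2', 'Welding')")
# Many employees may use a skill on a project.
conn.execute("INSERT INTO assignment VALUES ('Bob', 'P1', 'Welding')")

# An employee may use only one skill on a project: a second skill is rejected.
try:
    conn.execute("INSERT INTO assignment VALUES ('Ann', 'P1', 'Painting')")
    second_skill_rejected = False
except sqlite3.IntegrityError:
    second_skill_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
print(second_skill_rejected, row_count)
```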
  • TERNARY RELATIONSHIPS: Rule for Existence For a ternary to be valid all associated binaries must be many-to-many. 162
  • The Agent Revisited Do you remember this problem that we had previously? Customers may order products stocked by various suppliers through the agent. The agent maintains a catalogue of what products are available from suppliers. The price of a product may depend on the supplier. Some products come in a variety of colours independently of supplier. Suppliers ship directly to customers and notify the agent only of the date and total. Customers then pay each supplier through the agent. The agent keeps records of all orders and payments, but is not interested in maintaining detailed invoice lines. We modelled it as demonstrated on the next slide 163
  • Example: The Agent, original simple solution [ER diagram: entities CUSTOMER, PRODUCT, SUPPLIER; relationships ORDERS, PAID, RECEIVED FROM, AVAILABLE FROM; attributes Name, Address, Barcode, {Colour}, Qty, Paydate, Amount, Date, Total, Price, Tradename.] Now we need to expand it. 164
  • Example: The Agent Expanded ER Diagram Let us first look at the relationship between the customer and the product. We see that it is a many to many relationship with an attached attribute (Qty), so it must be expanded. We do this by creating the weak entity Order, which is identified by the date and the customer's name. We do not bother introducing the Orderline weak entity, as it is only identifiable by the attributes of the surrounding entities. [ER diagram: entities CUSTOMER, ORDER, PRODUCT, COLOUR; relationships MAKES, FOR; attributes Name, Address, Date, Qty, Barcode, {Colour}.] 165
  • Example: The Agent Consider the relationship between the customer and the supplier. Here we have 2 many to many relationships that have to be expanded. They create the weak entities Payment and Shipment as detailed below, with the attached attributes. They also bring new constraints that show us the identifying attributes that belong to them. [ER diagram: entities CUSTOMER, PAYMENT, SHIPMENT, SUPPLIER; relationships PAID, TO, RECEIVES, FROM; attributes Name, Address, Paydate, Amount, Date, Total, Tradename.] 166
  • Example: The Agent Finally we have to consider the relationship between the supplier and the product. Here we again have a many to many that requires expanding, creating the weak entity Holding, identified by attributes from both the product and the supplier, with its own attribute Price. This is because different suppliers supply the goods at different prices. [ER diagram: entities PRODUCT, COLOUR, HOLDING, SUPPLIER; relationships IN, HAS; attributes Barcode, {Colour}, Price, Tradename.] 167
  • Example: The Agent The final solution In the end we have to combine all of these sections together to create the final ERM diagram for this problem 168
  • Example: The Agent The final solution [ER diagram: entities CUSTOMER, ORDER, PRODUCT, COLOUR, PAYMENT, SHIPMENT, HOLDING, SUPPLIER; relationships MAKES, FOR, PAID, TO, RECEIVES, FROM, IN, HAS; attributes Name, Address, Date, Qty, Barcode, {Colour}, Paydate, Amount, Date, Total, Price, Tradename.] 169
  • What is Normalisation  A process that ensures that each attribute is attached to the correct entity  A process of grouping data elements into tables representing entities and their relationships  An integral part of a design method that produces flexible and reliable databases 171
  • Why Normalise Data?  Minimises data redundancy  The only “redundancy” is the foreign key linking data  This isn’t redundancy as the link has to be defined in some fashion  Most stable form to change  Most robust structure  Most adaptable and flexible structure 172
  • Normal Forms  Introduced by E.F. Codd, there were originally three normal forms (abbreviated NF). These are sufficient for nearly all databases.  In addition there are Boyce-Codd, 4th, 5th and domain-key normal forms. These are rarely required and will not be covered in this course. 173
  • Primary Keys  An attribute (or group of attributes) that uniquely identifies a particular record in a relation EMPLOYEE(Employee#,Name,Salary, Department#) ORDER_ITEM(Item#, Order#, Quantity) STUDENT(Stud No, Name(subcode,stitle,result))  Primary key is underlined 174
  • Foreign Keys  An attribute in one relation (table) that is the primary key in another relation EMPLOYEE(Employee#, Name, Salary, Department#) DEPARTMENT(Department#, Dname, Budget) Foreign Keys TOUR(Tourcode, Tname) BOOKING(Booking#, Seats, Tourcode, Depdate) 175
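The EMPLOYEE and DEPARTMENT relations above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the lower-case column names and the sample rows are assumptions, and SQLite enforces foreign keys only when `PRAGMA foreign_keys` is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        department_no INTEGER PRIMARY KEY,
        dname         TEXT,
        budget        REAL
    )
""")
conn.execute("""
    CREATE TABLE employee (
        employee_no   INTEGER PRIMARY KEY,
        name          TEXT,
        salary        REAL,
        -- foreign key: must match a primary key value in department
        department_no INTEGER REFERENCES department(department_no)
    )
""")

conn.execute("INSERT INTO department VALUES (10, 'Sales', 50000)")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 30000, 10)")  # department 10 exists

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Bob', 28000, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```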
  • First Normal Form (1)  Consider the problem posed by the entity STUDENT(Stud No, Surname(Subcode, Subname, Result)) Entering data we might obtain “Repeating Group” How long should the record be? 176
  • First Normal Form (2)  To convert the entity to 1st normal form (1NF) remove any repeating groups of data items from the unnormalised data  EACH RECORD MUST HAVE THE SAME LENGTH STUDENT(Stud No, Surname(Subcode, Subname, Result)) Repeats a varying number of times, depending upon how many subjects the student is enrolled in 177
  • Converting to 1NF 1. Remove the repeating group and make a new relation/entity 2. The new relation now gets a concatenated primary key, which is made of the primary key of the original relation and the “primary” key of the repeating group 3. Give the new relation a descriptive name 178
  • 1. Remove the Repeating Group  To convert our relation STUDENT(Stud No,Surname(Subcode,Subname,Result))  Remove the repeating group and state it as a separate relation STUDENT(Stud No, Surname) (Subcode, Subname, Result) 179
  • 2. A Concatenated Primary Key STUDENT(Stud No, Surname) (Subcode, Subname, Result)  Give the new (unnamed) relation a primary key consisting of the primary key of the original relation and the key of the repeating group STUDENT(Stud No, Surname) (Stud No,Subcode, Subname, Result) 180
  • 3. Name the New Relation STUDENT(Stud No, Surname) (Stud No, Subcode, Subname, Result)  Give the new relation a descriptive name STUDENT(Stud No, Surname) SUBJECT(Stud No,Subcode, Subname, Result) 181
  • In First Normal Form (1) So in first normal form (1NF) the original relation: STUDENT(Stud No,Surname(Subcode,Subname,Result)) has become a pair of relations STUDENT(Stud No, Surname) SUBJECT(Stud No, Subcode, Subname, Result) 182
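The three conversion steps can be sketched in a few lines of Python. The student and subject values below are invented sample data; only the attribute names come from the slides. The repeating group is pulled out into its own list, and each of its rows is given the student number so that (Stud No, Subcode) forms the concatenated key.

```python
# Unnormalised: each student record carries a repeating group of subjects,
# so records vary in length.
unnormalised = [
    {"stud_no": 1001, "surname": "Smith",
     "subjects": [("DB1", "Databases", "P"), ("PR1", "Programming", "C")]},
    {"stud_no": 1002, "surname": "Jones",
     "subjects": [("DB1", "Databases", "D")]},
]

# STUDENT(Stud No, Surname): the original relation without the repeating group.
student = [(r["stud_no"], r["surname"]) for r in unnormalised]

# SUBJECT(Stud No, Subcode, Subname, Result): the repeating group, with the
# original primary key added to form the concatenated key.
subject = [(r["stud_no"], code, name, result)
           for r in unnormalised
           for (code, name, result) in r["subjects"]]

for row in subject:
    print(row)
```

Every row in both lists now has the same length, which is the requirement stated for 1NF.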
  • In First Normal Form (2) 183
  • First Normal Forms - Examples  Change to first normal form  EMPLOYEE(Employee#,EmpName,Salary(Proj#,Projname))  ORDER(Order#,Orderdate(Part#,NumberOrdered))  Answers  EMPLOYEE(Employee#,EmpName,Salary) PROJECT(Employee#,Proj#,Projname)  ORDER(Order#,Orderdate) PART(Order#,Part#,NumberOrdered) 184
  • Second Normal Form  Consider the following relation and the problems presented when creating, deleting, or updating a record 185
  • Problems  Creation  A new item “9999 - washer” cannot be added to the DB until it has been ordered  Also there could also a different description for the same item in a different row. Eg “9870 - 5cm nut”  Deletion  If order 40 is the only order for nails, deleting it will lose the item# and desc from the DB  Update  If the item description for item 9870 is amended to “octagonal nut” then it must be changed in many places 186
  • Second Normal Form  Must firstly be in 1NF  A non-key attribute cannot be identified by part of a composite key: Order-item(Order#,Item#,Desc,Qty)  The quantity ordered is functionally dependent on the whole of the primary key, the order # and the item# Order-item(Order#,Item#,Desc,Qty)  The description of the item, however, doesn‟t depend on the whole key. It only depends on item# 187
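Functional dependency can be explored on sample rows. The helper below, `depends_on`, is a hypothetical name, and the rows are invented apart from item 9870 and its description. It reports whether each value of the determinant is always paired with a single value of the attribute. Sample data can only disprove a dependency, never prove one, so treat this as a learning aid.

```python
def depends_on(rows, determinant, attribute):
    """Return True if, in these rows, each determinant value maps to one attribute value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, row[attribute]) != row[attribute]:
            return False
    return True


order_item = [
    {"order": 40, "item": 9870, "desc": "5cm nut", "qty": 10},
    {"order": 41, "item": 9870, "desc": "5cm nut", "qty": 25},
    {"order": 41, "item": 1234, "desc": "nails", "qty": 100},
]

# Desc is determined by Item# alone: a partial dependency, so not 2NF.
desc_needs_only_item = depends_on(order_item, ["item"], "desc")
# Qty is not determined by Item# alone...
qty_needs_only_item = depends_on(order_item, ["item"], "qty")
# ...but it is determined by the whole key (Order#, Item#).
qty_needs_whole_key = depends_on(order_item, ["order", "item"], "qty")

print(desc_needs_only_item, qty_needs_only_item, qty_needs_whole_key)
```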
  • Converting to 2NF To convert a relation to second normal form 1. Write down all the possible “combinations” of the attributes forming the primary key. 2. Place each of the other attributes with the appropriate combination 3. Remove any relations that consist of a single attribute primary key alone 4. Give each remaining relation a descriptive name 188
  • 1. Possible Primary Keys  To change our relation Order-item(Order#,Item#,Desc,Qty) into 2NF  Write down all possible “combinations” of the attributes forming the primary key: (Order#) (Item#) (Order#,Item#) 189
  • 2. Matching the Other Attributes Order-item(Order#,Item#,Desc,Qty)  Match each of the other attributes with the primary key that it depends upon (Order#) (Item#, Desc) (Order#, Item#, Qty) 190
  • 3. Remove Trivial Relations 4. Name Relations  Remove any relations that consist of a single attribute primary key alone: (Order#) (Item#, Desc) (Order#, Item#, Qty)  Give the remaining relations meaningful names: ITEM(Item#,Desc) ORDER-ITEM(Order#,Item#,Qty) 191
  • In Second Normal Form(1) Order-item(Order#,Item#,Desc,Qty) in second normal form has become ITEM(Item#,Desc) ORDER-ITEM(Order#,Item#,Qty) 192
  • In Second Normal Form(2)  This solves the problems highlighted earlier  ADD new item at any time  DELETE last order for item, but item remains in DB  to UPDATE, desc only needs to be altered in one place 193
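The three fixes can be checked against the 2NF relations with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column names, the item number 1234 for nails, the quantities and orders 41 and 42 are invented; items 9870 and 9999 and order 40 come from the earlier Problems slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_no INTEGER PRIMARY KEY, descr TEXT)")
conn.execute("""
    CREATE TABLE order_item (
        order_no INTEGER,
        item_no  INTEGER REFERENCES item(item_no),
        qty      INTEGER,
        PRIMARY KEY (order_no, item_no)
    )
""")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(9870, "5cm nut"), (1234, "nails")])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                 [(40, 1234, 100), (41, 9870, 20), (42, 9870, 5)])

# ADD: a new item can be recorded before anyone orders it.
conn.execute("INSERT INTO item VALUES (9999, 'washer')")

# DELETE: removing the only order for nails leaves the item itself in place.
conn.execute("DELETE FROM order_item WHERE order_no = 40")

# UPDATE: the description is changed in exactly one row.
cur = conn.execute("UPDATE item SET descr = 'octagonal nut' WHERE item_no = 9870")
rows_updated = cur.rowcount

items = dict(conn.execute("SELECT item_no, descr FROM item"))
print(rows_updated, items)
```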
  • Second Normal Form - Summary  Must first be in 1NF  Each attribute in a relation must be functionally dependent on the whole of the primary key ie Every attribute needs the full primary key and not just parts of it 194
  • Second Normal Form - Examples(1) Convert to second normal form Q1. TRAINING(Emp#,EmpName,Dept#,Course,DateCompleted) Q2. ORDER-ITEM(Order#,Item#,Date-ord,Qty,Unit-price) A1. EMPLOYEE(Emp#,EmpName,Dept#) TRAINING(Emp#,Course,DateCompleted) A2. ORDER(Order#,Date-ord) ITEM(Item#,Unit-price) ORDER-ITEM(Order#,Item#,Qty) 195
  • Second Normal Form - Examples(2) Convert to 2NF EMPLOYEE(Emp#,Dept#,Ename,Salary)  Answer  This is already in 2NF  Any relation in 1NF that has a single attribute as the primary key must be in 2NF, as no attribute can be dependent on only a portion of the key. 196
  • Third Normal Form  Consider the following relation ( which is in 2NF) and the problems when CREATING, DELETING or UPDATING a record: 197
  • Problems CREATION  Cannot add a new course until a student is enrolled  There is nothing in the design that stops a course having various names in different records eg “A112 - DOT(C)” DELETION  Data for a course is lost when the last student enrolled in the course is deleted UPDATE  If the course name changes, it must be altered in many places 198
  • Third Normal Form  Must be in 2NF  A non-key attribute cannot be dependent on another non-key attribute. (This is known as transitive dependency) STUDENT(Student#,Sname,CourseCode,CourseName)  Sname & CourseCode are both dependent on Student# STUDENT(Student#,Sname,CourseCode,CourseName)  CourseName, however, is dependent on CourseCode, but CourseCode is a non-key attribute. 199
  • Converting To 3NF 1.Remove all attributes that are dependent on non-key attribute(s) into a new relation 2. Make a non-key attribute(s) that the removed attribute(s) are dependent on, the primary key of the new relation. 3. Give the new relation a descriptive name. 200
  • 1. Remove the Attributes To convert our relation STUDENT(Student#,Sname,CourseCode,CourseName) into 3NF:  Remove all the attributes that are dependent upon a non-key attribute. STUDENT(Student#,Sname,CourseCode) (CourseName) 201
  • 2. Add the Primary key 3. Name the New Relation  Make the non-key attribute that the removed attributes were dependent on, the primary key in the new relation. STUDENT(Student#,Sname,CourseCode) (CourseName)  Give the new relation a meaningful name STUDENT(Student#,Sname,CourseCode) COURSE(CourseCode,CourseName) 202
  • Third Normal Form  This solves the problems highlighted earlier  ADD new course at anytime  DELETE last student in course and the course will still remain  To UPDATE, course name only needs to be altered in one place 203
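The 3NF pair STUDENT and COURSE can be tried with Python's built-in `sqlite3` module. The student names and course names below are invented sample data, and the column names are assumptions. The sketch shows the course name being changed in one row, and a join rebuilding the original four-attribute view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_code TEXT PRIMARY KEY, course_name TEXT)")
conn.execute("""
    CREATE TABLE student (
        student_no  INTEGER PRIMARY KEY,
        sname       TEXT,
        course_code TEXT REFERENCES course(course_code)
    )
""")
conn.execute("INSERT INTO course VALUES ('A112', 'Diploma of Technology')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Smith", "A112"), (2, "Jones", "A112")])

# UPDATE: the course name is stored once, so one row changes.
conn.execute("UPDATE course SET course_name = 'Diploma of IT' WHERE course_code = 'A112'")

# A join rebuilds STUDENT(Student#, Sname, CourseCode, CourseName) when needed.
rows = conn.execute("""
    SELECT s.student_no, s.sname, s.course_code, c.course_name
    FROM student AS s
    JOIN course AS c ON c.course_code = s.course_code
    ORDER BY s.student_no
""").fetchall()

for row in rows:
    print(row)
```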
  • Third Normal Form - Summary  Must be in 2NF (and hence in 1NF)  Each attribute in a relation must be dependent on the primary key only and not on any other non-key attributes 204
  • Third Normal Forms - Examples  Convert to 3NF Q1. EMPLOYEE(Emp#,EmpName,Dept#,DeptName) Q2. CUSTOMER(Customer#,Cname,Caddress,SalesRep,Sname)  Answers A1. EMPLOYEE(Emp#,EmpName,Dept#) DEPARTMENT(Dept#,DeptName) A2. CUSTOMER(Customer#,Cname,Caddress,SalesRep) SALESREP(SalesRep,Sname) 205
  • Incorrect Decompositions  Groupings of attributes that do not follow the rules of normalisation we have looked at result in:  less flexible databases  databases that lose data, or require major alterations in their data, when creation, deletion or updating of records takes place 206
  • The Normal Forms - Summary  First Normal Form (1NF)  The relation contains no repeating groups  Second Normal Form (2NF)  The relation is in 1NF and each attribute in the relation is functionally dependent on the whole key  Third Normal Form (3NF)  The relation is in 2NF and each attribute in the relation is functionally dependent on nothing but the key. 207
  • A Simple Test for 3NF Each attribute should depend on the key, the whole key and nothing but the key (so help me Codd) (the original ideas behind relational databases were proposed by Dr E.F. Codd) 208
  • Data Dictionaries  A data dictionary is a structured analysis tool that records every data name and defines, precisely, what is meant by that name.  It is sometimes referred to as metadata (ie data about data)  All the objects (data flows, data stores, processes, data elements etc) identified during analysis should be defined in the dictionary  It may also, optionally, include physical information about the method of data storage, etc 209
  • Aliases  Sometimes a data element in a system may be referred (known) by more than one name  the accounts department calls it customer_payment  the sales department knows it as customer_owing  To avoid problems occurring because of multiple names for one item, a data dictionary should list any aliases (other names) by which the data is known 210
  • Sample Entry 1 Data Name: Student_ID Description: Unique identifier of students Data Type: Text(7) Values: Text field of 7 digits with the first two digits signifying the current year. Aliases: Student_Number, Student# Where used: Administration, Student_Records 211
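The Values rule in this entry can be turned into a simple check. The function name `is_valid_student_id` is hypothetical, and the sketch assumes that "the first two digits signifying the current year" means the last two digits of the calendar year. The date is passed in so the check can be repeated for any year.

```python
import re
from datetime import date


def is_valid_student_id(value: str, today: date) -> bool:
    """Check a Student_ID against the data dictionary entry: Text(7), all digits."""
    if not re.fullmatch(r"[0-9]{7}", value):
        return False
    # Assumption: the leading two digits are the last two digits of the year.
    return value[:2] == f"{today.year % 100:02d}"


check_date = date(2009, 6, 1)  # sample date for illustration
print(is_valid_student_id("0912345", check_date))
print(is_valid_student_id("0812345", check_date))
print(is_valid_student_id("09123", check_date))
```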
  • Scope of Course  The course will not be focussing on aliases or “Where used”  The following slides show examples of listings expected in this course 212
  • Sample Entry 2 Data Name: Skill_Level Description: A code representing the level of skill of an employee Data Type: Text(2) Values: Represents the number of years experience of employee 1 = 1 year experience 2 = 2 years experience, etc 213
  • Sample Entry 3 Data Name: Budget_Amount Description: Amount set aside for each budget item Data Type: Currency Values: All amounts multiples of $100 214
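Entries like Sample Entry 2 and 3 could be held in a small structure so that the Values rule can be checked by a program. This is a sketch only: the `DictionaryEntry` class, the `is_valid` hook and the two validator functions are hypothetical, and the budget check assumes amounts are held as whole dollars.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DictionaryEntry:
    data_name: str
    description: str
    data_type: str
    is_valid: Callable[[object], bool]  # hypothetical hook for the Values rule


def valid_skill_level(value: object) -> bool:
    # Text(2): one or two decimal digits giving years of experience.
    return isinstance(value, str) and value.isdecimal() and 1 <= len(value) <= 2


def valid_budget_amount(value: object) -> bool:
    # Assumes whole-dollar integer amounts; must be a multiple of $100.
    return type(value) is int and value >= 0 and value % 100 == 0


DATA_DICTIONARY = {
    "Skill_Level": DictionaryEntry(
        "Skill_Level",
        "A code representing the level of skill of an employee",
        "Text(2)",
        valid_skill_level,
    ),
    "Budget_Amount": DictionaryEntry(
        "Budget_Amount",
        "Amount set aside for each budget item",
        "Currency",
        valid_budget_amount,
    ),
}

print(DATA_DICTIONARY["Budget_Amount"].is_valid(2500))
print(DATA_DICTIONARY["Budget_Amount"].is_valid(2550))
print(DATA_DICTIONARY["Skill_Level"].is_valid("2"))
```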