Introduction to database


These slides provide knowledge about databases from beginner to advanced level.

Published in: Education, Technology

  1. 1. By: Engineer Muhammad Suleman Memon, M.E (Information Technology), B.E (Computer System)
  2. 2. A database is a simple yet flexible and powerful tool for storing and retrieving data. Every company and every website has lots of data. The more of your data that you keep in your database, the better. Far from being a tool only useful to big businesses, a database is perfect even if you just want a simple guest book or page hit counter. Whichever database you use, it will most likely be a relational database.
  3. 3.  This is the industry standard design these days. Relational databases use the principles of set theory. Set theory is a field of mathematics that describes how to deal with sets of data. Relational databases are quite intuitive and easy to understand.
  4. 4.  All data is held in tables. A table has columns (along the top) and rows. You create the tables you need. You define the table names. You define what the column names are in each table. You define what type of data the columns are...
  5. 5. There are a number of different data types available which represent the different types of data you find in real life. There are analogous types in all databases and programming languages. Each has variations, but they're all fundamentally the same.
  6. 6. They are:
      • Numerical types, i.e. numbers. There are fundamentally two types: integer and float. Integers are whole numbers (e.g. 1, 2, 100, 999999). Floats are numbers with decimal places (e.g. 1.1, 22.5, 3.1415927).
      • String types, i.e. text. There are two types here: fixed length and variable length. char is the only fixed-length character type in MySQL and holds up to 255 characters. varchar is a variable-length field; older MySQL versions limited it to 255 characters, while MySQL 5.0.3 and later allow up to 65,535 bytes. There are also several text types of varying lengths in MySQL.
  7. 7. Date and Time Types: for storing dates and times. Binary Data: this is arbitrary data, which could be images, programs, or absolutely anything.
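The table and column-type ideas from the last few slides can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `products` table and its columns are made-up names, and SQLite is used because it needs no server. Unlike MySQL, SQLite does not enforce the declared char/varchar lengths and stores dates as plain text.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table with one column per kind of data type from the slides.
conn.execute(
    """
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,  -- integer
        price      REAL,                 -- float
        code       CHAR(8),              -- fixed-length string (length not enforced by SQLite)
        name       VARCHAR(100),         -- variable-length string
        added_on   DATE,                 -- date, stored as text in SQLite
        photo      BLOB                  -- arbitrary binary data
    )
    """
)
conn.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
    (1, 22.5, "AB-00001", "Guest book", "2024-01-31", b"\x89PNG"),
)

# typeof() reports how SQLite actually stored each value.
stored_types = conn.execute(
    "SELECT typeof(product_id), typeof(price), typeof(code), "
    "typeof(name), typeof(added_on), typeof(photo) FROM products"
).fetchone()
print(stored_types)
```

You define the table name, the column names and the column types yourself, exactly as the slides describe; only the storage details differ between database products.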
  8. 8. All Relational Databases use indexes. Similar to the index in a book, indexes provide a quick way to find the exact data item you want. Imagine you have a database of 100,000 customers, and you want to find just one. If you just read the customers table from start to finish until you find the one you're searching for, you could end up having to read all 100,000 records.
  9. 9. This would be very slow. Most relational databases use a B-tree index structure. This is a clever structure that lets you find a data item by reading only a few index entries: the number of reads grows with the logarithm of the row count, so even for millions of rows it is typically around three or four. Databases commonly have millions of rows, so you can see the necessity for indexes!
  10. 10. Indexes are a large part of databases and their design. Defining a column as the primary key implicitly creates an index. If you have a primary key on a table, it has an index. You can add a number of indexes to each table you have.
  11. 11. You'd use the create index command (more later). Indexes are used automatically by the database itself when you issue a query (ask for data). It uses the index to find the data in the table. For example, we want to get a customer's details from the example customers table above...
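To get a feel for why a B-tree lookup touches so few index entries, here is a rough back-of-the-envelope sketch. It assumes every index node points to 100 children; that fanout is a made-up round number, and real databases vary with page and key size.

```python
def btree_levels(row_count: int, fanout: int = 100) -> int:
    """Estimate how many B-tree levels are needed to reach any one of row_count rows.

    Each extra level multiplies the number of reachable rows by `fanout`
    (an assumed value, not a property of any particular database).
    """
    levels = 1
    capacity = fanout
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels


for rows in (100_000, 1_000_000, 10_000_000):
    # A full scan may read every row; an index lookup reads one node per level.
    print(rows, "rows: full scan up to", rows, "reads, index about", btree_levels(rows))
```

With this assumed fanout, the 100,000 customers from the example need about three levels, and even ten million rows need only four.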
  12. 12.  If we submit the following SQL query, the database will use the index it created for primary key column customer_id, and get everything for customer 1: select * from customers where customer_id = 1; The database uses the index because it can use it. The query contains the customer_id so it can look in the index and find the location of customer 1.
  13. 13. If there's no index on the column in the query, the database will have to go through the whole table! This is called a full table scan.
  14. 14. These days, when you talk about databases in the wild, you are primarily talking about two types: analytical databases and operational databases. Analytic Databases: Analytic databases (a.k.a. OLAP, Online Analytical Processing) are primarily static, read-only databases which store archived, historical data used for analysis.
  15. 15. For example, a company might store sales records over the last ten years in an analytic database and use that database to analyze marketing strategies in relationship to demographics. On the web, you will often see analytic databases in the form of inventory catalogs such as the one shown previously. An inventory catalog analytical database usually holds descriptive information about all available products in the inventory.
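The difference between an index lookup and a full table scan can be observed directly. The sketch below uses SQLite (through Python's sqlite3 module) and its `EXPLAIN QUERY PLAN` statement; the `city` column, the sample rows and the index name `idx_customers_city` are invented for the illustration, and other databases report their plans in different formats.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' text of each step in SQLite's plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [(f"Customer {i}", f"City {i % 50}") for i in range(1000)],
)

# Primary key lookup: the database can search its index.
by_key = query_plan(conn, "SELECT * FROM customers WHERE customer_id = 1")

# No index on city yet: the database has to scan the whole table.
by_city = query_plan(conn, "SELECT * FROM customers WHERE city = 'City 7'")

# After creating an index, the same query can use it.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
by_city_indexed = query_plan(conn, "SELECT * FROM customers WHERE city = 'City 7'")

print(by_key, by_city, by_city_indexed, sep="\n")
```

The plan text starts with SEARCH when an index is used and with SCAN when every row has to be read.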
  16. 16.  Web pages are generated dynamically by querying the list of available products in the inventory against some search parameters. The dynamically-generated page will display the information about each item (such as title, author, ISBN) which is stored in the database.
  17. 17. Operational databases (a.k.a. OLTP, Online Transaction Processing), on the other hand, are used to manage more dynamic bits of data. These types of databases allow you to do more than simply view archived data. Operational databases allow you to modify that data (add, change or delete data). These types of databases are usually used to track real-time information.
  18. 18. For example, a company might have an operational database used to track warehouse/stock quantities. As customers order products from an online web store, an operational database can be used to keep track of how many items have been sold and when the company will need to reorder stock.
  19. 19. Besides differentiating databases according to function, databases can also be differentiated according to how they model the data. What is a data model? Well, essentially a data model is a "description" of both a container for data and a methodology for storing and retrieving data from that container. Actually, there isn't really a data model "thing".
  20. 20.  Data models are abstractions, oftentimes mathematical algorithms and concepts. You cannot really touch a data model. But nevertheless, they are very useful. The analysis and design of data models has been the cornerstone of the evolution of databases. As models have advanced so has database efficiency. Before the 1980s, the two most commonly used Database Models were the hierarchical and network systems.
  21. 21. As its name implies, the Hierarchical Database Model defines hierarchically-arranged data. Perhaps the most intuitive way to visualize this type of relationship is by visualizing an upside down tree of data. In this tree, a single table acts as the "root" of the database from which other tables "branch" out. You will be instantly familiar with this relationship because that is how all Windows-based directory management systems (like Windows Explorer) work these days.
  22. 22.  Relationships in such a system are thought of in terms of children and parents such that a child may only have one parent but a parent can have multiple children. Parents and children are tied together by links called "pointers" (perhaps physical addresses inside the file system). A parent will have a list of pointers to each of their children.
  23. 23. This child/parent rule assures that data is systematically accessible. To get to a low-level table, you start at the root and work your way down through the tree until you reach your target. Of course, as you might imagine, one problem with this system is that the user must know how the tree is structured in order to find anything! The hierarchical model, however, is much more efficient than the flat-file model we discussed earlier because there is not as much need for redundant data.
  24. 24. If a change in the data is necessary, the change might only need to be processed once. Consider the student flat-file database example from our discussion of what databases are:
  25. 25. Examples of hierarchical data represented as relational tables: An organization could store employee information in a table that contains attributes/columns such as employee number, first name, last name, and department number. The organization provides each employee with computer hardware as needed, but computer equipment may only be used by the employee to which it is assigned. The organization could store the computer hardware information in a separate table that includes each part's serial number, type, and the employee that uses it.
  26. 26.  In many ways, the Network Database model was designed to solve some of the more serious problems with the Hierarchical Database Model. Specifically, the Network model solves the problem of data redundancy by representing relationships in terms of sets rather than hierarchy. The model had its origins in the Conference on Data Systems Languages (CODASYL) which had created the Data Base Task Group to explore and design a method to replace the hierarchical model.
  27. 27. The network model is actually very similar to the hierarchical model. In fact, the hierarchical model is a subset of the network model. However, instead of using a single-parent tree hierarchy, the network model uses set theory to provide a tree-like hierarchy with the exception that child tables are allowed to have more than one parent. This allows the network model to support many-to-many relationships.
  28. 28.  Visually, a Network Database looks like a hierarchical Database in that you can see it as a type of tree. However, in the case of a Network Database, the look is more like several trees which share branches. Thus, children can have multiple parents and parents can have multiple children.
  29. 29.  (RDBMS - relational database management system) A database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and relations between them are organised in tables. A table is a collection of records and each record in a table contains the same fields.
  30. 30. Properties of Relational Tables:
      • Values are atomic
      • Each row is unique
      • Column values are of the same kind
      • The sequence of columns is insignificant
      • The sequence of rows is insignificant
      • Each column has a unique name
  31. 31.  Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields will have the same name in both tables.
  32. 32. For example, an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs. To calculate a given customer's bill, you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields.
  33. 33. Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.
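The orders/products bill calculation can be sketched as follows, again with SQLite through Python's sqlite3 module. The table layout follows the example above; the product codes, prices and the helper name `customer_bill` are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (
        customer_id  INTEGER,
        product_code TEXT REFERENCES products(product_code)
    );
    -- Made-up sample data.
    INSERT INTO products VALUES ('P1', 10.0), ('P2', 2.5), ('P3', 4.0);
    INSERT INTO orders VALUES (1, 'P1'), (1, 'P2'), (2, 'P3'), (1, 'P2');
    """
)


def customer_bill(conn, customer_id):
    """Sum the prices of everything a customer ordered by joining on product_code."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


print(customer_bill(conn, 1))  # customer 1 ordered P1 once and P2 twice
```

The relationship between the two tables is stated only in the query's JOIN clause, which is what "specified at retrieval time" means in practice.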
  34. 34.  Object/relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems. These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data and diverse binary media such as audio, video, images, and applets.
  35. 35. By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects. As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and performance-management features of its relational ancestor and the flexibility of its object-oriented cousin.
  36. 36. Database designers can work with familiar tabular structures and data definition languages (DDLs) while assimilating new object-management possibilities. Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages and interfaces.
  37. 37. And the leading vendors are, of course, quite well known: IBM, Informix, and Oracle.
  38. 38. Object DBMSs add database functionality to object programming languages. They bring much more than persistent storage of programming language objects. Object DBMSs extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility.
  39. 39.  A major benefit of this approach is the unification of the application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and code bases are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort.
  40. 40. According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-oriented programming language (OOPL) systems and persistent systems. The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs."
  41. 41.  In contrast to a relational DBMS where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, object DBMSs have no performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-to-one mapping of object programming language objects to database objects has two benefits over other storage approaches:
  42. 42.  It provides higher performance management of objects, and it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support applications such as financial portfolio risk analysis systems, telecommunications service applications, world wide web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.
  43. 43. In the semistructured data model, the information that is normally associated with a schema is contained within the data, which is sometimes called "self-describing". In such a database there is no clear separation between the data and the schema, and the degree to which it is structured depends on the application. In some forms of semistructured data there is no separate schema; in others it exists but only places loose constraints on the data.
  44. 44.  Semi-structured data is naturally modelled in terms of graphs which contain labels which give semantics to its underlying structure. Such databases subsume the modelling power of recent extensions of flat relational databases, to nested databases which allow the nesting (or encapsulation) of entities, and to object databases which, in addition, allow cyclic references between objects.
  45. 45.  The associative model divides the real-world things about which data is to be recorded into two sorts: Entities are things that have discrete, independent existence. An entity’s existence does not depend on any other thing. Associations are things whose existence depends on one or more other things, such that if any of those things ceases to exist, then the thing itself ceases to exist or becomes meaningless.
  46. 46. An associative database comprises two data structures: 1. A set of items, each of which has a unique identifier, a name and a type. 2. A set of links, each of which has a unique identifier, together with the unique identifiers of three other things that represent the source, verb and target of a fact that is recorded about the source in the database. Each of the three things identified by the source, verb and target may be either a link or an item.
  47. 47.  The best way to understand the rationale of EAV design is to understand row modeling (of which EAV is a generalized form). Consider a supermarket database that must manage thousands of products and brands, many of which have a transitory existence. Here, it is intuitively obvious that product names should not be hard-coded as names of columns in tables. Instead, one stores product descriptions in a Products table: purchases/sales of individual items are recorded in other tables as separate rows with a product ID referencing this table.
  48. 48.  Conceptually an EAV design involves a single table with three columns, an entity (such as an olfactory receptor ID), an attribute (such as species, which is actually a pointer into the metadata table) and a value for the attribute (e.g., rat). In EAV design, one row stores a single fact. In a conventional table that has one column per attribute, by contrast, one row stores a set of facts. EAV design is appropriate when the number of parameters that potentially apply to an entity is vastly more than those that actually apply to an individual entity.
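A minimal sketch of the three-column EAV table, using SQLite through Python's sqlite3 module. The table name `eav`, the entity IDs and the helper `entity_as_dict` are made up, and the attribute is stored as plain text here rather than as a pointer into a metadata table, which simplifies the design described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row stores a single fact: (entity, attribute, value).
    CREATE TABLE eav (
        entity    TEXT NOT NULL,
        attribute TEXT NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity, attribute)
    );
    -- Hypothetical sample facts.
    INSERT INTO eav VALUES
        ('receptor-1', 'species', 'rat'),
        ('receptor-1', 'tissue',  'nose'),
        ('receptor-2', 'species', 'mouse');
    """
)


def entity_as_dict(conn, entity):
    """Collect all facts about one entity into an attribute -> value mapping."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    ).fetchall()
    return dict(rows)


print(entity_as_dict(conn, "receptor-1"))
```

Only the attributes that actually apply to an entity take up rows, which is why the design suits entities with many potential but few actual parameters.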
  49. 49. The context data model combines features of all the above models. It can be considered as a collection of object-oriented, network and semistructured models or as some kind of object database. In other words, this is a flexible model: you can use any type of database structure depending on the task. Such a data model has been implemented in DBMS ConteXt. The fundamental unit of information storage of ConteXt is a CLASS.
  50. 50.  Class contains METHODS and describes OBJECT. The Object contains FIELDS and PROPERTY. The field may be composite, in this case the field contains SubFields etc. The property is a set of fields that belongs to particular Object. (similar to AVL database). In other words, fields are permanent part of Object but Property is its variable part. The header of Class contains the definition of the internal structure of the Object, which includes the description of each field, such as their type, length, attributes and name.
  51. 51.  Context data model has a set of predefined types as well as user defined types. The predefined types include not only character strings, texts and digits but also pointers (references) and aggregate types (structures). A context model comprises three main data types: REGULAR, VIRTUAL and REFERENCE.
  52. 52.  Database design is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.
  53. 53.  The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views.
  54. 54. Conceptual schema: A conceptual schema or conceptual data model is a map of concepts and their relationships. This describes the semantics of an organization and represents a series of assertions about its nature. Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information, and characteristics of (attributes) and associations between pairs of those things of significance (relationships).
  55. 55.  Because a conceptual schema represents the semantics of an organization, and not a database design, it may exist on various levels of abstraction. Conceptual data models take a more abstract perspective, identifying the fundamental things, of which the things an individual deals with are just examples. The model does allow for what is called inheritance in object oriented terms.
  56. 56. A data structure diagram (DSD) is a data model or diagram used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them.
  57. 57. Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system. Normalisation procedures and the definition of integrity rules ensure that the stored database will be non-redundant and properly connected. Logical data structuring is based on the identification of the entities, their attributes, and the relationships between the entities.
  58. 58. Entity: Something about which an enterprise needs to keep data. Attributes: The properties of an entity. Relationships: The connections between entities.
  59. 59. An Entity may be physical (example: an Employee; a Part; a Machine) or conceptual (example: a Project; an Order; a Course). Each instance of an entity is different from all others: one or more attributes will typically form a primary key, unique to a particular instance.
  60. 60. Attributes are the properties of an entity: data which describes or is owned by an entity. Attributes (data) equate to facts, that is, specific details of interest about entities.
  61. 61. In the real world, objects do not exist in isolation. Our understanding of real world objects is in terms of their relationships with other objects; for example, the earth circles the sun; he is a carpenter; etc. Any real world object which we are going to include in a data model as an entity type must have some relationship with at least one other entity within the model (even if we are not going to implement that relationship within our database system).
  62. 62. One-to-one: Both tables can have only one record on either side of the relationship. Each primary key value relates to only one (or no) record in the related table. Most one-to-one relationships are forced by business rules and don't flow naturally from the data. In the absence of such a rule, you can usually combine both tables into one table without breaking any normalization rules.
  63. 63. One-to-One Relationships Contd: For example: a Factory may have many Managers during its lifetime; a Manager might be in charge of different Factories during his career.
  64. 64. One-to-many: The primary key table contains only one record that relates to none, one, or many records in the related table. This relationship is similar to the one between you and a parent. You have only one mother, but your mother may have several children.
  65. 65. One-to-many Contd: A formal description of the relationship shown in the diagram above is: One Factory may make zero or more Components. One Component is made in one (and only one) Factory.
  66. 66. One-to-many Contd: What this means in a database system is that one record in a table called Factory may be related to a number of records in a Component table, but a record in the Component table can only be related to one record in the Factory table.
  67. 67. One-to-Many Relationships summarised: For any occurrence of A, there may be 0, 1, or many, occurrences of B. For any occurrence of B, there can only be one occurrence of A.From another perspective: If an A record exists there may be zero or more related B records. Any B record can only be related to a single A record.
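The Factory/Component relationship can be sketched with a foreign key: each component row carries the ID of the single factory that makes it. The example uses SQLite through Python's sqlite3 module; the column names, sample rows and helper functions are hypothetical, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE factory (
        factory_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE component (
        component_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        -- Exactly one factory per component: the "one" side of one-to-many.
        factory_id   INTEGER NOT NULL REFERENCES factory(factory_id)
    );
    INSERT INTO factory VALUES (1, 'North plant'), (2, 'South plant');
    INSERT INTO component VALUES (10, 'Gear', 1), (11, 'Axle', 1);
    """
)


def components_per_factory(conn):
    """List every factory with its number of components (zero is allowed)."""
    return conn.execute(
        """
        SELECT f.name, COUNT(c.component_id)
        FROM factory AS f
        LEFT JOIN component AS c ON c.factory_id = f.factory_id
        GROUP BY f.factory_id
        ORDER BY f.factory_id
        """
    ).fetchall()


def add_component(conn, component_id, name, factory_id):
    """Insert a component; return False if the referenced factory does not exist."""
    try:
        conn.execute(
            "INSERT INTO component VALUES (?, ?, ?)", (component_id, name, factory_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(components_per_factory(conn))
```

A factory with no components is fine (zero or more), while a component that points at a non-existent factory is rejected.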
  68. 68. Many-to-many: Each record in both tables can relate to any number of records (or no records) in the other table. For instance, if you have several siblings, so do your siblings (have many siblings). Many-to-many relationships require a third table, known as an associative or linking table, because relational systems can't directly accommodate the relationship.
  69. 69. Many-to-many: Contd: Minimally, a many-many relationship will require insertion of a link entity. Further analysis may show that the link entity has attributes of its own - often qualifiers in respect of quantity or time.
  70. 70. Many-to-many: Contd:
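A many-to-many relationship with its linking table might look like the sketch below (SQLite through Python's sqlite3 module). Students and courses, the `enrolment` link table and its `enrolled_on` qualifier are all made-up names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_id  TEXT PRIMARY KEY,    title TEXT NOT NULL);
    -- The link entity: one row per (student, course) pair, with its own attribute.
    CREATE TABLE enrolment (
        student_id  INTEGER NOT NULL REFERENCES student(student_id),
        course_id   TEXT    NOT NULL REFERENCES course(course_id),
        enrolled_on TEXT,
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course VALUES ('DB1', 'Databases'), ('OS1', 'Operating Systems');
    INSERT INTO enrolment VALUES
        (1, 'DB1', '2024-01-10'),
        (1, 'OS1', '2024-01-11'),
        (2, 'DB1', '2024-01-12');
    """
)


def courses_of(conn, student_id):
    """Course IDs a student is enrolled in, found through the link table."""
    rows = conn.execute(
        "SELECT course_id FROM enrolment WHERE student_id = ? ORDER BY course_id",
        (student_id,),
    ).fetchall()
    return [r[0] for r in rows]


def students_of(conn, course_id):
    """Names of the students enrolled in a course, found through the link table."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM enrolment AS e
        JOIN student AS s ON s.student_id = e.student_id
        WHERE e.course_id = ?
        ORDER BY s.name
        """,
        (course_id,),
    ).fetchall()
    return [r[0] for r in rows]


print(courses_of(conn, 1), students_of(conn, "DB1"))
```

The link table turns one many-to-many relationship into two one-to-many relationships, and `enrolled_on` shows how the link entity can carry a qualifier of its own.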
  71. 71. The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of a system that includes modules and the database's hardware and software specifications. In the case of relational databases the storage objects are tables which store data in rows and columns.
  72. 72. • The purpose of normalization
      • Data redundancy and update anomalies
      • Functional dependencies
      • The process of normalization
      • First Normal Form (1NF)
      • Second Normal Form (2NF)
      • Third Normal Form (3NF)
  73. 73. Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes.
  74. 74. Relations that have redundant data may have problems called update anomalies, which are classified as:
      • Insertion anomalies
      • Deletion anomalies
      • Modification anomalies
  75. 75. Examples of update anomalies in the StaffBranch relation:
      • Insertion: to insert a new member of staff with branchNo B007 into the StaffBranch relation;
      • Deletion: to delete a tuple that represents the last member of staff located at branch B007;
      • Modification: to change the address of branch B003.
      StaffBranch
      staffNo  sName        position    salary  branchNo  bAddress
      SL21     John White   Manager     30000   B005      22 Deer Rd, London
      SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
      SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
      SA9      Mary Howe    Assistant    9000   B007      16 Argyll St, Aberdeen
      SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
      SL41     Julie Lee    Assistant    9000   B005      22 Deer Rd, London
      Figure 1: StaffBranch relation
  76. 76. Staff
      staffNo  sName        position    salary  branchNo
      SL21     John White   Manager     30000   B005
      SG37     Ann Beech    Assistant   12000   B003
      SG14     David Ford   Supervisor  18000   B003
      SA9      Mary Howe    Assistant    9000   B007
      SG5      Susan Brand  Manager     24000   B003
      SL41     Julie Lee    Assistant    9000   B005
      Branch
      branchNo  bAddress
      B005      22 Deer Rd, London
      B007      16 Argyll St, Aberdeen
      B003      163 Main St, Glasgow
      Figure 2: Staff and Branch relations
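The modification anomaly can be made concrete by counting how many rows an address change touches in each design. This sketch loads the rows of Figure 1 into SQLite through Python's sqlite3 module and derives the Staff and Branch tables of Figure 2 from them; the lower-case table names and the new address are invented for the illustration.

```python
import sqlite3

STAFF_BRANCH = [
    ("SL21", "John White", "Manager", 30000, "B005", "22 Deer Rd, London"),
    ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
    ("SG5", "Susan Brand", "Manager", 24000, "B003", "163 Main St, Glasgow"),
    ("SL41", "Julie Lee", "Assistant", 9000, "B005", "22 Deer Rd, London"),
]

conn = sqlite3.connect(":memory:")

# Single-table design (Figure 1): the branch address is repeated per staff member.
conn.execute(
    "CREATE TABLE staff_branch (staffNo TEXT PRIMARY KEY, sName TEXT, position TEXT, "
    "salary INTEGER, branchNo TEXT, bAddress TEXT)"
)
conn.executemany("INSERT INTO staff_branch VALUES (?, ?, ?, ?, ?, ?)", STAFF_BRANCH)

# Two-table design (Figure 2): each branch address is stored once.
conn.execute("CREATE TABLE branch (branchNo TEXT PRIMARY KEY, bAddress TEXT)")
conn.execute(
    "CREATE TABLE staff (staffNo TEXT PRIMARY KEY, sName TEXT, position TEXT, "
    "salary INTEGER, branchNo TEXT REFERENCES branch(branchNo))"
)
conn.execute("INSERT INTO branch SELECT DISTINCT branchNo, bAddress FROM staff_branch")
conn.execute(
    "INSERT INTO staff SELECT staffNo, sName, position, salary, branchNo FROM staff_branch"
)

new_address = "1 New St, Glasgow"  # hypothetical new address for branch B003

# rowcount reports how many rows each UPDATE had to change.
flat_rows_changed = conn.execute(
    "UPDATE staff_branch SET bAddress = ? WHERE branchNo = 'B003'", (new_address,)
).rowcount
split_rows_changed = conn.execute(
    "UPDATE branch SET bAddress = ? WHERE branchNo = 'B003'", (new_address,)
).rowcount

print(flat_rows_changed, split_rows_changed)
```

In the single-table design every B003 staff row must be updated consistently; in the two-table design the address lives in one place.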
  77. 77. Functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, then B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.) Determinant: refers to the attribute or group of attributes on the left-hand side of the arrow of a functional dependency.
  78. 78. Trivial functional dependency means that the right-hand side is a subset (not necessarily a proper subset) of the left-hand side. For example (see Figure 1): staffNo, sName → sName and staffNo, sName → staffNo. They do not provide any additional information about possible integrity constraints on the values held by these attributes. We are normally more interested in nontrivial dependencies because they represent integrity constraints for the relation.
  79. 79. Main characteristics of functional dependencies in normalization:
      • have a one-to-one relationship between attribute(s) on the left- and right-hand side of a dependency;
      • hold for all time;
      • are nontrivial.
  80. 80. Identifying the primary key: Functional dependency is a property of the meaning or semantics of the attributes in a relation. When a functional dependency is present, the dependency is specified as a constraint between the attributes. An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation using functional dependency.
  81. 81. Inference Rules: The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X. Armstrong's axioms (rules 1 to 3) and further rules derived from them (rules 4 to 7):
      1. Reflexivity: If B is a subset of A, then A → B
      2. Augmentation: If A → B, then A,C → B,C
      3. Transitivity: If A → B and B → C, then A → C
      4. Self-determination: A → A
      5. Decomposition: If A → B,C then A → B and A → C
      6. Union: If A → B and A → C, then A → B,C
      7. Composition: If A → B and C → D, then A,C → B,D
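In practice, instead of enumerating the whole closure X+, one usually computes the closure of a set of attributes: everything those attributes determine under X. A dependency A → B then follows from X exactly when B lies inside the attribute closure of A. The sketch below is this related, standard technique in plain Python, not the slide's own procedure; the function name is made up and the dependencies are those of the StaffBranch relation.

```python
def attribute_closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under `dependencies`.

    `dependencies` is a list of (left_side, right_side) pairs of attribute collections.
    The loop repeatedly applies any dependency whose left side is already covered,
    which combines the reflexivity, augmentation and transitivity rules.
    """
    closure = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Functional dependencies of the StaffBranch relation.
STAFF_BRANCH_FDS = [
    (["staffNo"], ["sName", "position", "salary", "branchNo", "bAddress"]),
    (["branchNo"], ["bAddress"]),
    (["branchNo", "position"], ["salary"]),
    (["bAddress", "position"], ["salary"]),
]

print(sorted(attribute_closure(["staffNo"], STAFF_BRANCH_FDS)))
print(sorted(attribute_closure(["branchNo"], STAFF_BRANCH_FDS)))
```

Because the closure of staffNo contains every attribute of the relation, staffNo is a candidate key; branchNo alone determines only the branch address.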
  82. 82. Minimal Sets of Functional Dependencies: A set of functional dependencies X is minimal if it satisfies the following conditions:
      • Every dependency in X has a single attribute on its right-hand side.
      • We cannot replace any dependency A → B in X with dependency C → B, where C is a proper subset of A, and still have a set of dependencies that is equivalent to X.
      • We cannot remove any dependency from X and still have a set of dependencies that is equivalent to X.
  83. 83. Example of a Minimal Set of Functional Dependencies: A set of functional dependencies for the StaffBranch relation satisfies the three conditions for producing a minimal set.
      staffNo → sName
      staffNo → position
      staffNo → salary
      staffNo → branchNo
      staffNo → bAddress
      branchNo → bAddress
      branchNo, position → salary
      bAddress, position → salary
  84. 84. • Multivalued Attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the primary key or part of it; that is, they are not functionally dependent on the primary key.
      STUDENT
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
  85. 85. • Partial Dependency: when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.
      CUSTOMER
      Cust_ID  Name   Order_ID
      101      AT&T   1234
      101      AT&T   156
      125      Cisco  1250
  86. 86. • Transitive Dependency: when a non-key attribute determines another non-key attribute.
      EMPLOYEE
      Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
      111     Mary    Jones   1        Acct
      122     Sarah   Smith   2        Mktg
  87. 87. • Normalization is often executed as a series of steps. Each step corresponds to a specific normal form that has known properties.
      • As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies.
      • For the relational data model, it is important to recognize that it is only first normal form (1NF) that is critical in creating relations. All the subsequent normal forms are optional.
  88. 88. • Unnormalized: there are multivalued attributes or repeating groups
      • 1NF: no multivalued attributes or repeating groups
      • 2NF: 1NF plus no partial dependencies
      • 3NF: 2NF plus no transitive dependencies
  89. 89. BOOK (ISBN, Title, Publisher, Address)
      • ISBN → Title
      • ISBN → Publisher
      • Publisher → Address
      All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.
  90. 90. BOOK (ISBN, Title, Publisher, Address)
      • ISBN → Title
      • ISBN → Publisher
      • Publisher → Address
      The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there can't be partial dependencies. Therefore, the relation is at least in 2NF.
  91. 91. BOOK (ISBN, Title, Publisher, Address)
      • ISBN → Title
      • ISBN → Publisher
      • Publisher → Address
      Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3NF.
  92. 92. BOOK (ISBN, Title, Publisher, Address)
      • ISBN → Title
      • ISBN → Publisher
      • Publisher → Address
      We know that the relation is at least in 2NF, and it is not in 3NF. Therefore, we conclude that the relation is in 2NF.
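The reasoning on the BOOK slides can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the relation is already in 1NF, there is a single primary key (other candidate keys are ignored), and the function names (`closure`, `has_partial_dependency`, `transitive_fds`, `normal_form`) are hypothetical helpers, not part of any library.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def has_partial_dependency(key, fds):
    """True if a proper subset of the key determines a non-key attribute."""
    return any(lhs < key and bool(rhs - key) for lhs, rhs in fds)

def transitive_fds(key, fds):
    """FDs in which non-key attributes determine other non-key attributes."""
    return [(lhs, rhs) for lhs, rhs in fds if not (lhs & key) and (rhs - key)]

def normal_form(key, fds):
    """Highest of 1NF/2NF/3NF reached (assumes the relation is already in 1NF)."""
    if has_partial_dependency(key, fds):
        return "1NF"
    if transitive_fds(key, fds):
        return "2NF"
    return "3NF"

# The BOOK relation and its dependencies, as listed on the slides.
BOOK_ATTRS = {"ISBN", "Title", "Publisher", "Address"}
BOOK_KEY = {"ISBN"}
BOOK_FDS = [
    ({"ISBN"}, {"Title"}),
    ({"ISBN"}, {"Publisher"}),
    ({"Publisher"}, {"Address"}),
]

print(closure(BOOK_KEY, BOOK_FDS) == BOOK_ATTRS)  # the key determines every attribute
print(normal_form(BOOK_KEY, BOOK_FDS))            # 2NF, because Publisher -> Address
```

The sketch follows the slide's definitions literally, so it reports BOOK as 2NF for the same reason the slides do: the key is not composite, but Publisher → Address is a dependency between non-key attributes.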
  93. 93. • Option 2: Remove the entire repeating group from the relation. Create another relation which contains all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group together comprise the primary key.
      STUDENT
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
  94. 94. STUDENT
      Stud_ID  Name
      101      Lennon
      125      Johnson
      STUDENT_COURSE
      Stud_ID  Course   Units
      101      MSI 250  3
      101      MSI 415  3
      125      MSI 331  3
  95. 95. Composite Primary Key: (Stud_ID, Course_ID)
      STUDENT
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
  96. 96. • Goal: Remove Partial Dependencies. With the composite primary key (Stud_ID, Course_ID), the partial dependencies are Stud_ID → Name and Course_ID → Units.
      STUDENT
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
  97. 97. • From the original relation, remove the attributes that depend on a part, but not the whole, of the primary key. For each partial dependency, create a new relation, with the corresponding part of the primary key from the original as the primary key.
      STUDENT
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
  98. 98. Original relation:
      STUDENT
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
      After removing the partial dependencies:
      STUDENT_COURSE
      Stud_ID  Course_ID
      101      MSI 250
      101      MSI 415
      125      MSI 331
      STUDENT
      Stud_ID  Name
      101      Lennon
      125      Johnson
      COURSE
      Course_ID  Units
      MSI 250    3.00
      MSI 415    3.00
      MSI 331    3.00
  99. 99. • Goal: Get rid of transitive dependencies. In EMPLOYEE, the transitive dependency is Dept_ID → Dept_Name.
      EMPLOYEE
      Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
      111     Mary    Jones   1        Acct
      122     Sarah   Smith   2        Mktg
  100. 100. • Remove from the original relation the attributes that depend on a non-key attribute. For each transitive dependency, create a new relation in which the non-key attribute that acts as the determinant becomes the primary key, and the dependent non-key attribute becomes its dependent.
      EMPLOYEE
      Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
      111     Mary    Jones   1        Acct
      122     Sarah   Smith   2        Mktg
  101. 101. Original relation:
      EMPLOYEE
      Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
      111     Mary    Jones   1        Acct
      122     Sarah   Smith   2        Mktg
      After removing the transitive dependency:
      EMPLOYEE
      Emp_ID  F_Name  L_Name  Dept_ID
      111     Mary    Jones   1
      122     Sarah   Smith   2
      DEPARTMENT
      Dept_ID  Dept_Name
      1        Acct
      2        Mktg
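The decomposition above can be tried out with Python's built-in `sqlite3` module. The following is a minimal sketch, not a definitive schema: the lower-case table and column names and the column types are assumptions chosen for the illustration. It shows that joining the two 3NF tables gives back the original EMPLOYEE rows, and that a department name now has to be changed in only one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    f_name  TEXT NOT NULL,
    l_name  TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Acct'), (2, 'Mktg');
INSERT INTO employee VALUES (111, 'Mary', 'Jones', 1), (122, 'Sarah', 'Smith', 2);
""")

# Joining on dept_id reconstructs the rows of the original relation.
rows = conn.execute("""
    SELECT e.emp_id, e.f_name, e.l_name, e.dept_id, d.dept_name
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
for row in rows:
    print(row)

# Renaming a department touches exactly one row, however many employees it has.
changed = conn.execute(
    "UPDATE department SET dept_name = 'Accounting' WHERE dept_id = 1"
).rowcount
print(changed)
```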
  102. 102. Unnormalized form (UNF): a table that contains one or more repeating groups.
      Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
      clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
      CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                               PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
      CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                               PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                               PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
      Figure 3 ClientRental unnormalized table
  103. 103. First Normal Form (1NF) is a relation in which the intersection of each row and column contains one and only one value.
      There are two approaches to removing repeating groups from unnormalized tables:
      1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
      2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
  104. 104. With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.
      The ClientRental relation is defined as follows:
      ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
      clientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
      CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
      CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
      CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
      CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
      CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
      Figure 4 1NF ClientRental relation with the first approach
  105. 105. With the second approach, we remove the repeating group (property rented details) by placing the repeating data, along with a copy of the original key attribute (clientNo), in a separate relation.
      Client (clientNo, cName)
      PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
      Client
      clientNo  cName
      CR76      John Kay
      CR56      Aline Stewart
      PropertyRentalOwner
      clientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
      CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
      CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
      CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
      CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
      CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
      Figure 5 1NF ClientRental relation with the second approach
  106. 106. Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
      A functional dependency A → B is a partial dependency if some attribute can be removed from A and the dependency still holds.
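The two definitions can be illustrated with a small Python check against sample rows. This is a sketch under a stated limitation: a functional dependency is a rule about every legal state of a relation, so sample data can only show that a dependency is violated or not violated by those rows. The helper names `fd_holds` and `is_full_fd` are hypothetical, and the rows are a four-column excerpt of the 1NF ClientRental table (Figure 4).

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if rows with equal values for lhs always have equal values for rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs also determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False  # a part of lhs is enough: partial dependency
    return True

COLUMNS = ("clientNo", "propertyNo", "cName", "rentStart")
RENTALS = [dict(zip(COLUMNS, r)) for r in [
    ("CR76", "PG4",  "John Kay",      "1-Jul-00"),
    ("CR76", "PG16", "John Kay",      "1-Sep-02"),
    ("CR56", "PG4",  "Aline Stewart", "1-Sep-99"),
    ("CR56", "PG36", "Aline Stewart", "10-Oct-00"),
    ("CR56", "PG16", "Aline Stewart", "1-Nov-02"),
]]

key = ("clientNo", "propertyNo")
print(is_full_fd(RENTALS, key, ("rentStart",)))  # True: needs the whole key
print(is_full_fd(RENTALS, key, ("cName",)))      # False: clientNo alone is enough
```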
  107. 107. Second normal form (2NF) is a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
      The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.
  108. 108. The ClientRental relation has the following functional dependencies:
      fd1  clientNo, propertyNo → rentStart, rentFinish  (Primary key)
      fd2  clientNo → cName  (Partial dependency)
      fd3  propertyNo → pAddress, rent, ownerNo, oName  (Partial dependency)
      fd4  ownerNo → oName  (Transitive dependency)
      fd5  clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName  (Candidate key)
      fd6  propertyNo, rentStart → clientNo, cName, rentFinish  (Candidate key)
  109. 109. Removing the partial dependencies results in three new relations called Client, Rental, and PropertyOwner:
      Client (clientNo, cName)
      Rental (clientNo, propertyNo, rentStart, rentFinish)
      PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
      Client
      clientNo  cName
      CR76      John Kay
      CR56      Aline Stewart
      Rental
      clientNo  propertyNo  rentStart  rentFinish
      CR76      PG4         1-Jul-00   31-Aug-01
      CR76      PG16        1-Sep-02   1-Sep-02
      CR56      PG4         1-Sep-99   10-Jun-00
      CR56      PG36        10-Oct-00  1-Dec-01
      CR56      PG16        1-Nov-02   1-Aug-03
      PropertyOwner
      propertyNo  pAddress                rent  ownerNo  oName
      PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
      PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
      PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw
      Figure 6 2NF ClientRental relation
  110. 110. Transitive dependency: a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
      Third normal form (3NF): a relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
      The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.
  111. 111. The functional dependencies for the Client, Rental and PropertyOwner relations are as follows:
      Client
      fd2  clientNo → cName  (Primary key)
      Rental
      fd1  clientNo, propertyNo → rentStart, rentFinish  (Primary key)
      fd5  clientNo, rentStart → propertyNo, rentFinish  (Candidate key)
      fd6  propertyNo, rentStart → clientNo, rentFinish  (Candidate key)
      PropertyOwner
      fd3  propertyNo → pAddress, rent, ownerNo, oName  (Primary key)
      fd4  ownerNo → oName  (Transitive dependency)
  112. 112. The resulting 3NF relations have the forms:
      Client (clientNo, cName)
      Rental (clientNo, propertyNo, rentStart, rentFinish)
      PropertyOwner (propertyNo, pAddress, rent, ownerNo)
      Owner (ownerNo, oName)
  113. 113. Client
      clientNo  cName
      CR76      John Kay
      CR56      Aline Stewart
      Rental
      clientNo  propertyNo  rentStart  rentFinish
      CR76      PG4         1-Jul-00   31-Aug-01
      CR76      PG16        1-Sep-02   1-Sep-02
      CR56      PG4         1-Sep-99   10-Jun-00
      CR56      PG36        10-Oct-00  1-Dec-01
      CR56      PG16        1-Nov-02   1-Aug-03
      PropertyOwner
      propertyNo  pAddress                rent  ownerNo
      PG4         6 Lawrence St, Glasgow  350   CO40
      PG16        5 Novar Dr, Glasgow     450   CO93
      PG36        2 Manor Rd, Glasgow     370   CO93
      Owner
      ownerNo  oName
      CO40     Tina Murphy
      CO93     Tony Shaw
      Figure 7 3NF relations derived from the ClientRental relation
  114. 114. Boyce-Codd normal form (BCNF): a relation is in BCNF if, and only if, every determinant is a candidate key.
      The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
  115. 115. fd1  clientNo, interviewDate → interviewTime, staffNo, roomNo  (Primary key)
      fd2  staffNo, interviewDate, interviewTime → clientNo  (Candidate key)
      fd3  roomNo, interviewDate, interviewTime → clientNo, staffNo  (Candidate key)
      fd4  staffNo, interviewDate → roomNo  (not a candidate key)
      As a consequence, the ClientInterview relation may suffer from update anomalies. For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.
      ClientInterview
      clientNo  interviewDate  interviewTime  staffNo  roomNo
      CR76      13-May-02      10.30          SG5      G101
      CR76      13-May-02      12.00          SG5      G101
      CR74      13-May-02      12.00          SG37     G102
      CR56      1-Jul-02       10.30          SG5      G102
      Figure 8 ClientInterview relation
  116. 116. To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below:
      Interview (clientNo, interviewDate, interviewTime, staffNo)
      StaffRoom (staffNo, interviewDate, roomNo)
      Interview
      clientNo  interviewDate  interviewTime  staffNo
      CR76      13-May-02      10.30          SG5
      CR76      13-May-02      12.00          SG5
      CR74      13-May-02      12.00          SG37
      CR56      1-Jul-02       10.30          SG5
      StaffRoom
      staffNo  interviewDate  roomNo
      SG5      13-May-02      G101
      SG37     13-May-02      G102
      SG5      1-Jul-02       G102
      Figure 9 BCNF Interview and StaffRoom relations
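The BCNF test ("every determinant is a candidate key") can be sketched in Python using attribute closure. Two assumptions apply: the check uses the slightly weaker, commonly used condition that every determinant of a non-trivial dependency is a superkey, and the dependency lists for the decomposed relations were restricted by hand from fd1 to fd4. The helper names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """FDs whose determinant does not determine every attribute of the relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs                       # skip trivial dependencies
        and not all_attrs <= closure(lhs, fds)  # determinant is not a superkey
    ]

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

# ClientInterview and its dependencies, as listed on the slides.
CLIENT_INTERVIEW = frozenset(
    {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"})
FD1 = fd({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"})
FD2 = fd({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"})
FD3 = fd({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"})
FD4 = fd({"staffNo", "interviewDate"}, {"roomNo"})

# The decomposed relations, with dependencies restricted by hand (an assumption).
INTERVIEW = frozenset({"clientNo", "interviewDate", "interviewTime", "staffNo"})
INTERVIEW_FDS = [fd({"clientNo", "interviewDate"}, {"interviewTime", "staffNo"}), FD2]
STAFF_ROOM = frozenset({"staffNo", "interviewDate", "roomNo"})
STAFF_ROOM_FDS = [FD4]

print(bcnf_violations(CLIENT_INTERVIEW, [FD1, FD2, FD3, FD4]) == [FD4])  # only fd4
print(bcnf_violations(INTERVIEW, INTERVIEW_FDS))    # []
print(bcnf_violations(STAFF_ROOM, STAFF_ROOM_FDS))  # []
```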
  117. 117. Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
      A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if
      • B is a subset of A, or
      • A ∪ B = R.
      An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
  118. 118. Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
  119. 119. Fifth normal form (5NF): a relation that has no join dependency.
      Lossless-join dependency: a property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
      Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, the relation R satisfies a join dependency if, and only if, every legal value of R is equal to the join of its projections on A, B, …, Z.
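The idea of spurious tuples can be made concrete with a short Python sketch that projects a relation onto attribute subsets, joins the projections back together, and compares the result with the original. As before, this only tests one sample state, whereas a join dependency is a statement about every legal value of R. The helper names are hypothetical, and the rows are an excerpt of the ClientRental data used earlier.

```python
def project(rows, attrs):
    """Projection of a relation (a set of frozenset rows) onto attrs."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}

def natural_join(r, s):
    """Natural join: combine rows that agree on all shared attributes."""
    out = set()
    for x in r:
        dx = dict(x)
        for y in s:
            dy = dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out

def rejoins_losslessly(rows, *components):
    """True if joining the projections on each component gives back rows exactly."""
    joined = None
    for attrs in components:
        p = project(rows, attrs)
        joined = p if joined is None else natural_join(joined, p)
    return joined == rows

COLUMNS = ("clientNo", "cName", "propertyNo", "rentStart")
R = {frozenset(zip(COLUMNS, r)) for r in [
    ("CR76", "John Kay",      "PG4",  "1-Jul-00"),
    ("CR76", "John Kay",      "PG16", "1-Sep-02"),
    ("CR56", "Aline Stewart", "PG4",  "1-Sep-99"),
    ("CR56", "Aline Stewart", "PG36", "10-Oct-00"),
    ("CR56", "Aline Stewart", "PG16", "1-Nov-02"),
]}

# Splitting off the client name is lossless for this data.
good = rejoins_losslessly(
    R, {"clientNo", "cName"}, {"clientNo", "propertyNo", "rentStart"})
# Separating propertyNo from rentStart pairs every property of a client
# with every start date of that client, which creates spurious tuples.
bad = rejoins_losslessly(
    R, {"clientNo", "cName", "propertyNo"}, {"clientNo", "rentStart"})
print(good, bad)
```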
  120. 120. • Atomicity requires that database modifications follow an "all or nothing" rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged. To comply with the "A" in ACID, a system must guarantee atomicity in each and every situation, including power failures, errors and crashes. This guarantees that an incomplete transaction cannot exist.
  121. 121. • The consistency property ensures that any transaction the database performs will take it from one consistent state to another. Consistency states that only consistent data (valid according to all the defined rules) will be written to the database. Quite simply, whatever rows are affected by the transaction will remain consistent with each and every rule that is applied to them (including but not limited to constraints, cascades and triggers).
  122. 122. • While this is extremely simple and clear, it's worth noting that this consistency requirement applies to everything changed by the transaction, without any limit at all (including triggers firing other triggers, launching cascades that eventually fire other triggers, and so on).
  123. 123. • Isolation refers to the requirement that no transaction should be able to interfere with another transaction. In other words, it should not be possible for two transactions that affect the same rows to run concurrently, as the outcome would be unpredictable and the system would thus become unreliable.
  124. 124. • In effect, the only strict way to respect the isolation property is to use a serial model where no two transactions can occur on the same data at the same time and where the result is predictable (i.e. transaction B will happen after transaction A in every single possible case).
  125. 125. • Durability means that once a transaction has been committed, it will remain so. In other words, every committed transaction is protected against power loss, crashes and errors; it cannot be lost by the system and can thus be guaranteed to be completed. In a relational database, for instance, once a group of SQL statements executes, the results need to be stored permanently. If the database crashes right after a group of SQL statements executes, it should be possible to restore the database state to the point after the last committed transaction.
  126. 126. • The transaction subtracts 10 from A and adds 10 to B. If it succeeds, it is valid, because the data continues to satisfy the constraint (here, A + B = 100). However, assume that after removing 10 from A, the transaction is unable to modify B. If the database retains A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction complete, or neither.
  127. 127. • Consistency is a very general term that demands the data meets all validation rules. Also, it may be implied that both A and B must be integers. A valid range for A and B may also be implied. All validation rules must be checked to ensure consistency. Assume that a transaction attempts to subtract 10 from A without altering B. Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins.
  128. 128. • If the transaction removes 10 from A successfully, atomicity will be achieved. However, a validation check will show that A + B = 90. That is not consistent according to the rules of the database. The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state.
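The A and B example can be sketched with Python's built-in `sqlite3` module. Several things here are assumptions made for the illustration: the `account` table, the starting balances of 60 and 40 (the slides only state that A + B = 100), the `fail_midway` switch that simulates a failure, and the fact that the A + B = 100 rule is checked by the application inside the transaction rather than by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 60), ('B', 40)")  # assumed balances
conn.commit()

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

def transfer(conn, amount, fail_midway=False):
    """Move amount from A to B; return True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
            if fail_midway:
                raise RuntimeError("simulated failure before B is updated")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
            total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
            if total != 100:
                raise ValueError("constraint A + B = 100 violated")
        return True
    except (RuntimeError, ValueError):
        return False

failed = transfer(conn, 10, fail_midway=True)
after_failure = balances(conn)   # A keeps its old value: nothing was applied
succeeded = transfer(conn, 10)
after_success = balances(conn)   # both updates applied together
print(failed, after_failure)
print(succeeded, after_success)
```

In the failed run the subtraction from A is undone by the rollback, so the database never exposes the state in which A + B = 90.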