MI0034 – Database Management Systems
Zafar Ishtiaq - 531111145
Assignment SET I and SET II
Atyab Gulf Catering Co, Block 11, Street 108, Jabriya, Kuwait
11/02/2009
MBA-IT Semester III
MI0034 – Database Management System
Assignment - Set 1

Q1. Differentiate between Traditional File System & Modern Database System. Describe the properties of Database & the advantages of Database.

A1. Differentiate between Traditional File System & Modern Database System:
File-based systems were the traditional systems, and they have now largely been replaced by modern database systems. Most database applications today use modern database management systems. The differences between these two technologies are given below.

File-based System
File-based systems were an early attempt to computerize the manual filing system. A file-based system is a collection of application programs that perform services for the end users. Each program defines and manages its own data. However, five types of problems occur when using the file-based approach:

Separation and isolation of data
When data is isolated in separate files, it is more difficult to access data that should be available. The application programmer is required to synchronize the processing of two or more files to ensure the correct data is extracted.

Duplication of data
When employing the decentralized file-based approach, uncontrolled duplication of data occurs. Uncontrolled duplication of data is undesirable because:
i. Duplication is wasteful.
ii. Duplication can lead to loss of data integrity.
Data dependence
In a file-based system, the physical structure and storage of the data files and records are defined in the application program code. This characteristic is known as program-data dependence. Making changes to an existing structure is rather difficult and leads to modification of the programs. Such maintenance activities are time-consuming and subject to error.

Incompatible file formats
The structure of a file depends on the application programming language. A file structure provided in one programming language, such as the direct file or indexed-sequential file available in COBOL, may be different from the structure generated by another programming language such as C. This direct incompatibility makes the files difficult to process jointly.

Fixed queries / proliferation of application programs
File-based systems are very dependent upon the application programmer. Any required queries or reports have to be written by the application programmer. Normally, only fixed-format queries or reports can be handled, and no facility for ad-hoc queries is offered.

File-based systems also put tremendous pressure on data processing staff, with users complaining about programs that are inadequate or inefficient in meeting their demands. Documentation may be limited and maintenance of the system is difficult. Provision for security, integrity and recovery capability is very limited.

Database Systems:
In order to overcome the limitations of the file-based approach, the concept of the database and the Database Management System (DBMS) emerged in the 1960s. A database management system is software that can store and retrieve data very rapidly. Most databases in use today are relational: the word "relational" refers to how the data is stored and organized in the database. In this answer, "database" means a relational database managed by an RDBMS (Relational Database Management System).
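The program-data dependence described above can be made concrete with a minimal sketch. It assumes a hypothetical fixed-width account file (5-character account number, a name field, an 8-character balance) and an equivalent SQLite table named `account`; none of these names come from the assignment. The file-based reader hard-codes byte offsets, so widening one field silently breaks it, while the SQL reader names columns and is unaffected by a change to the table's structure.

```python
import sqlite3

# File-based style: the physical record layout lives inside the program.
# Hypothetical layout: 5-char account number, name field, 8-char zero-padded balance.
def fixed_width(acct, name, balance, name_width=10):
    return f"{acct:>5}{name:<{name_width}}{balance:08.2f}"

def read_balance_fixed_width(lines, account):
    for line in lines:
        if line[0:5] == account:        # offsets hard-coded for a 10-char name field
            return float(line[15:23])
    return None

records = [fixed_width("00001", "Alice", 150.00), fixed_width("00002", "Bob", 20.50)]
print(read_balance_fixed_width(records, "00002"))  # 20.5

# Widening the name field breaks the program: it now reads the wrong characters.
widened = [fixed_width("00002", "Bob", 20.50, name_width=15)]
print(read_balance_fixed_width(widened, "00002"))  # 0.0

# Database style: the program names columns and the DBMS manages storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acct_no TEXT PRIMARY KEY, name TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("00001", "Alice", 150.00), ("00002", "Bob", 20.50)])

def read_balance_sql(conn, account):
    row = conn.execute("SELECT balance FROM account WHERE acct_no = ?",
                       (account,)).fetchone()
    return row[0] if row else None

# Adding a column changes the table's structure, but the query needs no change.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")
print(read_balance_sql(conn, "00002"))  # 20.5
```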
In a relational database, all data is stored in tables. Each table has the same structure repeated in every row (like a spreadsheet). The term "relational" comes from the mathematical notion of a relation, which is what each table represents; tables are linked to one another through shared key values.

Advantages:
A number of advantages are obtained by applying the database approach in application systems, including:

Control of data redundancy
The database approach attempts to eliminate redundancy by integrating the files. Although the database approach does not eliminate redundancy entirely, it controls the amount of redundancy inherent in the database.

Data consistency
By eliminating or controlling redundancy, the database approach reduces the risk of inconsistencies occurring. It ensures that all copies of the data are kept consistent.

More information from the same amount of data
With the integration of the operational data in the database approach, it may be possible to derive additional information from the same data.

Sharing of data
The database belongs to the entire organization and can be shared by all authorized users.

Improved data integrity
Database integrity refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate.

Improved security
The database approach protects the data from unauthorized users. This may take the form of user names and passwords that identify the type of user and their access rights for operations including retrieval, insertion, updating and deletion.

Enforcement of standards
The integration of the database enforces the necessary standards, including data formats, naming conventions, documentation standards, update procedures and access rules.

Economy of scale
Cost savings can be obtained by combining all of the organization's operational data into one database, with applications working on one source of data.

Balance of conflicting requirements
By having a structured design in the database, conflicts between users or departments can be resolved. Decisions are based on the best use of resources for the organization as a whole rather than for an individual entity.

Improved data accessibility and responsiveness
Because data is integrated in the database approach, data access can cross departmental boundaries. This feature provides more functionality and better services to the users.

Increased productivity
The database approach provides all the low-level file-handling routines. The provision of these functions allows the programmer to concentrate more on the specific functionality required by the users. The fourth-generation environment provided by the database can simplify database application development.

Improved maintenance
The database approach provides data independence. Because a change to the data structure in the database need not affect the application programs, it simplifies database application maintenance.

Increased concurrency
The database can manage concurrent data access effectively. It ensures that there is no interference between users that would result in loss of information or loss of integrity.

Improved backup and recovery services
Modern database management systems provide facilities to minimize the amount of processing that can be lost following a failure by using the transaction approach.

Disadvantages
In spite of the large number of advantages found in the database approach, it is not without challenges. The following disadvantages can be found:

Complexity
A database management system is an extremely complex piece of software. All parties must be familiar with its functionality to take full advantage of it. Therefore, training for administrators, designers and users is required.

Size
The database management system consumes a substantial amount of main memory as well as a large amount of disk space in order to run efficiently.

Cost of DBMS
A multi-user database management system may be very expensive. Even after installation, there is a high recurrent annual maintenance cost for the software.

Cost of conversion
When moving from a file-based system to a database system, the company has to bear additional expenses for hardware acquisition and training.

Performance
Because the database approach caters for many applications rather than exclusively for a particular one, some applications may not run as fast as before.
Higher impact of a failure
The database approach increases the vulnerability of the system due to centralization. As all users and applications rely on the availability of the database, the failure of any component can bring operations to a halt and seriously affect the services to the customer.

Q2. What is the disadvantage of sequential file organization? How do you overcome it? What are the advantages & disadvantages of Dynamic Hashing?

Disadvantage of sequential file organization:
A file whose records are stored in sequence, ordered by account number or some other identifying data, is called a sequential file. To locate the desired data, a sequential file must be read starting at the beginning of the file. A sequential file may be stored on a sequential access device such as magnetic tape or on a direct access device such as magnetic disk, but the access method remains the same.

Slow access:
The major issue with sequential files is slow access to information, because each read goes through the file record by record until the desired record is reached. This makes all file operations (read, write and update) very time-consuming in comparison to random (direct) access files. This can be overcome by giving direct access to records, for example by adding an index to the sequentially ordered file (indexed-sequential organization) or by locating records through hashing.

Dynamic Hashing:
Advantages
The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large (thousands or more). Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.
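To make the contrast between sequential and hashed access concrete, here is a minimal sketch. The record data and bucket count are hypothetical, and Python's built-in `hash()` stands in for the file's hash function. It counts how many records each approach must read to find one key: a sequential file is scanned from the start, while a hashed file goes directly to one bucket and searches only that bucket.

```python
# Hypothetical account records, stored in key order as in a sequential file.
records = [(k, f"customer-{k}") for k in range(1, 10_001)]

def sequential_find(records, key):
    """Read from the start of the file until the key is found; count the reads."""
    reads = 0
    for k, value in records:
        reads += 1
        if k == key:
            return value, reads
    return None, reads

NUM_BUCKETS = 1024  # assumed bucket count

def build_hashed_file(records):
    """Place each record in the bucket chosen by hashing its key."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for k, value in records:
        buckets[hash(k) % NUM_BUCKETS].append((k, value))
    return buckets

def hashed_find(buckets, key):
    """Go straight to the key's bucket and search only that bucket."""
    bucket = buckets[hash(key) % NUM_BUCKETS]
    for reads, (k, value) in enumerate(bucket, start=1):
        if k == key:
            return value, reads
    return None, len(bucket)

buckets = build_hashed_file(records)
print(sequential_find(records, 9_999))  # ('customer-9999', 9999)
print(hashed_find(buckets, 9_999))      # same value, far fewer reads
```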
If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, bucket table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect. In this case the keys need not be stored in the table.

Disadvantages
Hash tables can be more difficult to implement than self-balancing binary search trees. Choosing an effective hash function for a specific application is more an art than a science. In open-addressed hash tables it is fairly easy to create a poor hash function.

Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)

For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.

If the keys are not stored (because the hash function is collision-free), there may be no easy way to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.

Hash tables in general exhibit poor locality of reference; that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays, searched with linear search, may be faster if the table is relatively small and keys are integers or other short strings. According to Moore's Law, cache sizes are growing exponentially, so what is considered "small" may be increasing. The optimal performance point varies from system to system.

Hash tables become quite inefficient when there are many collisions. While extremely uneven hash distributions are extremely unlikely to arise by chance, a malicious adversary with knowledge of the hash function may be able to supply keys that create worst-case behavior by causing excessive collisions, resulting in very poor performance (i.e., a denial-of-service attack). In critical applications, either universal hashing can be used or a data structure with better worst-case guarantees may be preferable.
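The discussion above covers hash tables in general. In database texts, dynamic hashing usually refers to schemes such as extendible hashing, where a directory of 2^d bucket pointers doubles when needed and only the overflowing bucket is split, so the file grows without rehashing every record. The sketch below is a minimal in-memory illustration under assumed parameters (Python's `hash()`, a bucket capacity of 2, low-order hash bits as the directory index); the class and method names are hypothetical. It shows both sides of the trade-off: growth is incremental, but every lookup pays for an extra directory indirection, and the directory itself can double in size.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # number of hash bits this bucket distinguishes
        self.capacity = capacity
        self.items = {}

class ExtendibleHashFile:
    """Sketch of extendible (dynamic) hashing: a directory of 2**global_depth
    pointers to buckets; only an overflowing bucket is split."""

    def __init__(self, bucket_capacity=2):
        self.global_depth = 1
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _index(self, key):
        # Use the low-order global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        # Assumes no more than `capacity` distinct keys share one hash value;
        # otherwise splitting could never separate them.
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; slot i and slot i + 2**d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        split_bit = 1 << (bucket.local_depth - 1)
        # Slots that pointed to the old bucket and have the new bit set move over.
        for i, b in enumerate(self.directory):
            if b is bucket and i & split_bit:
                self.directory[i] = new_bucket
        # Redistribute the old bucket's items between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v

f = ExtendibleHashFile(bucket_capacity=2)
for k in range(20):
    f.put(k, str(k))
print(f.get(7), f.global_depth, len(f.directory))
```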
Q3. What is a relationship type? Explain the difference among a relationship instance, a relationship type and a relationship set.

A3.
A relationship type R among n entity types E1, E2, …, En is a set of associations among entities from these types. Actually, R is a set of relationship instances ri, where each ri is an n-tuple of entities (e1, e2, …, en), and each entity ej in ri is a member of entity type Ej, 1 ≤ j ≤ n. Hence, a relationship type is a mathematical relation on E1, E2, …, En; alternatively, it can be defined as a subset of the Cartesian product E1 × E2 × … × En. The relationship type R on the entity types E1, E2, …, En thus defines a set of relationship instances, called the relationship set. In short, a relationship instance is a single association ri, the relationship set is the collection of such instances existing at a given time, and the relationship type is the schema-level definition of R that describes them.

Q4. What is SQL? Discuss.

A4.
SQL is an abbreviation of Structured Query Language, and is pronounced either "see-kwell" or as separate letters. SQL is a standardized query language for requesting information from a database. The original version, called SEQUEL (Structured English Query Language), was designed at an IBM research center in 1974 and 1975. SQL was first introduced in a commercial database system in 1979 by Oracle Corporation.

Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes. Increasingly, however, SQL is being supported by PC database systems because it supports distributed databases (databases that are spread out over several computer systems). This enables several users on a local-area network to access the same database simultaneously.

Although there are different dialects of SQL, it is nevertheless the closest thing to a standard query language that currently exists. In 1986, ANSI approved a rudimentary version of SQL as the official standard, but most versions of SQL since then have included many extensions to the ANSI standard. In 1991, ANSI updated the standard. The new standard is known as SAG SQL.

SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks". Despite not adhering strictly to the relational model as described by Codd, it became the most widely used database language.

Although SQL is often described as, and to a great extent is, a declarative language, it also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.

SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query Language), was designed to manipulate and retrieve data stored in IBM's original quasi-relational database management system, System R, which a group at the IBM San Jose Research Laboratory had developed during the 1970s. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.

The first Relational Database Management System (RDBMS) was RDMS, developed at MIT in the early 1970s, soon followed by Ingres, developed in 1974 at U.C. Berkeley. Ingres implemented a query language known as QUEL, which was later supplanted in the marketplace by SQL.

In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Codd, Chamberlin, and Boyce and developed their own SQL-based RDBMS with aspirations of selling it to the U.S. Navy, Central Intelligence Agency, and other U.S. government agencies. In June 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version 2) for VAX computers. Oracle V2 beat IBM's August release of the System/38 RDBMS to market by a few weeks.
After testing SQL at customer test sites to determine the usefulness and practicality of the system, IBM began developing commercial products based on their System R prototype, including System/38, SQL/DS, and DB2, which were commercially available in 1979, 1981, and 1983, respectively.

[Chart: several of the SQL language elements that compose a single statement]

The SQL language is subdivided into several language elements, including:
• Clauses, which are constituent components of statements and queries. (In some cases, these are optional.)
• Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
• Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) (true/false/unknown) or Boolean truth values, and which are used to limit the effects of statements and queries, or to change program flow.
• Queries, which retrieve data based on specific criteria. This is the most important element of SQL.
• Statements, which may have a persistent effect on schemata and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
  o SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
• Insignificant whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
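The three-valued logic mentioned under Predicates can be observed directly. This minimal sketch uses Python's built-in `sqlite3` module and a hypothetical two-column `Book` table with made-up rows: a comparison involving NULL yields UNKNOWN (returned to Python as `None`), and a WHERE clause keeps only rows whose predicate is TRUE, so a row with an unknown price matches neither a condition nor its negation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A comparison with NULL is neither true nor false: it is UNKNOWN (None in Python).
unknown = conn.execute("SELECT NULL = NULL").fetchone()[0]
print(unknown)  # None

conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
conn.executemany("INSERT INTO Book VALUES (?, ?)",
                 [("A", 120.0), ("B", 80.0), ("C", None)])

# WHERE discards UNKNOWN rows just like FALSE ones, so book "C" appears in neither.
over = conn.execute("SELECT title FROM Book WHERE price > 100").fetchall()
not_over = conn.execute("SELECT title FROM Book WHERE NOT (price > 100)").fetchall()
print(over, not_over)  # [('A',)] [('B',)]
```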
Queries
The most common operation in SQL is the query, which is performed with the declarative SELECT statement. SELECT retrieves data from one or more tables, or expressions. Standard SELECT statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax that exists in some databases.

Queries allow the user to describe the desired data, leaving the database management system (DBMS) responsible for planning, optimizing, and performing the physical operations necessary to produce that result as it chooses.

A query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used to specify that the query should return all columns of the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include:
• The FROM clause, which indicates the table(s) from which data is to be retrieved. The FROM clause can include optional JOIN subclauses to specify the rules for joining tables.
• The WHERE clause, which includes a comparison predicate that restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set for which the comparison predicate does not evaluate to True.
• The GROUP BY clause, which is used to project rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
• The HAVING clause, which includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate.
• The ORDER BY clause, which identifies which columns are used to sort the resulting data, and in which direction they should be sorted (ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.

The following is an example of a SELECT query that returns a list of expensive books. The query retrieves all rows from the Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the Book table should be included in the result set.

    SELECT *
    FROM Book
    WHERE price > 100.00
    ORDER BY title;

The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.

    SELECT Book.title, COUNT(*) AS Authors
    FROM Book
    JOIN Book_author ON Book.isbn = Book_author.isbn
    GROUP BY Book.title;

Example output might resemble the following:

    Title                  Authors
    ---------------------- -------
    SQL Examples and Guide       4
    The Joy of SQL               1
    An Introduction to SQL       2
    Pitfalls of SQL              1

Under the precondition that isbn is the only common column name of the two tables and that a column named title only exists in the Book table, the above query could be rewritten in the following form:

    SELECT title, COUNT(*) AS Authors
    FROM Book
    NATURAL JOIN Book_author
    GROUP BY title;

However, many vendors either do not support this approach, or require certain column naming conventions in order for natural joins to work effectively.
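The queries above can be tried end to end with Python's built-in `sqlite3` module. This sketch assumes a hypothetical `Book(isbn, title, price)` table and a `Book_author(isbn, author)` table filled with made-up rows, so its counts differ from the sample output. It also adds a HAVING clause, which the text describes but does not illustrate: the filter on `COUNT(*)` must go in HAVING because WHERE is applied before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL);
    CREATE TABLE Book_author (isbn TEXT REFERENCES Book(isbn), author TEXT);
""")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("111", "An Introduction to SQL", 45.00),
    ("222", "The Joy of SQL", 120.00),
    ("333", "Pitfalls of SQL", 150.00),
])
conn.executemany("INSERT INTO Book_author VALUES (?, ?)", [
    ("111", "Ann"), ("111", "Raj"), ("222", "Lee"), ("333", "Kim"),
])

# Expensive books, sorted by title.
expensive = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title").fetchall()

# Join, group and aggregate (ORDER BY added so the result order is defined).
authors = conn.execute("""
    SELECT Book.title, COUNT(*) AS Authors
    FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn
    GROUP BY Book.title ORDER BY Book.title
""").fetchall()

# The NATURAL JOIN form; SQLite joins on the only shared column, isbn.
natural = conn.execute("""
    SELECT title, COUNT(*) AS Authors
    FROM Book NATURAL JOIN Book_author
    GROUP BY title ORDER BY title
""").fetchall()

# HAVING filters groups after aggregation.
multi = conn.execute("""
    SELECT Book.title, COUNT(*) AS Authors
    FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn
    GROUP BY Book.title HAVING COUNT(*) > 1
""").fetchall()

print(expensive)
print(authors)
print(multi)  # [('An Introduction to SQL', 2)]
```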
SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the select list to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional sales_tax column containing a sales tax figure calculated at 6% of the price.

    SELECT isbn, title, price, price * 0.06 AS sales_tax
    FROM Book
    WHERE price > 100.00
    ORDER BY title;

Subqueries
Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a subquery. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases, the use of subqueries introduces a hierarchy in execution which can be useful or necessary. In the following example, the aggregation function AVG is computed by a subquery, and its result is used in the comparison of the outer query:

    SELECT isbn, title, price
    FROM Book
    WHERE price < (SELECT AVG(price) FROM Book)
    ORDER BY title;

Q5. What is Normalization? Discuss various types of Normal Forms.

Normalization is the process of decomposing tables to eliminate data redundancy.
1NF: The table should contain only scalar (atomic) values.
2NF: The table should be in 1NF, with no partial functional dependencies.
3NF: The table should be in 2NF, with no transitive dependencies.
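As a minimal sketch of the 1NF rule, assume a hypothetical employee record whose skills field holds a list (a repeating group). Flattening it produces one row per skill, so every field holds a single atomic value and every row has the same number of fields.

```python
# A hypothetical unnormalized record set: "skills" is a repeating group.
unnormalized = [
    {"employee": "Smith", "skills": ["cook", "type"]},
    {"employee": "Jones", "skills": ["drive"]},
]

def to_first_normal_form(rows):
    """Return one (employee, skill) row per skill, so each field is atomic."""
    return [(row["employee"], skill) for row in rows for skill in row["skills"]]

print(to_first_normal_form(unnormalized))
# [('Smith', 'cook'), ('Smith', 'type'), ('Jones', 'drive')]
```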
The normal forms defined in relational database theory represent guidelines for record design. The guidelines corresponding to first through fifth normal forms are presented here, in terms that do not require an understanding of relational theory. The design guidelines are meaningful even if one is not using a relational database system. We present the guidelines without referring to the concepts of the relational model in order to emphasize their generality, and also to make them easier to understand. Our presentation conveys an intuitive sense of the intended constraints on record design, although in its informality it may be imprecise in some technical details. A comprehensive treatment of the subject is provided by Date.

The normalization rules are designed to prevent update anomalies and data inconsistencies. With respect to performance tradeoffs, these guidelines are biased toward the assumption that all non-key fields will be updated frequently. They tend to penalize retrieval, since data which may have been retrievable from one record in an unnormalized design may have to be retrieved from several records in the normalized form. There is no obligation to fully normalize all records when actual performance requirements are taken into account.

2 FIRST NORMAL FORM
First normal form deals with the "shape" of a record type.

Under first normal form, all occurrences of a record type must contain the same number of fields.

First normal form excludes variable repeating fields and groups. This is not so much a design guideline as a matter of definition. Relational database theory doesn't deal with records having a variable number of fields.

3 SECOND AND THIRD NORMAL FORMS
Second and third normal forms [2, 3, 7] deal with the relationship between non-key and key fields.

Under second and third normal forms, a non-key field must provide a fact about the key, the whole key, and nothing but the key. In addition, the record must satisfy first normal form.
We deal now only with "single-valued" facts. The fact could be a one-to-many relationship, such as the department of an employee, or a one-to-one relationship, such as the spouse of an employee. Thus the phrase "Y is a fact about X" signifies a one-to-one or one-to-many relationship between Y and X. In the general case, Y might consist of one or more fields, and so might X. In the following example, QUANTITY is a fact about the combination of PART and WAREHOUSE.

3.1 Second Normal Form
Second normal form is violated when a non-key field is a fact about a subset of a key. It is only relevant when the key is composite, i.e., consists of several fields. Consider the following inventory record:

    ---------------------------------------------------
    | PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
    ====================-------------------------------

The key here consists of the PART and WAREHOUSE fields together, but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. The basic problems with this design are:
• The warehouse address is repeated in every record that refers to a part stored in that warehouse.
• If the address of the warehouse changes, every record referring to a part stored in that warehouse must be updated.
• Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same warehouse.
• If at some point in time there are no parts stored in the warehouse, there may be no record in which to keep the warehouse's address.

To satisfy second normal form, the record shown above should be decomposed into (replaced by) the two records:

    ------------------------------- ---------------------------------
    | PART | WAREHOUSE | QUANTITY | | WAREHOUSE | WAREHOUSE-ADDRESS |
    ====================----------- =============--------------------
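The single-valued facts and the decomposition above can be checked mechanically on sample data. The following sketch uses hypothetical inventory rows and two illustrative helper functions, `holds_fd` and `project` (not a library API): `holds_fd` tests whether one set of fields determines another in the rows given, and `project` keeps only selected fields with duplicate records removed. A check against sample rows can only refute a dependency; whether WAREHOUSE really determines WAREHOUSE-ADDRESS is a fact about the business, not about the data.

```python
FIELDS = ("PART", "WAREHOUSE", "QUANTITY", "WAREHOUSE-ADDRESS")

# Hypothetical inventory records; the key is (PART, WAREHOUSE).
inventory = [
    ("bolt", "W1", 100, "1 Dock Rd"),
    ("nut", "W1", 250, "1 Dock Rd"),
    ("bolt", "W2", 40, "9 Hill St"),
]

def as_dict(row):
    return dict(zip(FIELDS, row))

def holds_fd(rows, lhs, rhs):
    """True if no two rows agree on the lhs fields but differ on the rhs fields."""
    seen = {}
    for row in map(as_dict, rows):
        x = tuple(row[f] for f in lhs)
        y = tuple(row[f] for f in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def project(rows, cols):
    """Keep only the named fields, dropping duplicate records but keeping order."""
    result = []
    for row in map(as_dict, rows):
        rec = tuple(row[c] for c in cols)
        if rec not in result:
            result.append(rec)
    return result

# WAREHOUSE-ADDRESS is a fact about WAREHOUSE alone, i.e., about part of the key.
print(holds_fd(inventory, ["WAREHOUSE"], ["WAREHOUSE-ADDRESS"]))  # True
# QUANTITY needs the whole key: PART alone does not determine it.
print(holds_fd(inventory, ["PART"], ["QUANTITY"]))                # False

# Second normal form decomposition: each warehouse address is stored once.
stock = project(inventory, ["PART", "WAREHOUSE", "QUANTITY"])
warehouses = project(inventory, ["WAREHOUSE", "WAREHOUSE-ADDRESS"])
print(warehouses)  # [('W1', '1 Dock Rd'), ('W2', '9 Hill St')]

# Joining the two records back on WAREHOUSE reproduces the original rows.
rejoined = [(p, w, q, addr) for (p, w, q) in stock
            for (w2, addr) in warehouses if w == w2]
print(sorted(rejoined) == sorted(inventory))  # True
```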
When a data design is changed in this way, replacing unnormalized records with normalized records, the process is referred to as normalization. The term "normalization" is sometimes used relative to a particular normal form. Thus a set of records may be normalized with respect to second normal form but not with respect to third.

The normalized design enhances the integrity of the data, by minimizing redundancy and inconsistency, but at some possible performance cost for certain retrieval applications. Consider an application that wants the addresses of all warehouses stocking a certain part. In the unnormalized form, the application searches one record type. With the normalized design, the application has to search two record types, and connect the appropriate pairs.

3.2 Third Normal Form
Third normal form is violated when a non-key field is a fact about another non-key field, as in

    ------------------------------------
    | EMPLOYEE | DEPARTMENT | LOCATION |
    ============------------------------

The EMPLOYEE field is the key. If each department is located in one place, then the LOCATION field is a fact about the DEPARTMENT, in addition to being a fact about the EMPLOYEE. The problems with this design are the same as those caused by violations of second normal form:
• The department's location is repeated in the record of every employee assigned to that department.
• If the location of the department changes, every such record must be updated.
• Because of the redundancy, the data might become inconsistent, with different records showing different locations for the same department.
• If a department has no employees, there may be no record in which to keep the department's location.

To satisfy third normal form, the record shown above should be decomposed into the two records:
    ------------------------- -------------------------
    | EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION |
    ============------------- ==============-----------

To summarize, a record is in second and third normal forms if every field is either part of the key or provides a (single-valued) fact about exactly the whole key and nothing else.

3.3 Functional Dependencies
In relational database theory, second and third normal forms are defined in terms of functional dependencies, which correspond approximately to our single-valued facts. A field Y is "functionally dependent" on a field (or fields) X if it is invalid to have two records with the same X-value but different Y-values. That is, a given X-value must always occur with the same Y-value. When X is a key, then all fields are by definition functionally dependent on X in a trivial way, since there can't be two records having the same X value.

There is a slight technical difference between functional dependencies and single-valued facts as we have presented them. Functional dependencies only exist when the things involved have unique and singular identifiers (representations). For example, suppose a person's address is a single-valued fact, i.e., a person has only one address. If we don't provide unique identifiers for people, then there will not be a functional dependency in the data:

    ----------------------------------------------
    | PERSON     | ADDRESS                       |
    -------------+--------------------------------
    | John Smith | 123 Main St., New York        |
    | John Smith | 321 Center St., San Francisco |
    ----------------------------------------------

Although each person has a unique address, a given name can appear with several different addresses. Hence we do not have a functional dependency corresponding to our single-valued fact.

Similarly, the address has to be spelled identically in each occurrence in order to have a functional dependency. In the following case the same person appears to be living at two different addresses, again precluding a functional dependency.
    ---------------------------------------
    | PERSON     | ADDRESS                |
    -------------+-------------------------
    | John Smith | 123 Main St., New York |
    | John Smith | 123 Main Street, NYC   |
    ---------------------------------------

We are not defending the use of non-unique or non-singular representations. Such practices often lead to data maintenance problems of their own. We do wish to point out, however, that functional dependencies and the various normal forms are really only defined for situations in which there are unique and singular identifiers. Thus the design guidelines as we present them are a bit stronger than those implied by the formal definitions of the normal forms.

For instance, we as designers know that in the following example there is a single-valued fact about a non-key field, and hence the design is susceptible to all the update anomalies mentioned earlier.

    ----------------------------------------------------------
    | EMPLOYEE  | FATHER     | FATHERS-ADDRESS               |
    |============------------+-------------------------------|
    | Art Smith | John Smith | 123 Main St., New York        |
    | Bob Smith | John Smith | 123 Main Street, NYC          |
    | Cal Smith | John Smith | 321 Center St., San Francisco |
    ----------------------------------------------------------

However, in formal terms, there is no functional dependency here between FATHERS-ADDRESS and FATHER, and hence no violation of third normal form.

4 FOURTH AND FIFTH NORMAL FORMS
Fourth and fifth normal forms deal with multi-valued facts. The multi-valued fact may correspond to a many-to-many relationship, as with employees and skills, or to a many-to-one relationship, as with the children of an employee (assuming only one parent is an employee). By "many-to-many" we mean that an employee may have several skills, and a skill may belong to several employees.

Note that we look at the many-to-one relationship between children and fathers as a single-valued fact about a child but a multi-valued fact about a father.
In a sense, fourth and fifth normal forms are also about composite keys. These normal forms attempt to minimize the number of fields involved in a composite key, as suggested by the examples to follow.

4.1 Fourth Normal Form

Under fourth normal form, a record type should not contain two or more independent multi-valued facts about an entity. In addition, the record must satisfy third normal form.

The term "independent" will be discussed after considering an example.

Consider employees, skills, and languages, where an employee may have several skills and several languages. We have here two many-to-many relationships, one between employees and skills, and one between employees and languages. Under fourth normal form, these two relationships should not be represented in a single record such as

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
===============================

Instead, they should be represented in the two records

-------------------- -----------------------
| EMPLOYEE | SKILL | | EMPLOYEE | LANGUAGE |
==================== =======================

Note that other fields, not involving multi-valued facts, are permitted to occur in the record, as in the case of the QUANTITY field in the earlier PART/WAREHOUSE example.

The main problem with violating fourth normal form is that it leads to uncertainties in the maintenance policies. Several policies are possible for maintaining two independent multi-valued facts in one record:

(1) A disjoint format, in which a record contains either a skill or a language, but not both:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  |          |
| Smith    | type  |          |
| Smith    |       | French   |
| Smith    |       | German   |
| Smith    |       | Greek    |
-------------------------------

This is not much different from maintaining two separate record types. (We note in passing that such a format also leads to ambiguities regarding the meanings of blank fields. A blank SKILL could mean the person has no skill, or the field is not applicable to this employee, or the data is unknown, or, as in this case, the data may be found in another record.)

(2) A random mix, with three variations:

(a) Minimal number of records, with repetitions:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  | German   |
| Smith    | type  | Greek    |
-------------------------------

(b) Minimal number of records, with null values:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  | German   |
| Smith    |       | Greek    |
-------------------------------

(c) Unrestricted:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  |          |
| Smith    |       | German   |
| Smith    | type  | Greek    |
-------------------------------

(3) A "cross-product" form, where for each employee, there must be a record for every possible pairing of one of his skills with one of his languages:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | cook  | German   |
| Smith    | cook  | Greek    |
| Smith    | type  | French   |
| Smith    | type  | German   |
| Smith    | type  | Greek    |
-------------------------------

Other problems caused by violating fourth normal form are similar in spirit to those mentioned earlier for violations of second or third normal form. They take different variations depending on the chosen maintenance policy:

- If there are repetitions, then updates have to be done in multiple records, and they could become inconsistent.
- Insertion of a new skill may involve looking for a record with a blank skill, or inserting a new record with a possibly blank language, or inserting multiple records pairing the new skill with some or all of the languages.
- Deletion of a skill may involve blanking out the skill field in one or more records (perhaps with a check that this doesn't leave two records with the same language and a blank skill), or deleting one or more records, coupled with a check that the last mention of some language hasn't also been deleted.
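A small sketch can make the maintenance problem concrete. Assuming records are Python tuples (EMPLOYEE, SKILL, LANGUAGE) and leaving out the blank-field policies (which would need a representation for blanks), the hypothetical helpers below project a record onto the two fourth-normal-form record types and join them back on EMPLOYEE. Only the cross-product policy survives that round trip unchanged, which is the multivalued-dependency condition described in Section 4.1.2; the minimal-repetition mix (2a) does not, even though its projections carry exactly the same skills and languages.

```python
from itertools import product
from typing import Set, Tuple

Row = Tuple[str, str, str]   # (EMPLOYEE, SKILL, LANGUAGE)
Pair = Tuple[str, str]


def decompose(rows: Set[Row]) -> Tuple[Set[Pair], Set[Pair]]:
    """Project the three-field record onto EMPLOYEE|SKILL and EMPLOYEE|LANGUAGE."""
    emp_skill = {(e, s) for e, s, _ in rows}
    emp_lang = {(e, lang) for e, _, lang in rows}
    return emp_skill, emp_lang


def rejoin(emp_skill: Set[Pair], emp_lang: Set[Pair]) -> Set[Row]:
    """Join the two record types on EMPLOYEE: pair every skill with every language."""
    return {(e, s, lang)
            for (e, s), (e2, lang) in product(emp_skill, emp_lang)
            if e == e2}


def follows_cross_product(rows: Set[Row]) -> bool:
    """True when the rows equal the join of their own projections."""
    return rejoin(*decompose(rows)) == rows


cross_product = {("Smith", s, lang)
                 for s in ("cook", "type")
                 for lang in ("French", "German", "Greek")}
random_mix_a = {("Smith", "cook", "French"),
                ("Smith", "type", "German"),
                ("Smith", "type", "Greek")}

if __name__ == "__main__":
    print(follows_cross_product(cross_product))               # True
    print(follows_cross_product(random_mix_a))                # False
    print(rejoin(*decompose(random_mix_a)) == cross_product)  # True
```

Because the two-record form stores each skill and each language once, the question of which pairing to store never arises; that is the ambiguity the separate EMPLOYEE|SKILL and EMPLOYEE|LANGUAGE records remove.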
Fourth normal form minimizes such update problems.

4.1.1 Independence

We mentioned independent multi-valued facts earlier, and we now illustrate what we mean in terms of the example. The two many-to-many relationships, employee:skill and employee:language, are "independent" in that there is no direct connection between skills and languages. There is only an indirect connection because they belong to some common employee. That is, it does not matter which skill is paired with which language in a record; the pairing does not convey any information. That's precisely why all the maintenance policies mentioned earlier can be allowed.

In contrast, suppose that an employee could only exercise certain skills in certain languages. Perhaps Smith can cook French cuisine only, but can type in French, German, and Greek. Then the pairings of skills and languages become meaningful, and there is no longer an ambiguity of maintenance policies. In the present case, only the following form is correct:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  | French   |
| Smith    | type  | German   |
| Smith    | type  | Greek    |
-------------------------------

Thus the employee:skill and employee:language relationships are no longer independent. These records do not violate fourth normal form. When there is an interdependence among the relationships, then it is acceptable to represent them in a single record.

4.1.2 Multivalued Dependencies

For readers interested in pursuing the technical background of fourth normal form a bit further, we mention that fourth normal form is defined in terms of multivalued dependencies, which correspond to our independent multi-valued
facts. Multivalued dependencies, in turn, are defined essentially as relationships which accept the "cross-product" maintenance policy mentioned above. That is, for our example, every one of an employee's skills must appear paired with every one of his languages. It may or may not be obvious to the reader that this is equivalent to our notion of independence: since every possible pairing must be present, there is no "information" in the pairings. Such pairings convey information only if some of them can be absent, that is, only if it is possible that some employee cannot perform some skill in some language. If all pairings are always present, then the relationships are really independent.

We should also point out that multivalued dependencies and fourth normal form apply as well to relationships involving more than two fields. For example, suppose we extend the earlier example to include projects, in the following sense:

- An employee uses certain skills on certain projects.
- An employee uses certain languages on certain projects.

If there is no direct connection between the skills and languages that an employee uses on a project, then we could treat this as two independent many-to-many relationships of the form EP:S and EP:L, where "EP" represents a combination of an employee with a project. A record including employee, project, skill, and language would violate fourth normal form. Two records, containing fields E,P,S and E,P,L, respectively, would satisfy fourth normal form.

4.2 Fifth Normal Form

Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. Second, third, and fourth normal forms also serve this purpose, but fifth normal form generalizes to cases not covered by the others.

We will not attempt a comprehensive exposition of fifth normal form, but illustrate the central concept with a commonly used example, namely one involving agents, companies, and products.
If agents represent companies, companies make products, and agents sell products, then we might want to keep a record of which agent sells which product for which company. This information could be kept in one record type with three fields:

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | GM      | truck   |
-----------------------------

This form is necessary in the general case. For example, although agent Smith sells cars made by Ford and trucks made by GM, he does not sell Ford trucks or GM cars. Thus we need the combination of three fields to know which combinations are valid and which are not.

But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
-----------------------------

In this case, it turns out that we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields:

------------------- --------------------- -------------------
| AGENT | COMPANY | | COMPANY | PRODUCT | | AGENT | PRODUCT |
|-------+---------| |---------+---------| |-------+---------|
| Smith | Ford    | | Ford    | car     | | Smith | car     |
| Smith | GM      | | Ford    | truck   | | Smith | truck   |
| Jones | Ford    | | GM      | car     | | Jones | car     |
------------------- | GM      | truck   | -------------------
                    ---------------------
These three record types are in fifth normal form, whereas the corresponding three-field record shown previously is not.

Roughly speaking, we may say that a record type is in fifth normal form when its information content cannot be reconstructed from several smaller record types, i.e., from record types each having fewer fields than the original record. The case where all the smaller records have the same key is excluded. If a record type can only be decomposed into smaller records which all have the same key, then the record type is considered to be in fifth normal form without decomposition. A record type in fifth normal form is also in fourth, third, second, and first normal forms.

Fifth normal form does not differ from fourth normal form unless there exists a symmetric constraint such as the rule about agents, companies, and products. In the absence of such a constraint, a record type in fourth normal form is always in fifth normal form.

One advantage of fifth normal form is that certain redundancies can be eliminated. In the normalized form, the fact that Smith sells cars is recorded only once; in the unnormalized form it may be repeated many times.

It should be observed that although the normalized form involves more record types, there may be fewer total record occurrences. This is not apparent when there are only a few facts to record, as in the example shown above. The advantage is realized as more facts are recorded, since the size of the normalized files increases in an additive fashion, while the size of the unnormalized file increases in a multiplicative fashion. For example, if we add a new agent who sells x products for y companies, where each of these companies makes each of these products, we have to add x+y new records to the normalized form (y AGENT|COMPANY records and x AGENT|PRODUCT records), but x*y new records to the unnormalized form.

It should be noted that all three record types are required in the normalized form in order to reconstruct the same information.
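The reconstruction argument, and the x+y versus x*y growth, can be checked with a short Python sketch over the first agent/company/product example. The function names are illustrative rather than taken from the original, and records are modelled as sets of tuples. Joining all three two-field record types recovers exactly the five original records, while joining only AGENT|COMPANY and COMPANY|PRODUCT adds the untrue fact that Jones sells Ford trucks.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (AGENT, COMPANY, PRODUCT)
Pair = Tuple[str, str]


def projections(rows: Set[Triple]) -> Tuple[Set[Pair], Set[Pair], Set[Pair]]:
    """Split the three-field record into the three two-field record types."""
    agent_company = {(a, c) for a, c, _ in rows}
    company_product = {(c, p) for _, c, p in rows}
    agent_product = {(a, p) for a, _, p in rows}
    return agent_company, company_product, agent_product


def join_two(agent_company: Set[Pair], company_product: Set[Pair]) -> Set[Triple]:
    """Join AGENT|COMPANY with COMPANY|PRODUCT on COMPANY."""
    return {(a, c, p)
            for a, c in agent_company
            for c2, p in company_product
            if c == c2}


def join_three(agent_company: Set[Pair], company_product: Set[Pair],
               agent_product: Set[Pair]) -> Set[Triple]:
    """Additionally require the (AGENT, PRODUCT) pair to be in AGENT|PRODUCT."""
    return {t for t in join_two(agent_company, company_product)
            if (t[0], t[2]) in agent_product}


def records_added(x: int, y: int) -> Tuple[int, int]:
    """New agent selling x products for y companies, assuming each company
    already makes each product: (normalized additions, unnormalized additions)."""
    normalized = y + x     # y AGENT|COMPANY rows plus x AGENT|PRODUCT rows
    unnormalized = x * y   # one AGENT|COMPANY|PRODUCT row per pairing
    return normalized, unnormalized


sales = {
    ("Smith", "Ford", "car"), ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"), ("Smith", "GM", "truck"),
    ("Jones", "Ford", "car"),
}

if __name__ == "__main__":
    ac, cp, ap = projections(sales)
    print(join_three(ac, cp, ap) == sales)   # True
    print(join_two(ac, cp) - sales)          # {('Jones', 'Ford', 'truck')}
    print(records_added(3, 4))               # (7, 12)
```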
From the first two record types shown above we learn that Jones represents Ford and that Ford makes trucks. But we can't determine whether Jones sells Ford trucks until we look at the third record type to determine whether Jones sells trucks at all.

The following example illustrates a case in which the rule about agents, companies, and products is satisfied, and which clearly requires all three record
types in the normalized form. Any two of the record types taken alone will imply something untrue.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
| Jones | Ford    | truck   |
| Brown | Ford    | car     |
| Brown | GM      | car     |
| Brown | Toyota  | car     |
| Brown | Toyota  | bus     |
-----------------------------

Fifth normal form:

------------------- --------------------- -------------------
| AGENT | COMPANY | | COMPANY | PRODUCT | | AGENT | PRODUCT |
|-------+---------| |---------+---------| |-------+---------|
| Smith | Ford    | | Ford    | car     | | Smith | car     |
| Smith | GM      | | Ford    | truck   | | Smith | truck   |
| Jones | Ford    | | GM      | car     | | Jones | car     |
| Brown | Ford    | | GM      | truck   | | Jones | truck   |
| Brown | GM      | | Toyota  | car     | | Brown | car     |
| Brown | Toyota  | | Toyota  | bus     | | Brown | bus     |
------------------- --------------------- -------------------

Observe that:

- Jones sells cars and GM makes cars, but Jones does not represent GM.
- Brown represents Ford and Ford makes trucks, but Brown does not sell trucks.
- Brown represents Ford and Brown sells buses, but Ford does not make buses.

Fourth and fifth normal forms both deal with combinations of multivalued facts. One difference is that the facts dealt with under fifth normal form are not
independent, in the sense discussed earlier. Another difference is that, although fourth normal form can deal with more than two multivalued facts, it only recognizes them in pairwise groups. We can best explain this in terms of the normalization process implied by fourth normal form. If a record violates fourth normal form, the associated normalization process decomposes it into two records, each containing fewer fields than the original record. Any of these that still violates fourth normal form is again decomposed into two records, and so on until the resulting records are all in fourth normal form. At each stage, the set of records after decomposition contains exactly the same information as the set of records before decomposition.

In the present example, no pairwise decomposition is possible. There is no combination of two smaller records which contains the same total information as the original record. All three of the smaller records are needed. Hence an information-preserving pairwise decomposition is not possible, and the original record is not in violation of fourth normal form. Fifth normal form is needed in order to deal with the redundancies in this case.

5 UNAVOIDABLE REDUNDANCIES

Normalization certainly doesn't remove all redundancies. Certain redundancies seem to be unavoidable, particularly when several multivalued facts are dependent rather than independent. In the example shown in Section 4.1.1, it seems unavoidable that we record the fact that "Smith can type" several times. Also, when the rule about agents, companies, and products is not in effect, it seems unavoidable that we record the fact that "Smith sells cars" several times.

6 INTER-RECORD REDUNDANCY

The normal forms discussed here deal only with redundancies occurring within a single record type. Fifth normal form is considered to be the "ultimate" normal form with respect to such redundancies.

Other redundancies can occur across multiple record types.
For the example concerning employees, departments, and locations, the following records are in third normal form in spite of the obvious redundancy:

------------------------- -------------------------
| EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION |
============------------- ==============-----------

-----------------------
| EMPLOYEE | LOCATION |
============-----------

In fact, two copies of the same record type would constitute the ultimate in this kind of undetected redundancy.

Inter-record redundancy has been recognized for some time, and has recently been addressed in terms of normal forms and normalization.

7 CONCLUSION

While we have tried to present the normal forms in a simple and understandable way, we are by no means suggesting that the data design process is correspondingly simple. The design process involves many complexities which are quite beyond the scope of this paper. In the first place, an initial set of data elements and records has to be developed, as candidates for normalization. Then the factors affecting normalization have to be assessed:

- Single-valued vs. multi-valued facts.
- Dependency on the entire key.
- Independent vs. dependent facts.
- The presence of mutual constraints.
- The presence of non-unique or non-singular representations.

Q6. What do you mean by Shared Lock & Exclusive Lock? Describe briefly the two-phase locking protocol.

A database is a large collection of data stored in the form of tables. This data is very important to the companies that use those databases, since any loss or misuse of it can put both the company and its customers into trouble. To avoid this situation and protect their customers, database vendors provide many security features with their products; one of them is a locking system that maintains the integrity of the database. There are two types of lock available in a database system:
1) Shared Lock: is given to readers of the data. It allows several users to read the same data concurrently, but they are not allowed to change (write) the data or obtain an exclusive lock on the object. It can be set on a table or a table row. The lock is released at the end of the transaction.

2) Exclusive Lock: is given to writers of the data. When this lock is set on an object, only the writer that set the lock can change the data, and other users cannot access the locked object. The lock is released when the transaction ends. It can be set on tables or rows.

Exclusive locks

Exclusive locks protect updates to file resources, both recoverable and non-recoverable. They can be owned by only one transaction at a time. Any transaction that requires an exclusive lock must wait if another task currently owns an exclusive lock or a shared lock against the requested resource.

Shared locks

Shared locks support read integrity. They ensure that a record is not in the process of being updated during a read-only request. Shared locks can also be used to prevent updates of a record between the time that a record is read and the next syncpoint.

A shared lock on a resource can be owned by several tasks at the same time. However, although several tasks can own shared locks, there are some circumstances in which tasks can be forced to wait for a lock:

- A request for a shared lock must wait if another task currently owns an exclusive lock on the resource.
- A request for an exclusive lock must wait if other tasks currently own shared locks on this resource.
- A new request for a shared lock must wait if another task is waiting for an exclusive lock on a resource that already has a shared lock.
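The three waiting rules above can be sketched as a tiny single-resource lock table in Python. This is an assumption-laden illustration, not how any particular DBMS implements locking: the class `ResourceLock`, the mode constants and the first-in, first-out wake-up order are hypothetical, and lock upgrades, timeouts and deadlock detection are left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Set, Tuple

SHARED, EXCLUSIVE = "S", "X"


@dataclass
class ResourceLock:
    """Simplified lock state for one resource (a table or a row)."""
    holders: Set[str] = field(default_factory=set)  # transactions owning the lock
    mode: str = ""  # "" when free, otherwise SHARED or EXCLUSIVE
    waiting: Deque[Tuple[str, str]] = field(default_factory=deque)  # FIFO queue

    def _compatible(self, mode: str) -> bool:
        # A free resource accepts any request; otherwise only S fits with S.
        return not self.holders or (mode == SHARED and self.mode == SHARED)

    def request(self, txn: str, mode: str) -> bool:
        """Return True if granted now, False if the request has to wait."""
        exclusive_queued = any(m == EXCLUSIVE for _, m in self.waiting)
        # Rules 1 and 2: incompatible with the current owners -> wait.
        # Rule 3: a new S request also waits behind a queued X request.
        if self._compatible(mode) and not (mode == SHARED and exclusive_queued):
            self._grant(txn, mode)
            return True
        self.waiting.append((txn, mode))
        return False

    def release(self, txn: str) -> None:
        self.holders.discard(txn)
        if not self.holders:
            self.mode = ""
        # Grant queued requests in arrival order while they remain compatible.
        while self.waiting and self._compatible(self.waiting[0][1]):
            self._grant(*self.waiting.popleft())

    def _grant(self, txn: str, mode: str) -> None:
        self.holders.add(txn)
        self.mode = mode


if __name__ == "__main__":
    lock = ResourceLock()
    print(lock.request("T1", SHARED))     # True: first reader
    print(lock.request("T2", SHARED))     # True: readers share the lock
    print(lock.request("T3", EXCLUSIVE))  # False: readers still own it
    print(lock.request("T4", SHARED))     # False: a writer is already waiting
    lock.release("T1")
    lock.release("T2")
    print(lock.holders, lock.mode)        # {'T3'} X
    lock.release("T3")
    print(lock.holders, lock.mode)        # {'T4'} S
```

In the usage lines, T4's shared request is queued even though only readers hold the lock, because T3 is already waiting for an exclusive lock; without the third rule, a steady stream of readers could keep the writer waiting indefinitely.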
In databases and transaction processing, two-phase locking (2PL) is a concurrency control method that guarantees serializability. It is also the name of the resulting set of database transaction schedules (histories). The protocol uses locks, applied by a transaction to data, which may block (interpreted as signals to stop) other transactions from accessing the same data during the transaction's life.

Under the 2PL protocol, locks are applied and removed in two phases:

1. Expanding phase: locks are acquired and no locks are released.
2. Shrinking phase: locks are released and no locks are acquired.

Two types of locks are used by the basic protocol: shared and exclusive locks. Refinements of the basic protocol may use more lock types. Because it uses locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.

2PL is a superset of strong strict two-phase locking (SS2PL), also called rigorousness, which has been widely used for concurrency control in general-purpose database systems since the 1970s. SS2PL implementations have many variants. SS2PL was once called strict 2PL, but this usage is no longer recommended. Strict 2PL (S2PL) now denotes the intersection of strictness and 2PL, which is different from SS2PL. SS2PL is also a special case of commitment ordering (CO), and inherits many of CO's useful properties. SS2PL actually comprises only one phase: phase 2 does not exist, and all locks are released only after the transaction ends. Thus this useful 2PL type is not two-phased at all.

Neither 2PL nor S2PL in their general forms are known to be used in practice. Thus 2PL by itself does not seem to have much practical importance, and whenever 2PL or S2PL utilization has been mentioned in the literature, the intention has been SS2PL.
What has made SS2PL so popular (probably the most widely used serializability mechanism) is the effective and efficient locking-based combination of two ingredients (the first exists in neither general 2PL nor S2PL; the second does not exist in general 2PL):

1. Commitment ordering, which provides serializability as well as effective distributed serializability and global serializability, and
2. Strictness, which provides cascadelessness (ACA, cascade-less recoverability) and (independently) allows efficient database recovery from failure.

Additionally, SS2PL is easier to implement, with less overhead, than both 2PL and S2PL; it provides exactly the same locking, but sometimes releases locks later. However, in practice (though not in simple theoretical terms) such later lock release occurs only slightly later, and this apparent disadvantage is insignificant next to the advantages of SS2PL.
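The phase discipline of 2PL, and the SS2PL variant that releases nothing before the transaction ends, can be sketched in Python as follows. The sketch only enforces each transaction's own lock/unlock ordering; blocking between transactions (and therefore deadlock) is out of scope, and the `Transaction` class and `TwoPhaseViolation` exception are hypothetical names.

```python
from typing import Dict, List


class TwoPhaseViolation(Exception):
    """Raised when a transaction breaks the locking discipline it follows."""


class Transaction:
    """Tracks one transaction's lock and unlock calls and enforces 2PL."""

    def __init__(self, name: str, strong_strict: bool = False) -> None:
        self.name = name
        self.strong_strict = strong_strict  # SS2PL: hold every lock until the end
        self.locks: Dict[str, str] = {}     # resource -> "S" or "X"
        self.shrinking = False              # becomes True at the first release

    def lock(self, resource: str, mode: str) -> None:
        if self.shrinking:
            raise TwoPhaseViolation(
                f"{self.name} cannot lock {resource} in the shrinking phase")
        self.locks[resource] = mode         # expanding phase

    def unlock(self, resource: str) -> None:
        if self.strong_strict:
            raise TwoPhaseViolation(
                f"{self.name} uses SS2PL: locks are released only at commit/abort")
        self.shrinking = True               # no more locks may be acquired
        self.locks.pop(resource)

    def end(self) -> List[str]:
        """Commit or abort: release every lock still held."""
        released = sorted(self.locks)
        self.locks.clear()
        self.shrinking = True
        return released


if __name__ == "__main__":
    t1 = Transaction("T1")
    t1.lock("A", "S")
    t1.lock("B", "X")            # expanding phase
    t1.unlock("A")               # shrinking phase begins
    try:
        t1.lock("C", "S")        # not allowed under 2PL
    except TwoPhaseViolation as err:
        print(err)
    print(t1.end())              # ['B']

    t2 = Transaction("T2", strong_strict=True)
    t2.lock("A", "X")
    print(t2.end())              # ['A']: the only release under SS2PL
```

Under SS2PL the `unlock` method is never legal, so the shrinking phase collapses into the single release performed by `end`, which matches the remark that SS2PL is not really two-phased.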
Master of Business Administration - MBA Semester III
MI0034 – Database Management System - 4 Credits
Assignment - Set- 2 (60 Marks)
Answer all the Questions

Q1. Define Data Model & discuss the categories of Data Models? What is the difference between logical data Independence & Physical Data Independence?

A data model is a picture or description which depicts how data is to be arranged to serve a specific purpose. The data model depicts what data items are required, and how that data must look. However, it would be misleading to discuss data models as if there were only one kind of data model, and equally misleading to discuss them as if they were used for only one purpose. It would also be misleading to assume that data models were only used in the construction of data files.

Some data models are schematics which depict the manner in which data records are connected or related within a file structure. These are called record or structural data models. Some data models are used to identify the subjects of corporate data processing; these are called entity-relationship data models. Still another type of data model is used for analytic purposes to help the analyst solidify the semantics associated with critical corporate or business concepts.

The record data model

The record version of the data model is used to assist the implementation team by providing a series of schematics of the file that will contain the data that must be built to support the business processing procedures. When the design team has chosen a file management system, or when corporate policy dictates that a specific data management system be used, these models may be the only models produced within the context of a design project. If no such choice has been made, they may be produced after first developing a more general, non-DBMS-specific entity relationship data model.

Early data models
Although the term data modeling has become popular only in recent years, in fact modeling of data has been going on for quite a long time. It is difficult for any of us to pinpoint exactly when the first data model was constructed because each of us has a different idea of what a data model is. If we go back to the definition we set forth earlier, then we can say that perhaps the earliest form of data modeling was practiced by the first persons who created paper forms for collecting large amounts of similar data. We can see current versions of these forms everywhere we look. Every time we fill out an application, buy something, or make a request using anything other than a blank piece of paper or stationery, we are using a form of data model.

These forms were designed to collect specific kinds of information, in a specific format. The very definition of the word form confirms this.

A definition:

- A form is the shape and structure of something as distinguished from its substance.
- A form is a document with blanks for the insertion of details or information.

Almost all businesses, and in fact almost all organizations, use forms of every sort to gather and store information.

Data Management Systems

Until the introduction of data management systems (and database management systems), data modeling and data layout were synonymous. With one notable exception, data files were collections of identically formatted records. That exception was a concept introduced in card records: the multi-format card set, or master-detail set. This form of card record layout within a file allowed for repeating sets of data within a specific larger record concept, the so-called logical record (to distinguish it from the physical record). This form was used most frequently when designing files to contain records of orders, where each order could have certain data which was common to the whole order (the master) and individual, repetitive records for each order line item (the details).
This method of file design employed record fragmentation rather than record consolidation.
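To make the master/detail layout concrete, here is a minimal Python sketch that groups physical card records into logical order records. The card layout is an assumption for illustration only: column 1 holds a record code ('M' for master, 'D' for detail) and columns 2-6 repeat the order number so that details can be tied to their master; `read_cards` and `Order` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    """One logical record: a master plus its repeating detail records."""
    order_no: str
    customer: str
    lines: List[str] = field(default_factory=list)


def read_cards(cards: List[str]) -> List[Order]:
    """Group fixed-format physical records into logical order records."""
    orders: List[Order] = []
    for card in cards:
        code, order_no, data = card[0], card[1:6], card[6:].strip()
        if code == "M":
            orders.append(Order(order_no, data))
        elif code == "D":
            # A detail must follow its master and repeat the same order number.
            if not orders or orders[-1].order_no != order_no:
                raise ValueError(f"detail card out of sequence: {card!r}")
            orders[-1].lines.append(data)
        else:
            raise ValueError(f"unknown record code: {card!r}")
    return orders


deck = [
    "M00001 ACME LTD",
    "D00001 2 x widget",
    "D00001 5 x bolt",
    "M00002 BETA CO",
    "D00002 1 x gear",
]

if __name__ == "__main__":
    for order in read_cards(deck):
        print(order.order_no, order.customer, order.lines)
```

The sketch relies entirely on sequence and on the repeated order number, which is why, as the next paragraph explains, the placement of record codes and redundant identifier fields had to be planned so carefully.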
To facilitate processing of these multi-format record files, designers used record codes to identify records with different layouts, and redundant data to permit these records to be collected (or tied) together in sequence for processing. Because these files were difficult to process, the layout of these records, and the identification and placement of the control and redundant identifier data fields, had to be carefully planned. The planning and coordination associated with these kinds of files constituted the first instances of data modeling.

The concepts associated with these kinds of files were transferred to magnetic media and expanded by vendors who experimented with the substitution of physical record addresses for the redundant data. This use of physical record addresses, coupled with various techniques for combining records of varying lengths and formats, gave rise to products which allowed for the construction of complex files containing multiple-format records tied together in complex patterns to support business processing requirements.

These patterns were relatively difficult to visualize, and schematics were devised to portray them. These schematics were also called data models because they modeled how the data was to be viewed. Because the schematics were based on the manner in which the records were physically tied together, and thus logically accessed, rather than how they were physically arranged on the direct access device, they were in reality data file structure models, or data record structure models. Over time the qualifications to these names became lost and they became known simply as data models.

Whereas previously data was collected into large, somewhat haphazardly constructed records for processing, these new data management systems allowed data to be separated into smaller, more focused records which could be tied together to form a larger record by the data management system.
This capability forced designers to look at data in different ways.

Data management models

The data management systems (also called database management systems) introduced several new ways of organizing data. That is, they introduced several new ways of linking record fragments (or segments) together to form larger records for processing. Although many different methods were tried, only three
major methods became popular: the hierarchic method, the network method, and the newest, the relational method.

Each of these methods reflected the manner in which the vendor constructed and physically managed data within the file. The systems designer and the programmer had to understand these methods so that they could retrieve and process the data in the files. These models depicted the way the record fragments were tied to each other and thus the manner in which the chain of pointers had to be followed to retrieve the fragments in the correct order.

Each vendor introduced a structural model to depict how the data was organized and tied together. These models also depicted what options were chosen to be implemented by the development team, data record dependencies, data record occurrence frequencies, and the sequence in which data records had to be accessed, also called the navigation sequence.

The hierarchic model

The hierarchic model (figure 7-1) is used to describe those record structures in which the various physical records which make up the logical record are tied together in a sequence which looks like an inverted tree. At the top of the structure is a single record. Beneath that are one or more records, each of which can occur one or more times. Each of these can in turn have multiple records beneath them. In diagrammatic form the top-to-bottom set of records looks like an inverted tree or a pyramid of records. To access the set of records associated with the identifier, one starts at the top record and follows the pointers from record to record.
The various records in the lower part of the structure are accessed by first accessing the records above them and then following the chain of pointers to the records at the next lower level. The records at any given level are referred to as the parent records, and the records at the next lower level that are connected to them, or dependent on them, are referred to as their children or child records. There can be any number of records at any level, and each record can have any number of children. Each occurrence of the structure normally represents the collection of data about a single subject. This parent-child repetition can be repeated through several levels.
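The top-down access path of the hierarchic model can be sketched in Python as a preorder walk over parent and child segments. The segment types (CUSTOMER, ORDER, ITEM) and the `Segment`/`walk` names are hypothetical, and Python object references stand in for the stored pointers a real hierarchic DBMS would follow.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Segment:
    """A record fragment in a hierarchic structure; children hang below it."""
    kind: str
    data: str
    children: List["Segment"] = field(default_factory=list)


def walk(root: Segment, level: int = 0) -> Iterator[Tuple[int, Segment]]:
    """Visit each parent before its children, starting at the root record."""
    yield level, root
    for child in root.children:
        yield from walk(child, level + 1)


# One occurrence of the structure: a single subject identified by its root.
customer = Segment("CUSTOMER", "C001 Acme", [
    Segment("ORDER", "O-17", [Segment("ITEM", "widget"), Segment("ITEM", "bolt")]),
    Segment("ORDER", "O-18", [Segment("ITEM", "gear")]),
])

if __name__ == "__main__":
    for depth, seg in walk(customer):
        print("  " * depth + f"{seg.kind}: {seg.data}")
```

Reaching the "gear" item requires passing through the customer root and order O-18 first, which is the sense in which lower records are accessed only by way of the records above them.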
The data model for this type of structural representation usually depicts each segment or record fragment only once and uses lines to show the connection between a parent record and its children. This depiction of record types and lines connecting them looks like an inverted tree or an organizational hierarchy chart.

Each file is said to consist of a number of repetitions of this tree structure. Although the data model depicts all possible record types within a structure, in any given occurrence, record types may or may not be present. Each occurrence of the structure represents a specific subject occurrence and is identified by a unique identifier in the single, topmost record type (the root record).

Designers employing this type of data management system would have to develop a unique record hierarchy for each data storage subject. A given application may have several different hierarchies associated with it, each representing data about a different subject, and a company may have several dozen different hierarchies of record types as components of its data model. A characteristic of this type of model is that each hierarchy is normally treated as separate and distinct from the other hierarchies, and various hierarchies can be mixed and matched to suit the data needs of the particular application.

The network model

The network data model (figure 7-2) has no implicit hierarchic relationship between the various records, and in many cases no implicit structure at all, with the records seemingly placed at random. The network model does not make a clear distinction between subjects, mingling all record types in an overall schematic. The network model may have many different records containing unique identifiers, each of which acts as an entry point into the record structure. Record types are grouped into sets of two, one or both of which can in turn be part of another set of two record types.
Within a given set, one record type is said to be the owner record and one is said to be the member record. Access to a set is always accomplished by first locating the specific owner record and then following the chain of pointers to the member records of the set. The network can be traversed or navigated by moving from set to set. Various data structures can be constructed by selecting sets of records and excluding others.
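Owner/member sets and pointer-chain navigation can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the set names WORKS_IN and ASSIGNED_TO and the helpers `connect` and `members` are hypothetical, object references play the role of stored pointers, new members are linked at the head of the chain, and a record may have only one owner within a given set (the rule stated in the next paragraph).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass(eq=False)
class Record:
    """A network-model record occurrence with per-set pointer fields."""
    rtype: str
    key: str
    # When this record owns a set: pointer to the first member of that set.
    first_member: Dict[str, "Record"] = field(default_factory=dict, repr=False)
    # When this record is a member: pointer to the next member in that set.
    next_member: Dict[str, Optional["Record"]] = field(default_factory=dict, repr=False)
    # When this record is a member: pointer back to its single owner in that set.
    owner: Dict[str, "Record"] = field(default_factory=dict, repr=False)


def connect(set_name: str, owner: Record, member: Record) -> None:
    """Link member at the head of owner's chain for the named set."""
    if set_name in member.owner:
        raise ValueError(f"{member.key} already has an owner in set {set_name}")
    member.next_member[set_name] = owner.first_member.get(set_name)
    owner.first_member[set_name] = member
    member.owner[set_name] = owner


def members(set_name: str, owner: Record) -> Iterator[Record]:
    """Locate the owner first, then follow the chain of member pointers."""
    current = owner.first_member.get(set_name)
    while current is not None:
        yield current
        current = current.next_member[set_name]


dept = Record("DEPARTMENT", "Sales")
project = Record("PROJECT", "Apollo")
ann = Record("EMPLOYEE", "Ann")
bob = Record("EMPLOYEE", "Bob")
connect("WORKS_IN", dept, ann)
connect("WORKS_IN", dept, bob)
connect("ASSIGNED_TO", project, ann)

if __name__ == "__main__":
    print([e.key for e in members("WORKS_IN", dept)])  # ['Bob', 'Ann']
    print(ann.owner["ASSIGNED_TO"].key)                # Apollo: moving to another set
```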
Each record type is depicted only once in this type of data model. The relationship between two record types is indicated by a line between them, and the line carries the name of the set. Within a set a record can have only one owner, but multiple owner-member sets can be constructed using the same two record types.
The network model has no explicit hierarchy and no explicit entry point. Whereas the hierarchic model has several different hierarchy structures, the network model employs a single master network (or model), which when completed looks like a web of records. As new data is required, records are added to the network and joined to existing sets.
The relational model
The relational model (figure 7-3), unlike the network or the hierarchic models, did not rely on pointers to connect records. Instead, it chose to view individual records in sets, regardless of the subject occurrence they were associated with. This is in contrast to the other models, which sought to depict the relationships between record types. In the relational model, records are portrayed as residing in tables with no physical pointers between these tables, so each table is portrayed independently of every other table. This made the data model itself a model of simplicity. In turn, it made visualizing all the records associated with a particular subject somewhat difficult.
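The pointer-free linkage can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The two tables, `department` and `employee`, and their rows are hypothetical, invented for illustration. No row stores a pointer. An employee is related to a department only because `employee.dept_no` repeats the value stored in `department.dept_no`, and the query pairs rows by matching those values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: the dept_no value is stored redundantly in both.
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_name TEXT, dept_no INTEGER);
    INSERT INTO department VALUES (10, 'Catering'), (20, 'Accounts');
    INSERT INTO employee VALUES (1, 'Ali', 10), (2, 'Sara', 20), (3, 'Omar', 10);
""")

# Records on one subject are gathered by matching the repeated field, not by following pointers.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_no = d.dept_no
    WHERE d.dept_name = 'Catering'
    ORDER BY e.emp_name
""").fetchall()
print(rows)  # [('Ali', 'Catering'), ('Omar', 'Catering')]
```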
Data records were connected logically, using data that was stored redundantly in each table. Records on a given subject occurrence could be selected from multiple tables by matching the contents of these redundantly stored data fields.
The impact of data management systems
The use of these products to manage data introduced a new set of tasks for the data analysis personnel. In addition to developing record layouts, they also had
the new task of determining how these records should be structured, that is, arranged and joined by pointer structures.
Once those decisions were made, they had to be conveyed to the members of the implementation team. The hierarchic and network models were necessary because, without them, the occurrence sequences and the record-to-record relationships designed into the files could not be adequately portrayed. The relational "model" design choices also needed to be conveyed to the implementation team. However, the relational model was always depicted in much the same format as standard record layouts, and any other access- or navigation-related information could be conveyed in narrative form.
Difference between Logical Data Independence & Physical Data Independence
Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes made in the definition and organization of data.
Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues, since there is no difference in the operation carried out against the data.
Data independence and operation independence together give the feature of data abstraction. There are two levels of data independence.
Logical Data Independence:
Logical data independence is the ability to modify the conceptual schema without having to alter external schemas or application programs. Alterations to the conceptual schema may include the addition or deletion of entities, attributes or relationships. Such alterations should be possible without altering existing external schemas or rewriting application programs.
Physical Data Independence:
Physical data independence is the ability to modify the internal schema without having to alter the conceptual schema or application programs. Alterations to the internal schema might include:
* Using new storage devices.
* Using different data structures.
* Switching from one access method to another.
* Using different file organizations or storage structures.
* Modifying indexes.
Q2. What is a B+Tree? Describe the structure of both internal and leaf nodes of a B+Tree?
A2.
B+-TREE
The B-tree is the classic disk-based data structure for indexing records based on an ordered key set. The B+-tree (sometimes written B+tree, or just B-tree) is a variant of the original B-tree in which all records are stored in the leaves and all leaves are linked sequentially. The B+-tree is used as a (dynamic) indexing method in relational database management systems.
A B+-tree treats all the keys in nodes other than the leaves as dummies that only guide the search, and every key is duplicated in the leaves. This has an advantage: because all the leaves are linked together sequentially, the entire tree may be scanned without visiting the higher nodes at all.
B+-Tree Structure
• A B+-Tree consists of one or more blocks of data, called nodes, linked together by pointers. The B+-Tree is a tree structure. The tree has a single node at the top, called the root node. The root node points to two or more blocks, called child nodes. Each child node points to further child nodes, and so on.
• The B+-Tree consists of two types of node: (1) internal nodes and (2) leaf nodes.
• Internal nodes point to other nodes in the tree. Leaf nodes point to data in the database using data pointers. Leaf nodes also contain an additional pointer, called the sibling pointer, which is used to improve the efficiency of certain types of search.
• All the nodes in a B+-Tree must be at least half full, except the root node, which may contain as few as two entries. The algorithms that allow data to be inserted into and deleted from a B+-Tree guarantee that each node in the tree will be at least half full.
• Searching for a value in the B+-Tree always starts at the root node and moves downwards until it reaches a leaf node.
• Both internal and leaf nodes contain key values that are used to guide the search for entries in the index.
• The B+-Tree is called a balanced tree because every path from the root node to a leaf node is the same length. As a result, all searches for individual values require the same number of nodes to be read from the disc.
Internal Nodes
• An internal node in a B+-Tree consists of a set of key values and pointers. The keys and pointers are ordered so that each pointer is followed by a key value, and the last key value is followed by one pointer.
• Each pointer points to nodes containing values that are less than or equal to the value of the key immediately to its right.
• The last pointer in an internal node is called the infinity pointer. The infinity pointer points to a node containing key values that are greater than the last key value in the node.
• When an internal node is searched for a key value, the search begins at the leftmost key value and moves rightwards along the keys.
  • If the key value is less than the sought key, then the pointer to the left of the key is known to point to a node containing keys less than the sought key, so the search moves on to the next key.
  • If the key value is greater than or equal to the sought key, then the pointer to the left of the key is known to point to a node containing keys between the previous key value and the current key value, so the search follows that pointer.
Leaf Nodes
• A leaf node in a B+-Tree consists of a set of key values and data pointers.
Each key value has one data pointer. The key values and data pointers are ordered by the key values.
• The data pointer points to a record or block in the database that contains the record identified by the key value. For instance, in the example above, the pointer attached to key value 7 points to the record identified by the value 7.
• Searching a leaf node for a key value begins at the leftmost value and moves rightwards until a matching key is found.
• The leaf node also has a pointer to its immediate sibling node in the tree. The sibling node is the node immediately to the right of the current node. Because of the order of keys in the B+-Tree, the sibling pointer always points to a node that has key values greater than the key values in the current node.
Order of a B+-Tree
• The order of a B+-Tree is the maximum number of pointers that an internal node can contain. An order size of m means that an internal node can contain m-1 keys and m pointers.
• The order size is important because it determines how large a B+-Tree will become.
• For example, if the order size is small, then fewer keys and pointers can be placed in one node, and so more nodes will be required to store the index.
If the order size is large, then more keys and pointers can be placed in a node, and so fewer nodes are required to store the index.
Searching a B+-Tree
Searching a B+-Tree for a key value always starts at the root node and descends down the tree. A search for a single key value in a B+-Tree consisting of unique values will always follow one path from the root node to a leaf node.
Searching for Key Value 6
· Read block B3 from disc. ~ read the root node
· Is B3 a leaf node? No ~ it's not a leaf node, so the search continues
· Is 6 <= 5? No ~ step through each value in B3
· Read block B2. ~ when all else fails, follow the infinity pointer
· Is B2 a leaf node? No ~ B2 is not a leaf node, continue the search
· Is 6 <= 7? Yes ~ 6 is less than or equal to 7, follow the pointer
· Read block L2. ~ read node L2, which is pointed to by 7 in B2
· Is L2 a leaf node? Yes ~ L2 is a leaf node
· Search L2 for the key value 6. ~ if 6 is in the index, it must be in L2
Searching for Key Value 5
· Read block B3 from disc. ~ read the root node
· Is B3 a leaf node? No ~ it's not a leaf node, so the search continues
· Is 5 <= 5? Yes ~ 5 is less than or equal to 5, follow the pointer
· Read block B1. ~ read node B1, which is pointed to by 5 in B3
· Is B1 a leaf node? No ~ B1 is not a leaf node, continue the search
· Is 5 <= 3? No ~ step through each value in B1
· Read block L3. ~ when all else fails, follow the infinity pointer
· Is L3 a leaf node? Yes ~ L3 is a leaf node
· Search L3 for the key value 5. ~ if 5 is in the index, it must be in L3
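The two walk-throughs above can be reproduced with a short Python sketch. The node layout is an assumption for illustration: a `Leaf` holds keys and a sibling pointer `next`, and an `Internal` node's last child is the infinity pointer. Data pointers are omitted, so leaves store keys only. The tree is also an assumption. It is taken to be the one built by the insertion sequence 5, 8, 1, 7, 3, 12, 9, 6 later in this answer: root B3 holds key 5, B1 holds key 3, and B2 holds keys 7 and 8. This tree agrees with every step shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    name: str
    keys: List[int]                 # data pointers omitted; keys only
    next: Optional["Leaf"] = None   # sibling pointer to the leaf on the right


@dataclass
class Internal:
    name: str
    keys: List[int]
    # children[i] holds keys <= keys[i]; the extra last child is the infinity pointer
    children: List[Union["Internal", Leaf]]


def search(root, key, trace=None):
    """Descend from the root to the leaf that must hold `key`; return (found, leaf)."""
    node = root
    while isinstance(node, Internal):
        if trace is not None:
            trace.append(node.name)
        for i, k in enumerate(node.keys):      # step through the keys left to right
            if key <= k:
                node = node.children[i]        # follow the pointer left of this key
                break
        else:
            node = node.children[-1]           # no key matched: infinity pointer
    if trace is not None:
        trace.append(node.name)
    return key in node.keys, node


def range_scan(root, low, high):
    """Collect keys in [low, high]: find the first leaf, then follow sibling pointers."""
    _, leaf = search(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


# Assumed tree (the one produced by inserting 5, 8, 1, 7, 3, 12, 9, 6).
L1, L3, L2, L5, L4 = (Leaf("L1", [1, 3]), Leaf("L3", [5]), Leaf("L2", [6, 7]),
                      Leaf("L5", [8]), Leaf("L4", [9, 12]))
L1.next, L3.next, L2.next, L5.next = L3, L2, L5, L4
B1 = Internal("B1", [3], [L1, L3])
B2 = Internal("B2", [7, 8], [L2, L5, L4])
B3 = Internal("B3", [5], [B1, B2])

path = []
print(search(B3, 6, path)[0], path)   # True ['B3', 'B2', 'L2']
path = []
print(search(B3, 5, path)[0], path)   # True ['B3', 'B1', 'L3']
print(range_scan(B3, 4, 9))           # [5, 6, 7, 8, 9]
```

`range_scan` shows why leaves carry the sibling pointer. After one root-to-leaf descent, a range of keys is read left to right without revisiting B3, B1 or B2.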
Inserting in a B+-Tree
A B+-Tree consists of two types of node: (i) leaf nodes, which contain pointers to data records, and (ii) internal nodes, which contain pointers to other internal nodes or leaf nodes. In this example, we assume that the order size is 3 and that there is a maximum of two keys in each leaf node.
Insert sequence: 5, 8, 1, 7, 3, 12, 9, 6
Empty Tree
The B+-Tree starts as a single leaf node. A leaf node consists of one or more data pointers and a pointer to its right sibling. This leaf node is empty.
Inserting Key Value 5
To insert a key, search for the location where the key would be expected to occur. In our example the B+-Tree consists of a single leaf node, L1, which is empty. Hence, the key value 5 must be placed in leaf node L1.
Inserting Key Value 8
Again, search for the location where key value 8 is expected to be found. This is in leaf node L1. There is room in L1, so insert the new key.
Inserting Key Value 1
Searching for where the key value 1 should appear also results in L1, but L1 is now full: it contains the maximum of two records. L1 must be split into two nodes. The first node will contain the first half of the keys (here 1 and 5), and the second node, L2, will contain the second half (8).
However, we now require a new root node to point to each of these nodes. We create a new root node and promote the rightmost key from node L1. Each node is at least half full.
Insert Key Value 7
Search for the location where key 7 is expected to be located, that is, L2. Insert key 7 into L2.
Insert Key Value 3
The search for the location where key 3 is expected to be found results in reading L1. But L1 is full and must be split. The rightmost key in L1, i.e. 3, must now be promoted up the tree.
L1 was pointed to by key 5 in B1. Therefore, all the key values in B1 to the right of and including key 5 are moved one place to the right.
Insert Key Value 12
Search for the location where key 12 is expected to be found, L2. Try to insert 12 into L2. Because L2 is full, it must be split.
As before, we must promote the rightmost value of L2 (key 8), but B1 is full and so it must be split. Now the tree requires a new root node, so we promote the rightmost value of B1 (key 5) into a new node.
The tree is still balanced; that is, all paths from the root node, B3, to a leaf node are of equal length.
Insert Key Value 9
Search for the location where key value 9 would be expected to be found, L4. Insert key 9 into L4.
Insert Key Value 6
Key value 6 should be inserted into L2, but it is full. Therefore, split it and promote the appropriate key value. Leaf block L2 has split, and the middle key, 7, has been promoted into B2.
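The insertion walk-through can be replayed with the following Python sketch. Like the example, it assumes at most two keys per leaf and order 3 for internal nodes. Two choices are inferred from the steps rather than stated outright. First, an overflowing node keeps the first half of its keys, rounded up, on the left. Second, a leaf split promotes a copy of the left leaf's rightmost key, whereas an internal split moves the middle key up and keeps it in neither half. The `Node` class and the function names are illustrative only.

```python
import math

MAX_LEAF_KEYS = 2   # assumption from the example: at most two keys per leaf
ORDER = 3           # internal node: at most ORDER - 1 = 2 keys and ORDER = 3 pointers


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # used by internal nodes only
        self.next = None     # sibling pointer, used by leaves only


def find_leaf(root, key, path):
    """Descend to the leaf for `key`, pushing every internal node visited onto `path`."""
    node = root
    while not node.leaf:
        path.append(node)
        # First key >= sought key; if none, take the last child (infinity pointer).
        i = next((i for i, k in enumerate(node.keys) if key <= k), len(node.keys))
        node = node.children[i]
    return node


def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    path = []
    leaf = find_leaf(root, key, path)
    if key in leaf.keys:
        return root
    leaf.keys = sorted(leaf.keys + [key])
    if len(leaf.keys) <= MAX_LEAF_KEYS:
        return root

    # Leaf overflow: left keeps the first half (rounded up), a new right leaf gets the rest.
    j = math.ceil(len(leaf.keys) / 2)
    right = Node(leaf=True)
    right.keys, leaf.keys = leaf.keys[j:], leaf.keys[:j]
    right.next, leaf.next = leaf.next, right
    promoted = leaf.keys[-1]          # copy of the left leaf's rightmost key goes up
    left, new_node = leaf, right

    while path:
        parent = path.pop()
        pos = parent.children.index(left)
        parent.keys.insert(pos, promoted)
        parent.children.insert(pos + 1, new_node)
        if len(parent.keys) <= ORDER - 1:
            return root
        # Internal overflow: the middle key moves up and is kept in neither half.
        j = math.ceil(len(parent.keys) / 2)
        sibling = Node(leaf=False)
        promoted = parent.keys[j - 1]
        sibling.keys, sibling.children = parent.keys[j:], parent.children[j:]
        parent.keys, parent.children = parent.keys[:j - 1], parent.children[:j]
        left, new_node = parent, sibling

    # The split reached the root: grow the tree by one level.
    new_root = Node(leaf=False)
    new_root.keys, new_root.children = [promoted], [left, new_node]
    return new_root


def leaves(root):
    """Yield each leaf's keys by walking the sibling pointers from the leftmost leaf."""
    node = root
    while not node.leaf:
        node = node.children[0]
    while node is not None:
        yield node.keys
        node = node.next


root = Node(leaf=True)
for k in (5, 8, 1, 7, 3, 12, 9, 6):
    root = insert(root, k)

print(root.keys)                         # [5]
print([c.keys for c in root.children])   # [[3], [7, 8]]
print(list(leaves(root)))                # [[1, 3], [5], [6, 7], [8], [9, 12]]
```

The printed structure follows the steps above. The root holds 5, its children hold 3 and 7, 8, and the leaves read left to right through the sibling pointers.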
Deleting from a B+-Tree
Deleting entries from a B+-Tree may require some redistribution of the key values to guarantee a well-balanced tree.
Deletion sequence: 9, 8, 12.
Delete Key Value 9
First, search for the location of key value 9, L4. Delete 9 from L4. L4 is not less than half full, and the tree is correct.
Delete Key Value 8
Search for key value 8, L5. Deleting 8 from L5 causes L5 to underflow; that is, it becomes less than half full.
We could remove L5, but instead we will attempt to redistribute some of the values from L2. This is possible because L2 is full and half its contents can be placed in L5. As some entries have been removed from L2, its parent B2 must be adjusted to reflect the change. We can do this by removing it from the index and then adjusting the parent node B2.
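The leaf-level part of deletion can be sketched as follows. The function works on a single internal node, given as its key list and its leaves as plain Python lists, which is all the example needs. Propagating an underflow further up the tree is deliberately left out. `delete_from_leaf` and `MIN_LEAF_KEYS` are hypothetical names. On underflow, the sketch first tries to borrow a key from a neighbouring leaf and adjust the separator key in the parent, as was done with L2 and L5. If neither neighbour has a spare key, it merges the leaf into a neighbour and drops one separator, which is what happens to L4 when 12 is deleted.

```python
MIN_LEAF_KEYS = 1   # half of the two-key leaf capacity assumed in the example


def delete_from_leaf(parent_keys, leaves, key):
    """Delete `key` from the leaves hanging off one internal node (both lists modified in place).

    leaves[i] holds keys <= parent_keys[i]; leaves[-1] hangs off the infinity pointer.
    Underflow of the internal node itself is not handled in this sketch.
    """
    i = next((i for i, k in enumerate(parent_keys) if key <= k), len(parent_keys))
    leaf = leaves[i]
    leaf.remove(key)                              # raises ValueError if key is absent
    if len(leaf) >= MIN_LEAF_KEYS or len(leaves) == 1:
        return
    if i > 0 and len(leaves[i - 1]) > MIN_LEAF_KEYS:
        # Redistribute: borrow the largest key of the left neighbour.
        leaf.insert(0, leaves[i - 1].pop())
        parent_keys[i - 1] = leaves[i - 1][-1]
    elif i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_LEAF_KEYS:
        # Redistribute: borrow the smallest key of the right neighbour.
        leaf.append(leaves[i + 1].pop(0))
        parent_keys[i] = leaf[-1]
    elif i > 0:
        # No spare keys: merge into the left neighbour and drop the separator between them.
        leaves[i - 1].extend(leaf)
        del leaves[i]
        del parent_keys[i - 1]
    else:
        # Leftmost leaf: merge into the right neighbour instead.
        leaves[1][:0] = leaf
        del leaves[0]
        del parent_keys[0]


parent_keys = [7, 8]                  # node B2 after the insertions
leaves = [[6, 7], [8], [9, 12]]       # leaves L2, L5, L4
for key in (9, 8, 12):
    delete_from_leaf(parent_keys, leaves, key)
    print(key, parent_keys, leaves)
# 9 [7, 8] [[6, 7], [8], [12]]
# 8 [6, 8] [[6], [7], [12]]
# 12 [6] [[6], [7]]
```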
Deleting Key Value 12
Deleting key value 12 from L4 causes L4 to underflow. However, because L5 is already only half full, we cannot redistribute keys between the nodes. L4 must be deleted from the index and B2 adjusted to reflect the change.
The tree is still balanced, and all nodes are at least half full. However, to guarantee this property it is sometimes necessary to perform a more extensive redistribution of the data.
Search Algorithm
  s = key value to be found
  n = root node
  o = order of the B+-Tree
  nk[i], np[i] = the i-th key and the i-th pointer of node n
  WHILE n is not a leaf node
      i = 1
      found = FALSE
      WHILE i <= (number of keys in n, at most o-1) AND NOT found
          IF s <= nk[i] THEN
              n = np[i]
              found = TRUE
          ELSE
              i = i + 1
          END
      END
      IF NOT found THEN
          n = np[i]                      (* the infinity pointer *)
      END
  END
  Search leaf node n for s
Insert Algorithm
  s = key value to be inserted
  Search the tree for the leaf node n where s belongs, keeping the internal nodes
  visited in the stack path, from the root (bottom) to the parent of n (top).
  IF s is found in n THEN
      STOP
  ELSE IF n is not full THEN
      Insert s into n
  ELSE
      Insert s in n                      (* assume n can hold s temporarily *)
      j = number of keys in n / 2, rounded up
      Split n to give n and n1
      Put the first j keys from n in n
      Put the remaining keys from n in n1
      Link the leaves: n1 points to the old sibling of n, and n points to n1
      (k,p) = (nk[j], "pointer to n1")     (* copy of the rightmost key left in n *)
      finished = FALSE
      REPEAT
          IF path is empty THEN
              Create internal node n2 as the new root
              Put ("pointer to n", k, p) in n2
              finished = TRUE
          ELSE
              n = POP path
              IF n is not full THEN
                  Put (k,p) in n
                  finished = TRUE
              ELSE
                  Put (k,p) in n         (* assume n can hold it temporarily *)
                  j = number of keys in n / 2, rounded up
                  m = nk[j]
                  Split n into n and n1
                  Put the first j-1 keys and the first j pointers of n into n
                  Put the keys after nk[j] and the remaining pointers of n into n1
                  (k,p) = (m, "pointer to n1")   (* m moves up; it is kept in neither node *)
              END
          END
      UNTIL finished
  END
Q3. Describe Projection operation, Set theoretic operation & join operation?
A3. The projection operation consists of selecting the columns of the table(s) that are to appear in the answer. To display all the columns, "*" is used. The columns are listed after the SELECT keyword.
- Display the name and the sex code of the students:
SELECT Nometu, Cdsexe
FROM ETUDIANT;
- Display the contents of the table ETUDIANT:
SELECT *
FROM ETUDIANT;
The conventional set-theoretic operations are union, intersection, difference (except), and Cartesian product.
Cartesian product
The Cartesian product discussed previously is realized as a comma-separated list of table expressions (tables, views, subqueries) in the FROM clause. In addition, an explicit join operation may be used:
SELECT Laptop.model, Product.model
FROM Laptop CROSS JOIN Product;
Recall that the Cartesian product combines each row in the first table with each row in the second table. The number of rows in the result set is equal to the number of rows in the first table multiplied by the number of rows in the second table. In the example under consideration, the Laptop table has 5 rows while the Product table has 16 rows. As a result, we get 5*16 = 80 rows. The result set is therefore not reproduced here. You may check this assertion by executing the above query on the academic database.
On its own, the Cartesian product is rarely used in practice. As a rule, it is an intermediate step followed by a restriction (horizontal projection) operation, which is expressed in the WHERE clause of the SELECT statement.
Union
The UNION keyword is used for combining queries:
<query 1>
UNION [ALL]
<query 2>
The UNION operator combines the results of two SELECT statements into a single result set. If the ALL parameter is given, all duplicate rows are retained; otherwise the result set includes only unique rows. Note that any number of queries may be combined, and the order in which they are combined can be changed with parentheses.
The following conditions should be observed:
• The number of columns in each query must be the same.
• Corresponding columns of the queries (matched by position) must have compatible data types.
• The result set uses the column names from the first query.
• The ORDER BY clause is applied to the union result, so it may only be written at the end of the combined query.
Example. Find the model numbers and prices of the PCs and laptops:
SELECT model, price
FROM PC
UNION
SELECT model, price
FROM Laptop
ORDER BY price DESC;

model   price
1750    1200.0
1752    1150.0
1298    1050.0
1233     980.0
1321     970.0
1233     950.0
1121     850.0
1298     700.0
1232     600.0
1233     600.0
1232     400.0
1232     350.0
1260     350.0

Example. Find the product type, the model number, and the price of the PCs and laptops:
SELECT Product.type, PC.model, price
FROM PC INNER JOIN Product ON PC.model = Product.model
UNION
SELECT Product.type, Laptop.model, price
FROM Laptop INNER JOIN Product ON Laptop.model = Product.model
ORDER BY price DESC;

type    model  price
Laptop  1750   1200.0
Laptop  1752   1150.0
Laptop  1298   1050.0
PC      1233    980.0
Laptop  1321    970.0
PC      1233    950.0
PC      1121    850.0
Laptop  1298    700.0
PC      1232    600.0
PC      1233    600.0
PC      1232    400.0
PC      1232    350.0
PC      1260    350.0

Intersect and Except
The SQL standard offers SELECT statement clauses for computing the intersection and the difference of queries: INTERSECT and EXCEPT. They work like the UNION clause. The result set includes only those rows that are present in both queries (INTERSECT), or only those rows from the first query that are not present in the second query (EXCEPT).
Many DBMSs do not support these clauses in the SELECT statement; this was also true of MS SQL Server before version 2005. There are, however, other ways to perform intersection and difference operations, since the same result can be reached by formulating the SELECT statement differently. For intersection and difference, one can use the EXISTS predicate.
The EXISTS predicate
EXISTS ::= [NOT] EXISTS (<table subquery>)
The EXISTS predicate evaluates to TRUE if the subquery returns any rows; otherwise it evaluates to FALSE. NOT EXISTS is the reverse: it is satisfied when the subquery returns no rows. This predicate never evaluates to UNKNOWN.
As in our case, the EXISTS predicate is generally used with dependent (correlated) subqueries. This type of subquery has an outer reference to a value in the main query. The subquery result may depend on this value and must be evaluated separately for each row of the query that includes the subquery. Because of this, the EXISTS predicate may have a different value for each row of the main query.
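The equivalence between the set operators and EXISTS / NOT EXISTS can be checked with Python's built-in sqlite3 module, since SQLite accepts INTERSECT and EXCEPT directly. The Product rows below are hypothetical. They were chosen so that maker A makes both laptops and printers while B and C make only laptops; they are not the academic database the text refers to. The EXISTS queries are those of the examples that follow, with the string literals 'Laptop' and 'Printer' quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: A makes laptops and printers, B and C only laptops, D only printers.
conn.executescript("""
    CREATE TABLE Product (maker TEXT, model TEXT, type TEXT);
    INSERT INTO Product VALUES
        ('A', 'lap-1', 'Laptop'), ('A', 'prn-1', 'Printer'),
        ('B', 'lap-2', 'Laptop'), ('C', 'lap-3', 'Laptop'),
        ('D', 'prn-2', 'Printer');
""")


def makers(sql):
    """Run a query whose first column is `maker` and return the makers as a list."""
    return [row[0] for row in conn.execute(sql)]


# Intersection and difference with the standard set operators.
both = makers("""
    SELECT maker FROM Product WHERE type = 'Laptop'
    INTERSECT
    SELECT maker FROM Product WHERE type = 'Printer'
    ORDER BY maker""")
laptop_only = makers("""
    SELECT maker FROM Product WHERE type = 'Laptop'
    EXCEPT
    SELECT maker FROM Product WHERE type = 'Printer'
    ORDER BY maker""")

# The same results with correlated EXISTS / NOT EXISTS subqueries.
both_exists = makers("""
    SELECT DISTINCT maker FROM Product AS Lap_product
    WHERE type = 'Laptop' AND EXISTS (
        SELECT maker FROM Product
        WHERE type = 'Printer' AND maker = Lap_product.maker)
    ORDER BY maker""")
laptop_only_exists = makers("""
    SELECT DISTINCT maker FROM Product AS Lap_product
    WHERE type = 'Laptop' AND NOT EXISTS (
        SELECT maker FROM Product
        WHERE type = 'Printer' AND maker = Lap_product.maker)
    ORDER BY maker""")

print(both, both_exists)                  # ['A'] ['A']
print(laptop_only, laptop_only_exists)    # ['B', 'C'] ['B', 'C']
```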
Intersection example. Find those laptop makers who also produce printers:
SELECT DISTINCT maker
FROM Product AS Lap_product
WHERE type = 'Laptop' AND EXISTS (SELECT maker
                                  FROM Product
                                  WHERE type = 'Printer' AND maker = Lap_product.maker);
The printer makers are retrieved by the subquery and compared with the maker returned from the main query. The main query returns the laptop makers, so for each laptop maker it is checked whether the subquery returns any rows (i.e. whether this maker also produces printers). Because both conditions in the WHERE clause must be satisfied simultaneously (AND), the result set includes only the wanted rows. The DISTINCT keyword makes sure that each maker appears in the returned data only once. As a result, we get:
maker
A
Except example. Find those laptop makers who do not produce printers:
SELECT DISTINCT maker
FROM Product AS Lap_product
WHERE type = 'Laptop' AND NOT EXISTS (SELECT maker
                                      FROM Product
                                      WHERE type = 'Printer' AND maker = Lap_product.maker);
Here, it is sufficient to replace EXISTS in the previous example with NOT EXISTS. The returned data then includes only those main query rows for which the subquery returns no rows. As a result we get:
maker
B
C
Q4. Discuss Multi Table Queries?
A4. Inner joins are used to combine information from two or more tables; when the join condition is an equality, as here, the join is called an equijoin. The join condition determines which records are paired together and is specified in the WHERE clause. For example, let's create a list of driver/vehicle match-ups where both the vehicle and the driver are located in the same city. The following SQL query will accomplish this task:
SELECT lastname, firstname, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location
And let's take a look at the results:
lastname  firstname  tag
--------  ---------  ------
Baker     Roland     H122JM
Smythe    Michael    D824HA
Smythe    Michael    P091YF
Jacobs    Abraham    J291QR
Jacobs    Abraham    L990MT
Notice that the results are exactly what we sought. It is possible to further refine the query by specifying additional criteria in the WHERE clause. Our vehicle managers took a look at the results of our last query and noticed that the previous query matches drivers to vehicles that they are not authorized to drive
(e.g. truck drivers to cars and vice versa). We can use the following query to resolve this problem:
SELECT lastname, firstname, tag, vehicles.class
FROM drivers, vehicles
WHERE drivers.location = vehicles.location
AND drivers.class = vehicles.class
Notice that in this example we needed to specify the source table for the class attribute in the SELECT clause. This is because class is ambiguous: it appears in both tables, and we need to specify which table's column should be included in the query results. In this case it does not make a difference, as the columns are identical and they are joined using an equijoin. However, if the columns contained different data, this distinction would be critical. Here are the results of this query:
lastname  firstname  tag     class
--------  ---------  ------  -----
Baker     Roland     H122JM  Car
Smythe    Michael    D824HA  Truck
Jacobs    Abraham    J291QR  Car
Notice that the rows pairing Michael Smythe with a car and Abraham Jacobs with a truck have been removed.
You can also use inner joins to combine data from three or more tables.
Outer joins allow database users to include additional information in the query results. We'll explore them in the next section of this article.
Take a moment and review the database tables located on the first page of this article. Notice that we have a driver, Jack Ryan, who is located in a city where there are no vehicles. Our vehicle managers would like this information to be included in their query results to ensure that drivers do not sit idly by waiting for a vehicle to arrive. We can use outer joins to include records from one table that have no corresponding record in the joined table. Let's create a list of driver/vehicle pairings that includes records for drivers with no vehicles in their city. We can use the following query:
SELECT lastname, firstname, drivers.location, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location (+)
Notice that the outer join operator "(+)" is included in this query; this is Oracle's proprietary outer join syntax. The operator is placed in the join condition next to the table that is allowed to have NULL values. This query would produce the following results: