Unit 2 rdbms study_material

  1. D.GAYA, Assistant Professor, Department of Computer Science, PUCC. Page1
     Pondicherry University Community College
     Department of Computer Science
     Course: B.Voc [Software Development]
     Year: II    Semester: III
     Subject: Relational Database Management System
     Unit II Study Material
     Prepared by D.GAYA, Assistant Professor, Department of Computer Science, Pondicherry University Community College, Lawspet, Puducherry-08.
  2. Unit-II
     Database Management Systems – Tree Structures – Plex Structures – Data Description Languages. Relational Databases – Third Normal Form – Canonical Data Structures – Varieties of Data Independence.

     Introduction
     A Database Management System (DBMS) is the software used to optimize and manage the storage and retrieval of data from databases. A DBMS offers a systematic approach to managing databases through an interface for users as well as for workloads that access the databases via apps. Its management responsibilities cover the information within the databases, the processes applied to databases (such as access and modification), and the database's logical structure. A DBMS also facilitates additional administrative operations such as change management, disaster recovery, compliance, and performance monitoring, among others.

     To facilitate these functions, a DBMS has the following key components:
     • Software. A DBMS is primarily a software system that can be considered a management console or an interface for interacting with and managing databases. The interfacing also extends to real-world physical systems that contribute data to the backend databases. The OS, networking software, and hardware infrastructure are all involved in creating, accessing, managing, and processing the databases.
     • Data. A DBMS contains operational data, access to database records, and metadata as resources for performing the necessary functionality. The data may include files such as index files, administrative information, and data dictionaries used to represent data flows, ownership, structure, and relationships to other records or objects.
     • Procedures. While not part of the DBMS software, procedures can be considered instructions on using the DBMS. These documented guidelines assist users in designing, modifying, managing, and processing databases.
     • Database languages. These are components of the DBMS used to access, modify, store, and retrieve data items from databases; specify the database schema; control user access; and perform other associated database management operations. Types of DBMS languages include Data Definition Language (DDL), Data Manipulation Language (DML), Database Access Language (DAL) and Data Control Language (DCL).
  3. • Query processor. As a fundamental component of the DBMS, the query processor acts as an intermediary between users and the DBMS data engine in order to communicate query requests. When users enter an instruction in SQL, the command is translated from the high-level language instruction to a low-level language that the underlying machine can understand and process to perform the appropriate DBMS functionality. In addition to instruction parsing and translation, the query processor also optimizes queries to ensure fast processing and accurate results.
     • Runtime database manager. A centralized management component of the DBMS that handles functionality associated with runtime data, which is commonly used for context-based database access. This component checks that the user is authorized to issue the query; processes the approved queries; devises an optimal strategy for query execution; supports concurrency so that multiple users can work on the same databases simultaneously; and ensures the integrity of data recorded in the databases.
     • Database manager. Unlike the runtime database manager, which handles queries and data at runtime, the database manager performs DBMS functionality associated with the data within databases. It provides a set of commands for DBMS operations that include creating, deleting, backing up, restoring, cloning, and other database maintenance tasks. The database manager may also be used to update the database with patches from vendors.
     • Database engine. This is the core software component within the DBMS that performs the core functions associated with data storage and retrieval. A database engine is also accessible via APIs that allow users or apps to create, read, write, and delete records in databases.
     • Reporting. The report generator extracts useful information from DBMS files and displays it in a structured format based on defined specifications. This information may be used for further analysis, decision making, or business intelligence.
  4. Benefits of DBMS
     The DBMS was designed to solve the fundamental problems associated with storing, managing, accessing, securing, and auditing data in traditional file systems. Traditional database applications were developed directly on top of the databases, which led to challenges such as data redundancy, data isolation, weak integrity constraints, and difficulty managing data access. A layer of abstraction was required between users or apps and the databases at both the physical and the logical level. Introducing DBMS software to manage databases results in the following benefits:
     • Data security. A DBMS allows organizations to enforce policies that enable compliance and security. The databases are available to appropriate users according to organizational policies. The DBMS is also responsible for maintaining optimum performance of querying operations while ensuring the validity, security and consistency of data items written to a database.
     • Data sharing. Fast and efficient collaboration between users.
     • Data access and auditing. Controlled access to databases. Logging the associated access activities allows organizations to audit for security and compliance.
     • Data integration. Instead of operating isolated islands of database resources, a single interface is used to manage databases with logical and physical relationships.
  5. • Abstraction and independence. Organizations can change the physical schema of database systems without necessitating changes to the logical schema that governs database relationships. As a result, organizations can upgrade storage and scale the infrastructure without impacting database operations. Similarly, changes to the logical schema can be applied without altering the apps and services that access the databases.
     • Uniform management and administration. A single console interface for basic administrative tasks makes the job easier for database admins and IT users.

     Applications of DBMS
     A database is a collection of related data, and data is a collection of facts and figures that can be processed to produce information. Mostly, data represents recordable facts. Data aids in producing information, which is based on facts. For example, if we have data about the marks obtained by all students, we can then draw conclusions about toppers and average marks. A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and produce information. The following are the important characteristics and applications of a DBMS.
     • ACID Properties. A DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally shortened to ACID). These concepts are applied to transactions, which manipulate data in a database. ACID properties help the database stay healthy in multi-transactional environments and in case of failure.
     • Multiuser and Concurrent Access. A DBMS supports a multi-user environment and allows users to access and manipulate data in parallel. Although there are restrictions on transactions when users attempt to handle the same data item, users are generally unaware of them.
     • Multiple views. A DBMS offers multiple views for different users. A user in the Sales department will have a different view of the database than a person working in the Production department. This feature gives users a focused view of the database according to their requirements.
  6. • Security. Features like multiple views offer security to some extent, because users are unable to access the data of other users and departments. A DBMS offers methods to impose constraints while entering data into the database and while retrieving it at a later stage. A DBMS offers many different levels of security features, which enable multiple users to have different views with different features. For example, a user in the Sales department cannot see the data that belongs to the Purchase department. Additionally, how much of the Sales department's data is displayed to the user can also be controlled. Since a DBMS does not store data on disk in the way traditional file systems do, it is harder for miscreants to read the data directly.

     Data Model in DBMS
     • A model is an abstraction that represents essential features without including background details or explanations. It hides superfluous details while highlighting details pertinent to the application at hand.
     • A data model is a mechanism that provides this abstraction for database applications. Data modelling is used for representing entities of interest and their relationships in the database.
     • A data model defines the logical structure of a database, that is, how data items are connected to each other and how they are processed and stored inside a system.
     • A number of models for representing data have been developed. As with programming languages, there is no best choice for all applications, but each model maintains the integrity of the data by enforcing a set of constraints.
     Data models differ in their method of representing the associations amongst entities and attributes. The main models or approaches are:
     • The Hierarchical Model – Tree Structure
     • The Network Model – Plex Structure
     • The Relational Model – Normalised Structure
     • The ER Model – Conceptual Model

     Data Model Structure and Constraints –
     • Constructs are used to define the database structure.
  7. • Constructs typically include elements (and their data types) as well as groups of elements (for example Entity, Record, Table), and relationships among such groups.
     • Constraints specify restrictions on valid data. These constraints must be enforced at all times.

     Data Model Operations –
     These operations are used for specifying database retrievals and updates by referring to the constructs of the data model. The operations may include basic model operations as well as user-defined operations.
     Basic Model Operations:
     • Insert
     • Delete
     • Update

     The Hierarchical Model – Tree Structure
     The hierarchical model is a data model that uses the tree as its basic structure. So, let's define the basics of the tree.
     Basics of Tree: A tree is a data structure that consists of a hierarchy of nodes with a single node, called the root, at the highest level. A node may have any number of children, but each child node may have only one parent node, on which it is dependent. Thus the parent-to-child relationship in a tree is one to many, whereas each child relates to exactly one parent.
  8. In figure 1, the node at level 1 is called the root node, and the nodes that have no children are called leaves, for example nodes 4, 5, 7, 8, 9, 10 and 11.
     • Nodes that are children of the same parent are called siblings. For example, nodes 2, 3 and 4 are siblings.
     • For any node there is a single path from the root node, called the hierarchical path. The nodes along this path are called that node's ancestors.
     • Similarly, for a given node, any node along a path from that node to a leaf is called its descendant.
     • For example, the hierarchical path of node 10 is 1→2→6→10, and the ancestors of node 10 are 1, 2 and 6.
     • The height of a tree is the number of levels on the longest hierarchical path from the root to a leaf. The above tree has a height of 4.
     • A tree is said to be balanced if every path from the root node to a leaf has the same length. Figure 2 shows a balanced and an unbalanced tree.
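The tree terms above can be checked with a small sketch. Figure 1 is not reproduced in this transcript, so the children assigned to nodes 2, 3 and 6 below are an assumption chosen to agree with the facts stated in the text (siblings 2, 3, 4; leaves 4, 5, 7, 8, 9, 10, 11; path 1→2→6→10; height 4). The function names are illustrative only.

```python
# Assumed tree shape: Figure 1 is not available, so the children of
# nodes 2, 3 and 6 are guesses that match the facts stated in the text.
CHILDREN = {
    1: [2, 3, 4],
    2: [5, 6],
    3: [7, 8],
    6: [9, 10, 11],
}

# Each child has exactly one parent, so the parent map is well defined.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def hierarchical_path(node):
    """Return the single path from the root down to `node`."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def ancestors(node):
    """Every node on the hierarchical path except `node` itself."""
    return hierarchical_path(node)[:-1]


def leaves(root=1):
    """Nodes with no children, in ascending order."""
    all_nodes = {root} | set(PARENT)
    return sorted(n for n in all_nodes if n not in CHILDREN)


def height(node=1):
    """Number of levels on the longest path from `node` down to a leaf."""
    return 1 + max((height(c) for c in CHILDREN.get(node, [])), default=0)


def is_balanced(root=1):
    """True when every root-to-leaf path has the same length."""
    return len({len(hierarchical_path(leaf)) for leaf in leaves(root)}) == 1
```

With this assumed shape, hierarchical_path(10) gives the path quoted in the text, and is_balanced() is false because leaf 4 sits at level 2 while leaf 10 sits at level 4.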
  9. A binary tree is one in which each node has no more than two children. Figure 3 shows a binary tree.
     Example of Hierarchical Model:
  10. • Figure 4 shows a data structure diagram for a tree representing STUDENT, FACULTY and CLASS.
     • The root node chosen is FACULTY, with CLASS as a child of FACULTY and STUDENT as a child of CLASS.
     • The cardinality between FACULTY and CLASS is one to many, as a FACULTY teaches one or more CLASSes.
     • The cardinality between a CLASS and a STUDENT is also one to many, because a CLASS has many STUDENTs.
     Figure 5 shows an occurrence of FACULTY-CLASS-STUDENT.

     Operations on Hierarchical Model
     • Deletion: If CS02 is deleted, then all the students in class CS02 are deleted with it, so deleting a parent is difficult. Deleting leaf nodes (students), however, creates no such difficulty.
     • Insertion: A new class, say CS03, cannot be introduced unless some faculty is available at the root level. So insertion is also difficult.
     • Updation: Suppose a student has changed subject from Hindi to Sanskrit. A search must first be performed to find the Hindi subject, and then the update is made. Searching is a time-consuming process here.
     So problems occur in all three operations.
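The three difficulties above can be seen in a minimal sketch of one FACULTY-CLASS-STUDENT occurrence stored as nested Python containers. Figure 5 is not reproduced here, so the faculty code and student names are made up for illustration; only the class codes CS02 and CS03 come from the text.

```python
# Hypothetical occurrence: the faculty code and student names are invented;
# only the class codes CS02 and CS03 appear in the study material.
faculty_tree = {
    "F01": {                                   # root record: FACULTY
        "CS01": ["Asha", "Ravi"],              # CLASS -> its STUDENT children
        "CS02": ["Meena", "Kumar", "Divya"],
    }
}


def delete_class(tree, faculty, class_code):
    """Remove a CLASS node; its STUDENT children are lost along with it."""
    return tree[faculty].pop(class_code)


def insert_class(tree, faculty, class_code):
    """A CLASS can only be inserted under an existing FACULTY parent."""
    if faculty not in tree:
        raise KeyError(f"no FACULTY record {faculty!r} to attach {class_code!r} to")
    tree[faculty][class_code] = []


def find_student(tree, name):
    """Updates start with a search from the root through every class."""
    for faculty, classes in tree.items():
        for class_code, students in classes.items():
            if name in students:
                return faculty, class_code
    return None
```

Deleting CS02 returns (and discards) its three students, inserting CS03 works only because F01 already exists, and every lookup walks the tree from the root.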
  11. Advantages of Hierarchical Model
     • Easy to understand
     • Performance is better than the relational data model

     Disadvantages of Hierarchical Model
     • Difficult to access values at lower levels
     • The model may not be flexible enough to accommodate the dynamic needs of an organisation
     • Deletion of a parent node forces the deletion of its child nodes
     • Extra space is required for the storage of pointers

     The Network Model – Plex Structure
     The network database or network model uses the plex structure as its basic data structure. A network is a directed graph consisting of nodes connected by links or directed arcs. The nodes correspond to record types and the links to pointers or relationships. All the relationships are hardwired or pre-computed and built into the structure of the database itself, which makes them very efficient in space utilization and query execution time. The network data structure looks like a tree structure, except that a dependent node, which is called a child or member, may have more than one parent or owner node. The figure shows the network model.
     A diagram called a Bachman diagram is used to represent a network data structure. The nodes in the network are drawn as rectangles that represent records, and the links are shown by lines connecting the rectangles.
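A plex structure can be sketched as a list of directed owner-to-member links. The SUPPLIER and PART record names follow the example used in this material, but the specific links below are assumed, since the figure is not reproduced; the helper names are illustrative.

```python
# Directed links (owner -> member). The specific pairs are assumed for
# illustration; the original figure is not available in this transcript.
links = [
    ("SUPPLIER A", "PART 1"),
    ("SUPPLIER A", "PART 2"),
    ("SUPPLIER B", "PART 2"),
]


def owners_of(member, links):
    """All owner (parent) records linked to a member record."""
    return sorted(owner for owner, m in links if m == member)


def members_of(owner, links):
    """All member (child) records linked to an owner record."""
    return sorted(member for o, member in links if o == owner)


def is_tree(links):
    """A plex structure is a tree only if no member has more than one owner."""
    members = [member for _, member in links]
    return len(members) == len(set(members))


def relink(links, owner, old_member, new_member):
    """Update by changing one link; no record is deleted or copied."""
    return [
        (o, new_member if (o, m) == (owner, old_member) else m)
        for o, m in links
    ]
```

Here PART 2 has two owners, which a tree cannot represent, and relink changes a relationship by rewriting a single link.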
  12. A plex structure with two record types is shown.
     Example of Network Database:

     Operations on Network Model/Network Database
     • Insertion: In the above figure, it is clear that a new part or supplier can easily be inserted.
     • Deletion: For deletion only the link has to be removed, and no information is lost. For example, to remove PART 2, we delete the connector lines linking it to its suppliers.
     • Updation: Updation is also easy. For example, suppose SUPPLIER B supplies PART 1 in place of PART 2; the update is made by changing the link of SUPPLIER B from PART 2 to PART 1.

     Advantages of Network Model/Network Database:
     • Easy access to data.
     • Flexible
     • Efficient
     • This model can be applied to real-world problems that require routine transactions.

     Disadvantages of Network Model/Network Database:
     • Complex to design and develop.
  13. • Extra memory is required for the storage of pointers
     • The model is inflexible and difficult to use.
     • Operation and maintenance are time-consuming and expensive for large databases.

     The Relational Model – Normalised Structure
     The relational model is a lower-level model. It is based on the concept of a relation, which is represented as a table. A table is a collection of rows and columns. The relational model uses a collection of tables to represent both data and the relationships among those data. The tables hold information about the objects to be represented in the database. A relation or table is represented in two-dimensional form, in which the rows of the table correspond to individual records and the columns correspond to attributes. Each row is called a tuple and each column is called an attribute. For example, a student relation is represented by the STUDENT table having columns for the attributes SID, NAME and BRANCH.
     SID: Key
     Number of Records = Cardinality
     Number of Fields = Arity
     Student (SID, Name, Branch) = Relational Schema (Table Abstraction)
     • SID is the primary key here, as it identifies a student record or tuple uniquely. (A primary key is a key on an attribute, here SID, that uniquely identifies a tuple.)
     • The Cardinality of the relation or table is the number of records in it, which is 4 for the STUDENT relation.
     • The Arity is the number of fields or columns in the relation, which is 3 for Student (SID, Name, Branch).
     Domain of an Attribute – The domain of an attribute is the set of allowable values for that attribute. It is a pool of values from which the actual values appearing in a given column are drawn. For example, the values appearing in the SID column are drawn from the domain of all SIDs. Domains may be distinct, or two or more attributes may have the same domain.
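Cardinality, arity and the primary key can be observed directly on a small STUDENT table. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the four sample rows are invented, since the original table figure is not reproduced here.

```python
import sqlite3

# In-memory database; the four sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid TEXT PRIMARY KEY, name TEXT, branch TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("S1", "Anu", "IT"),
        ("S2", "Bala", "CS"),
        ("S3", "Chitra", "CS"),
        ("S4", "Dev", "EC"),
    ],
)

# Cardinality = number of tuples (rows) in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# Arity = number of attributes (columns) in the relation.
cursor = conn.execute("SELECT * FROM student")
arity = len(cursor.description)

# Values actually drawn from the BRANCH domain in this relation.
branch_values = sorted({row[0] for row in conn.execute("SELECT branch FROM student")})

# The primary key identifies a tuple uniquely, so a second S1 is refused.
try:
    conn.execute("INSERT INTO student VALUES ('S1', 'Esha', 'IT')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The values observed in a column are only a subset of its domain: BRANCH could legally hold other branch codes that no current student has.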
  14. Operations in Relational Model –
     • Insertion – A new student record can easily be inserted in the table.
     • Deletion – An existing student record or tuple can easily be deleted from the STUDENT relation.
     • Updation – An existing student record can be updated easily. For example, if student S2 changes BRANCH from CS to IT, the value can easily be changed.

     Advantages of Relational Model –
     • Easy to use and understand.
     • Very flexible.
     • Widely used.
     • Provides excellent support for ad hoc queries.
     • Users need not consider issues such as storage structure and access strategy.
     • Security control and authorization can be implemented more easily.
     • Data independence is achieved more easily with the normalised structure used in a relational database.

     Disadvantages of Relational Model –
     • For large databases, the performance in responding to queries may be degraded.
     • Processing is needed to construct the indexes, so the index portion of the file must be created and maintained along with the file records themselves.
     • The file index must be searched before the actual file records are obtained, which costs time.

     Data Description Languages
     SQL statements executed within a database fall into two main types. Before you can manipulate data residing in a database using the SQL Data Manipulation Language (DML), you have to create the logical structure to store the information. Data Definition Language (DDL) is the portion of SQL that deals with how data should reside in the database at a logical level. Each database has its own set of object types that it allows.
  15. Most include tables, indexes, views, stored procedures, functions, synonyms, and triggers. Each database has its own syntax for DDL statements and the clauses that can be included. There are some basic keywords that you will find in almost every RDBMS.
     • CREATE
     • ALTER
     • DROP
     • TRUNCATE

     The CREATE Statement
     The basic building blocks of the Relational Database Management System are tables. I envision a table as a set of rows and columns. The columns represent fields of information. The rows represent records in the table. In the following graphic, the books table has four fields and four records. You could simply create the table with the following statement:

         CREATE TABLE books (
             book_id   VARCHAR(100),
             book_name VARCHAR(100),
             author_id NUMBER,
             editor_id NUMBER
         );
  16. The problem with this table definition is that it allows rows to be created without concern for whether the data makes any sense. Envision our table looking like this:
     Your database design should make sure that data inserted into a table is sensible. Let us create a second table called the "persons" table. This time we will add constraints to make sure that data entered into the table makes sense. Each entry in the table should be a unique person, so we give it a PRIMARY KEY. We also want to track when each row was last updated and who updated it, so we make those fields NOT NULL.

         CREATE TABLE persons (
             person_id   NUMBER NOT NULL PRIMARY KEY,
             name        VARCHAR(100),
             birth_date  DATE,
             gender      VARCHAR(30),
             last_update DATE NOT NULL,
             updated_by  NUMBER NOT NULL
         );
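To see what the PRIMARY KEY and NOT NULL constraints buy, the persons table can be tried out in SQLite through Python's sqlite3 module. This is a sketch under stated assumptions: SQLite is used in place of Oracle, so NUMBER is replaced by INTEGER, and the sample rows and the try_insert helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same shape as the persons table above; INTEGER stands in for Oracle's NUMBER.
conn.execute("""
    CREATE TABLE persons (
        person_id   INTEGER NOT NULL PRIMARY KEY,
        name        VARCHAR(100),
        birth_date  DATE,
        gender      VARCHAR(30),
        last_update DATE NOT NULL,
        updated_by  INTEGER NOT NULL
    )
""")


def try_insert(values):
    """Attempt one insert and report whether a constraint refused it."""
    try:
        conn.execute("INSERT INTO persons VALUES (?, ?, ?, ?, ?, ?)", values)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    # A sensible row is accepted.
    try_insert((1, "Asha", "1990-04-12", "F", "2024-01-01", 1)),
    # Same person_id again: the PRIMARY KEY refuses the duplicate.
    try_insert((1, "Ravi", "1988-09-30", "M", "2024-01-02", 1)),
    # Missing last_update: the NOT NULL constraint refuses the row.
    try_insert((2, "Meena", "1995-02-17", "F", None, 1)),
]

row_count = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]
```

Only the first row reaches the table; the other two are stopped by the database itself rather than by application code.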
  17. The ALTER Statement
     The definition of an object in a database can be changed using the ALTER statement.
     Example: Add constraints to the "books" table to ensure that the fields "book_name" and "author_id" contain data.

         ALTER TABLE books MODIFY (book_name NOT NULL);
         ALTER TABLE books MODIFY (author_id NOT NULL);

     A FOREIGN KEY constraint can be added to the fields "author_id" and "editor_id", limiting the available values to ones that currently exist in the "person_id" field of the persons table.

         ALTER TABLE books ADD CONSTRAINT fk_author FOREIGN KEY (author_id) REFERENCES persons (person_id);
         ALTER TABLE books ADD CONSTRAINT fk_editor FOREIGN KEY (editor_id) REFERENCES persons (person_id);

     What if we wanted to add a publication date to our books table? Use the ALTER statement to add the field.

         ALTER TABLE books ADD (publish_date DATE);

     You can alter more than just tables. Here are examples of some other ALTER statements.

         ALTER ROLE book_reader IDENTIFIED BY r2Xe135DEw;
         ALTER INDEX editor_indx DISABLE;
         ALTER TRIGGER persons_update RENAME TO persons_trig;
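The effect of the added column and of the foreign keys can be tried in SQLite via Python's sqlite3 module. This is only a sketch with assumptions: SQLite does not support ALTER TABLE ... MODIFY or ADD CONSTRAINT, so the NOT NULL and REFERENCES clauses are declared when the table is created, and foreign key checking has to be switched on per connection. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE persons (person_id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(100))"
)
# SQLite cannot add these constraints later, so they are declared up front.
conn.execute("""
    CREATE TABLE books (
        book_id   VARCHAR(100),
        book_name VARCHAR(100) NOT NULL,
        author_id INTEGER NOT NULL REFERENCES persons (person_id),
        editor_id INTEGER REFERENCES persons (person_id)
    )
""")

# Adding a column is the one ALTER from the text that carries over directly.
conn.execute("ALTER TABLE books ADD COLUMN publish_date DATE")
columns = [row[1] for row in conn.execute("PRAGMA table_info(books)")]

conn.execute("INSERT INTO persons VALUES (1, 'Asha')")
conn.execute("INSERT INTO books VALUES ('B1', 'SQL Basics', 1, NULL, '2020-05-01')")

# author_id 99 does not exist in persons, so the foreign key refuses the row.
try:
    conn.execute("INSERT INTO books VALUES ('B2', 'Ghost Book', 99, NULL, NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The difference in ALTER support is a reminder of the point made earlier: each database has its own syntax for DDL statements.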
  18. The TRUNCATE Statement

         TRUNCATE TABLE books;

     The TRUNCATE statement removes all the data from a table. This is very similar to the DML statement:

         DELETE FROM books;

     In the Oracle database, there is a difference between the two. TRUNCATE removes all data, whereas a DELETE can be specific about the rows it deletes. Also, if you make a mistake with a DELETE statement, you can use the transaction control statement ROLLBACK to undo the changes. The TRUNCATE command has no rollback capability. The biggest advantage of the TRUNCATE statement is that it can be faster than the DELETE statement, especially if the table has numerous rows, triggers, indexes, and other dependencies.

     The DROP Statement
     Removing an object from the database is accomplished with the DROP statement.

         DROP TABLE books;
         DROP TABLE persons;

     Dropping a table removes all its rows, invalidates dependent objects, and removes the indexes, constraints and privileges that anyone had on that table. Just as with the CREATE and ALTER statements, there are other DROP statement types.
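The rollback behaviour described for DELETE can be demonstrated with Python's sqlite3 module. SQLite has no TRUNCATE statement, so this sketch covers only the DELETE side of the comparison; the table contents are invented, and the default transaction handling of the sqlite3 module is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_id TEXT, book_name TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("B1", "SQL Basics"), ("B2", "Data Models"), ("B3", "Normal Forms")],
)
conn.commit()  # make the three rows permanent


def count_books():
    return conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]


# DELETE can target specific rows ...
conn.execute("DELETE FROM books WHERE book_id = 'B1'")
after_selective_delete = count_books()

# ... and, until it is committed, it can be undone.
conn.rollback()
after_first_rollback = count_books()

# Deleting every row is still DML, so ROLLBACK brings the rows back too.
conn.execute("DELETE FROM books")
after_full_delete = count_books()
conn.rollback()
after_second_rollback = count_books()
```

In Oracle, the equivalent TRUNCATE TABLE books could not be undone this way, which is the trade-off for its speed.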
  19. 3rd Normal Form Definition
     A table is in third normal form if it satisfies the following conditions:
     • It is in second normal form
     • There is no transitive functional dependency
     By transitive functional dependency, we mean that the following relationships hold in the table: A is functionally dependent on B, and B is functionally dependent on C. In this case, A is transitively dependent on C via B.

     3rd Normal Form Example
     Consider the following example:
     In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID]; we have a transitive functional dependency, and this structure does not satisfy third normal form. To bring this table to third normal form, we split the table into two as follows:
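Since the example tables are not reproduced in this transcript, the split can be sketched in Python. The rows below are invented sample data; the column names follow the text ([Book ID], [Genre ID], [Genre Type], [Price]), and the helper holds is a hypothetical name for a simple functional dependency check.

```python
# Invented sample rows; the original example table is not available here.
books = [
    {"book_id": 1, "genre_id": 1, "genre_type": "Gardening", "price": 25.99},
    {"book_id": 2, "genre_id": 2, "genre_type": "Sports", "price": 14.99},
    {"book_id": 3, "genre_id": 1, "genre_type": "Gardening", "price": 10.00},
    {"book_id": 4, "genre_id": 3, "genre_type": "Travel", "price": 12.99},
    {"book_id": 5, "genre_id": 2, "genre_type": "Sports", "price": 17.99},
]


def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in `rows`."""
    seen = {}
    for row in rows:
        # The first value seen for a given lhs must match every later one.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Split into two tables so Genre Type depends only on Genre ID.
table_book = [
    {"book_id": r["book_id"], "genre_id": r["genre_id"], "price": r["price"]}
    for r in books
]
table_genre = sorted({(r["genre_id"], r["genre_type"]) for r in books})

# Joining the two tables on genre_id gives back the original rows.
genre_lookup = dict(table_genre)
rejoined = [dict(b, genre_type=genre_lookup[b["genre_id"]]) for b in table_book]
```

Each genre type is now stored once in table_genre instead of once per book, and rejoining on genre_id loses no information.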
  20. Now all non-key attributes are fully functionally dependent only on the primary key. In [TABLE_BOOK], both [Genre ID] and [Price] depend only on [Book ID]. In [TABLE_GENRE], [Genre Type] depends only on [Genre ID].

     Canonical Data Structures
     • A canonical data model (CDM) is a type of data model that presents data entities and relationships in the simplest possible form.
     • It is generally used in system/database integration processes where data is exchanged between different systems, regardless of the technology used.
     • A canonical data model is also known as a common data model.
     A canonical data model primarily enables an organization to create and distribute a common definition of its entire data unit. The design of a CDM requires identifying all entities, their attributes and the relationships between them. The importance of a CDM is particularly evident in integration processes where data units are shared between different information system platforms. It uses a generalized data format to present and define data, which makes it simple to share data among multiple applications.

     Varieties of Data Independence
     Data independence is a property of a DBMS that lets you change the database schema at one level of a database system without having to change the schema at the next higher level. Data independence helps you keep data separated from all the programs that make use of it. You can use this stored data for computing and presentation. In many systems, data independence is an essential function for components of the system.

     Importance of Data Independence
     • Helps you to improve the quality of the data
     • Database system maintenance becomes affordable
     • Enforcement of standards and improvement in database security
     • You don't need to alter data structures in application programs
  21. • Permits developers to focus on the general structure of the database rather than worrying about the internal implementation
     • Helps keep the database in an undamaged, undivided state
     • Database inconsistency is greatly reduced.
     • Modifications needed at the physical level to improve the performance of the system can be made easily.

     Types of Data Independence
     In a DBMS there are two types of data independence:
     • Physical data independence
     • Logical data independence

     Levels of Database
     The database has 3 levels, as shown in the diagram below:
     • Physical/Internal
     • Conceptual
     • External
  22. Consider the example of a university database. At the different levels, this is how the implementation will look:

     Type of Schema       Implementation
     External Schema      View 1: Course info(cid: int, cname: string)
                          View 2: studentinfo(id: int, name: string)
     Conceptual Schema    Students(id: int, name: string, login: string, age: integer)
                          Courses(id: int, cname: string, credits: integer)
                          Enrolled(id: int, grade: string)
     Physical Schema      Relations stored as unordered files.
                          Index on the first column of Students.
  23. Physical Data Independence
     Physical data independence helps you separate the conceptual level from the internal/physical level. It allows you to provide a logical description of the database without the need to specify physical structures. Compared to logical independence, physical data independence is easy to achieve.
     With physical independence, you can change the physical storage structures or devices without an effect on the conceptual schema. Any change made is absorbed by the mapping between the conceptual and internal levels. Physical data independence is achieved by the presence of the internal level of the database and the transformation from the conceptual level of the database to the internal level.

     Examples of changes under Physical Data Independence
     Due to physical independence, none of the changes below affects the conceptual layer.
     • Using a new storage device, such as a hard drive or magnetic tape
     • Modifying the file organization technique in the database
     • Switching to different data structures
     • Changing the access method
     • Modifying indexes
     • Changes to compression techniques or hashing algorithms
     • Changing the location of the database, say from the C drive to the D drive

     Logical Data Independence
     Logical data independence is the ability to change the conceptual schema without changing
     • External views
     • External API or programs
  24. Any change made will be absorbed by the mapping between the external and conceptual levels. Compared to physical data independence, logical data independence is challenging to achieve.

     Examples of changes under Logical Data Independence
     Due to logical independence, none of the changes below affects the external layer.
     • Adding, modifying or deleting an attribute, entity or relationship is possible without a rewrite of existing application programs
     • Merging two records into one
     • Breaking an existing record into two or more records

     Difference between Physical and Logical Data Independence
     • Logical: Mainly concerned with the structure, or with changing the data definition.
       Physical: Mainly concerned with the storage of the data.
     • Logical: Difficult, as the retrieval of data depends mainly on the logical structure of the data.
       Physical: It is easy to retrieve.
     • Logical: Compared to physical independence, it is difficult to achieve.
       Physical: Compared to logical independence, it is easy to achieve.
     • Logical: You need to make changes in the application program if fields are added to or deleted from the database.
       Physical: A change at the physical level usually does not need a change at the application program level.
     • Logical: Modification at the logical level is significant whenever the logical structures of the database are changed.
       Physical: Modifications made at the internal level may or may not be needed to improve the performance of the structure.
  25. • Logical: Concerned with the conceptual schema.
       Physical: Concerned with the internal schema.
     • Logical: Example: adding, modifying or deleting an attribute.
       Physical: Example: a change in compression techniques, hashing algorithms, storage devices, etc.
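Both kinds of data independence can be observed on the Students relation from the university example, using Python's sqlite3 module. This is a sketch under assumptions: the sample rows, the index name idx_students_id and the added email column are invented, and the view plays the role of the external schema's student view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the Students relation from the university example.
conn.execute("CREATE TABLE Students (id INTEGER, name TEXT, login TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [(1, "Asha", "asha01", 19), (2, "Ravi", "ravi02", 20)],  # invented rows
)

# External schema: a view exposing only id and name.
conn.execute("CREATE VIEW studentinfo AS SELECT id, name FROM Students")


def external_query():
    """What an application sees through the external view."""
    return conn.execute("SELECT id, name FROM studentinfo ORDER BY id").fetchall()


def conceptual_query():
    """A query written against the conceptual schema."""
    return conn.execute("SELECT name FROM Students WHERE id = 2").fetchall()


before_view = external_query()
before_table = conceptual_query()

# Physical change: add an index. The conceptual-level query is unchanged
# and returns the same answer; only the access path may differ.
conn.execute("CREATE INDEX idx_students_id ON Students (id)")
after_index = conceptual_query()

# Logical change: add an attribute to the conceptual schema. The external
# view, and any program using it, is unaffected.
conn.execute("ALTER TABLE Students ADD COLUMN email TEXT")
after_new_column = external_query()
```

The index is absorbed by the conceptual-to-internal mapping, and the new column is absorbed by the view definition, which acts as the external-to-conceptual mapping.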