Data administration



Published in: Education, Technology

  1. CHAPTER 6: DATA ADMINISTRATION
     Chapter Objectives
     At the end of this chapter, you should be able to:
     • define data administration, database administration, locking, versioning, deadlock, and transaction;
     • explain the difference between data administration and database administration;
     • describe the function of a DBMS and its major components;
     • describe the optimistic and pessimistic systems of concurrency control;
     • describe the problem of database security and the techniques to enhance security;
     • describe the problem of database recovery and the facilities used to recover a database.
     Essential Reading
     • Modern Database Management (4th Edition), Fred R. McFadden & Jeffrey A. Hoffer (1994), Benjamin/Cummings. [Chapter 12, pages 425-458]
     • Fundamentals of Database Systems, Ramez Elmasri & Shamkant B. Navathe (1989), Benjamin/Cummings.
     • Practical Database Techniques, S. Misbah Deen.
     Useful Websites to learn Database and Programming:
     Erwin M. Globio, MSIT
  2. DB212 CHAPTER 6: DATA ADMINISTRATION
     6.1 Data and Database Administrator
     6.1.1 Introduction
     There are many causes of poor data utilization:
     • Multiple definitions of the same data entity and inconsistent representations of the same data elements in separate databases, which make linking data across different databases difficult.
     • Missing key data elements, which makes existing data useless.
     • Low levels of data quality due to inappropriate sources of data or poor timing of data transfers from one system to another.
     • Not knowing what data exist, where to find them, and what they really mean.
     Therefore, the data administration function is essential to the success of managing the data resource.
     6.1.2 Data Administration
     A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards.
     6.1.3 Database Administration
     A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery.
     6.1.4 Functions of Data and Database Administration
     The life cycle of a typical database system includes the following stages:
     • Database planning
       This develops a strategic plan for database development that supports the overall organizational business plan. This is usually the responsibility of top management.
     • Database analysis
       The process of analysis is concerned with identifying the data entities currently used by the organization and their relationships.
     • Database design
       This develops the logical and physical design of the database, based on the entities and relationships identified during analysis.
  3. • Operation and maintenance
       This is the process of updating the database to keep it current.
     • Growth and change
       Data administrators must plan for change, such as adding new record types and accommodating growth. They must monitor the performance of the database and take corrective action whenever necessary.
     The manner in which these functions are performed varies from one organization to the next and is influenced by the use of specific methodologies and CASE tools.
     6.2 DBMS
     A DBMS is a software application system that is used to create, maintain, and provide controlled access to user databases.
     6.2.1 Components of a DBMS
     • DBMS Engine
       This is the central component of a DBMS. It provides access to the repository and the database and coordinates all of the other functional elements of the DBMS.
     • Interface Subsystem
       The interface subsystem provides facilities for users and applications to access the various components of the DBMS. Most DBMS products provide a range of languages and other interfaces, so the system can be used both by programmers and by users with little or no programming experience. For example:
       - A data definition language (DDL), which is used to define database structures such as records, tables, files and views.
       - An interactive query language (such as SQL), which is used to display data extracted from the database and to perform simple updates.
       - A graphic interface (such as Query-by-Example).
       - A DBMS programming language (such as the dBASE IV command language or Access Basic).
       - An interface to standard third-generation programming languages such as BASIC and COBOL.
     • Information Repository Dictionary Subsystem
       This is also known as the Data Dictionary. It is used to manage and control access to the repository.
  4. • Performance Management Subsystem
       This provides facilities to optimize DBMS performance. Two of its important functions follow:
       - Query optimization: structuring SQL queries to minimize response time.
       - DBMS reorganization: maintaining statistics on database usage and taking actions such as reorganizing the database or creating indexes.
     • Backup and Recovery Subsystem
       This subsystem provides facilities for logging transactions and database changes, periodically making backup copies of the database, and recovering the database in the event of some type of failure.
     • Application Development Subsystem
       This subsystem provides facilities that allow end users and programmers to develop complete database applications.
     • Security Management Subsystem
       This subsystem provides facilities to protect and control access to the database and repository.
     6.3 Concurrency Control
     Concurrency control is concerned with preventing loss of data integrity due to interference between users in a multi-user environment.
     6.3.1 Single-user versus Multi-user Systems
     One criterion for classifying a database system is the number of users who can use the system concurrently. A DBMS is single-user if at most one user at a time can use the system, and multi-user if many users can use the system concurrently.
     In a multi-user DBMS, the stored data items are the primary resources that may be accessed concurrently by user programs, which are constantly retrieving and modifying the database. The execution of a program that accesses or changes the contents of the database is called a transaction.
     The transactions submitted by the various users may execute concurrently and may access and update the same database records. If this concurrent execution is uncontrolled, it may lead to problems such as an inconsistent database.
  5. 6.3.2 Why Is Concurrency Control Needed?
     Problems
     • The lost update problem
       Consider the situation illustrated in the diagram below, which is intended to be read as follows:

       Time   Transaction A                      Transaction B
       ----   --------------------------------   --------------------------------
       t1     1. Read account balance
                 (Balance = $1,000)
       t2                                        1. Read account balance
                                                    (Balance = $1,000)
       t3     2. Update record
                 (withdraw $200; balance $800)
       t4                                        2. Update record
                                                    (withdraw $300; balance $700)
                                                    ERROR!

       Transaction A retrieves some record R at time t1; transaction B retrieves that same record R at time t2; transaction A updates the record at time t3; and transaction B updates the same record at time t4. Transaction A's update is lost at time t4, because transaction B overwrites it without even looking at it. In other words, the effect of A's update has been lost due to interference between the transactions: the balance ends at $700 although $500 has been withdrawn in total.
     • The temporary update problem
       This occurs when one transaction updates a database item and then fails for some reason, and the updated item is accessed by another transaction before it is changed back to its original value. For example, T1 updates item X and then fails before completion, so the system must change X back to its original value. Before it does so, transaction T2 reads the "temporary" value of X, which will not be recorded permanently in the database because of the failure of T1.

       Transaction 1 (T1)      Transaction 2 (T2)
       ------------------      ------------------
       Read-item (X)
       X = X - N
       Write-item (X)
                               Read-item (X)
                               X = X + M
                               Write-item (X)
       Read-item (Y)

       Transaction T1 fails and must change the value of X back to its old value; meanwhile, T2 has read the "temporary" incorrect value of X.
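The lost update interleaving described earlier in this slide can be replayed step by step. The following is only a minimal sketch: a plain dictionary stands in for the account record, and the function names `run_lost_update` and `run_serial` are assumptions introduced for illustration, not part of any DBMS.

```python
# Hypothetical illustration: a dict stands in for the stored account record.

def run_lost_update(initial=1000, withdraw_a=200, withdraw_b=300):
    """Replay the uncontrolled interleaving t1..t4 from the diagram."""
    record = {"balance": initial}
    seen_by_a = record["balance"]               # t1: A reads the balance
    seen_by_b = record["balance"]               # t2: B reads the same balance
    record["balance"] = seen_by_a - withdraw_a  # t3: A writes 800
    record["balance"] = seen_by_b - withdraw_b  # t4: B overwrites A's update
    return record["balance"]

def run_serial(initial=1000, withdraw_a=200, withdraw_b=300):
    """Run A to completion before B starts, so B sees A's update."""
    record = {"balance": initial}
    record["balance"] = record["balance"] - withdraw_a  # A reads and writes
    record["balance"] = record["balance"] - withdraw_b  # B reads and writes
    return record["balance"]

interleaved_balance = run_lost_update()
serial_balance = run_serial()
print(interleaved_balance)  # 700: A's withdrawal of $200 has been lost
print(serial_balance)       # 500: both withdrawals are reflected
```

The $200 difference between the two results is exactly transaction A's lost update.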
  6. • Inconsistent analysis problem
       Another problem arises when one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records. The aggregate function may calculate some values before they are updated and others after they are updated. For example, suppose a transaction T3 is calculating the total number of reservations on all the flights while transaction T1 is executing. If the interleaving of operations shown in the figure below occurs, the result of T3 will be off by an amount N, because T3 reads the value of X after N seats are subtracted from it and reads the value of Y before those N seats are added to it.

       Transaction 1 (T1)      Transaction 3 (T3)
       ------------------      ------------------
                               Sum = 0
                               Read-item (A)
                               Sum = Sum + A
       Read-item (X)
       X = X - N
       Write-item (X)
                               Read-item (X)
                               Sum = Sum + X
                               Read-item (Y)
                               Sum = Sum + Y
       Read-item (Y)
       Y = Y + N
       Write-item (Y)

     6.3.3 Basic Approaches to Concurrency Control
     In short, concurrency control is concerned with preventing loss of data integrity due to interference between users in a multi-user environment. There are two basic approaches to concurrency control: a pessimistic approach and an optimistic approach.
     • Locking (Pessimistic Approach)
       Locking mechanisms are the most common type of concurrency control mechanism. With locking, any data that is retrieved by a user for updating must be locked, or denied to other users, until the update is completed. Locking data is much like checking a book out of the library: it is unavailable to others until it is returned by the borrower.
       There are many types of lock. Two common types are described here:
       - Shared locks
         Shared locks (also called S locks, or read locks) allow other transactions to read (but not update) a record (or other resource). A transaction should place a shared lock on a record when it will only read (but not update) that record.
         A shared lock on a record prevents another user from placing an exclusive lock on that record.
  7.   - Exclusive locks
         Exclusive locks (also called X locks, or write locks) prevent another transaction from reading (and therefore updating) a record until it is unlocked. A transaction should place an exclusive lock on a record when it is about to update that record. An exclusive lock on a record prevents another user from placing any type of lock on that record.

         Lock compatibility (True = both locks can be held on the same record at the same time):

                             Shared Lock (S lock)    Exclusive Lock (X lock)
         Shared Lock         True                    False
         Exclusive Lock      False                   False

     • Deadlock
       Locking (say, at the record level) solves the problem of erroneous updates but may lead to another problem, called deadlock. This may result when two (or more) transactions have each locked a resource that the other needs, and each must wait for the other to unlock it. For example, user A has locked record X and user B has locked record Y. User A then requests record Y and user B requests record X. Both requests are denied, since the requested records are already locked. Thus, unless the DBMS intervenes, both users will wait indefinitely.

       Time   User A                   User B
       ----   ----------------------   ----------------------
       t1     1. Lock record X
       t2                              1. Lock record Y
       t3     2. Request record Y      2. Request record X
       t4     (Wait for Y)             (Wait for X)

     • Managing deadlock
       There are two basic ways to deal with deadlocks:
       - Deadlock prevention
         When deadlock prevention is employed, user programs must lock all the records they will require at the beginning of a transaction (rather than one at a time).
       - Deadlock resolution
         This allows deadlocks to occur but builds mechanisms into the DBMS for detecting and breaking the deadlocks.
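The lock compatibility table and the deadlock example above can be expressed as a short sketch. Everything here is an illustrative assumption rather than how any particular DBMS works: the names `COMPATIBLE`, `can_grant` and `find_deadlock` are hypothetical, and the detector assumes each user waits for at most one other user at a time.

```python
# (held, requested) -> can the requested lock be granted alongside the held one?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """A request is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

def find_deadlock(waits_for):
    """Follow 'who waits for whom' links; return the users in a cycle, or None.

    Assumption: waits_for maps each waiting user to the single user it waits on.
    """
    for start in waits_for:
        path = []
        current = start
        while current in waits_for and current not in path:
            path.append(current)
            current = waits_for[current]
        if current in path:
            return path[path.index(current):]
    return None

# The example from the text: A holds X, B holds Y, then each requests the other's record.
holders = {"X": "A", "Y": "B"}
requests = {"A": "Y", "B": "X"}
waits_for = {user: holders[record] for user, record in requests.items()}

print(can_grant("S", ["S"]))     # True: two readers can share a record
print(can_grant("X", ["S"]))     # False: a writer must wait for the reader
print(find_deadlock(waits_for))  # ['A', 'B']: each user waits for the other
```

A DBMS that uses deadlock resolution would then break the cycle, typically by aborting one of the transactions in it so the other can proceed.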
  8. • Optimistic approach (Versioning)
       This approach assumes that most of the time other users do not want the same record, or if they do, they only want to read it. With versioning, there is no form of locking. Each transaction is restricted to a view of the database as of the time the transaction started. When a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record.
       If there is no conflict, the user's changes are used to update the central database. However, suppose there is a conflict, such as two users having made conflicting changes to their private copies of the database. Then the changes made by one of the users are committed to the database (committed means recorded after "successful" completion). The other user must be told that there was a conflict and that his work cannot be incorporated into the central database. His update must be repeated later.
       The main advantage of versioning over locking is improved performance, as read-only transactions can run concurrently with updating transactions.
       For example, user A reads the record containing the account balance, successfully withdraws $200, and the new balance of $800 is posted to the account with a COMMIT statement. Meanwhile, user B has also read the account record and requested a withdrawal, which is posted to her local version of the account record. When her transaction attempts to COMMIT, it discovers the update conflict and is aborted. The transaction can be restarted later with the correct balance of $800.
     6.3.4 Why Is Recovery Needed?
     Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (a) all operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (b) the transaction has no effect on the database or on any other transactions.
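The all-or-nothing requirement stated above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the `account` table, the `transfer` function and its `fail_midway` flag are hypothetical names used only to simulate a transaction that fails after its first update.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 0)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money as one transaction: both updates are applied, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if fail_midway:
                # Simulated failure after the first operation of the transaction.
                raise RuntimeError("simulated transaction failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = make_db()
completed = transfer(conn, "A", "B", 200)                   # case (a): fully applied
failed = transfer(conn, "A", "B", 300, fail_midway=True)    # case (b): no effect
final = balances(conn)
print(completed, failed, final)
```

After the failed transfer, account A still holds 800: the debit that ran before the failure was rolled back rather than left half-applied.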
     The DBMS must not permit some operations of a transaction T to be applied to the database while other operations of T are not. However, this can happen if a transaction fails after executing some of its operations but before executing all of them.
     • Types of Failures
       There are several possible reasons for a transaction to fail in the middle of execution. For example:
       - Computer failure (system crash): A hardware or software error occurs in the computer system during transaction execution. If the hardware crashes, the contents of the computer's internal memory may be lost.
       - A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero.
       - Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or write operation of the transaction.
       - Physical problems and catastrophes: This is an endless list that includes power or air conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, etc.
  9. 6.4 Database Recovery
     Database recovery means restoring a database quickly and accurately after loss or damage. The basic recovery facilities include:
     • Backup facilities, which provide periodic backup copies of the entire database. The copy should be stored in a secure location where it is protected from loss or damage.
     • Journalizing facilities, which maintain an audit trail of transactions and database changes. There are two logs: the transaction log and the database change log. The transaction log contains a record of the essential data for each transaction that is processed against the database. The database change log contains before- and after-images of records that have been modified by transactions.
       [Figure: the database management system with the current database, the transaction log, the database change log, and the backup database]
     • A checkpoint facility, by which the DBMS periodically suspends all processing and synchronizes its files and journals. Checkpoints should be taken frequently (say, several times an hour). When failures do occur, it is often possible to resume processing from the most recent checkpoint, so only a few minutes of processing work must be repeated.
       Consider the following example, which shows the possible timings of transactions in relation to the time of the crash and the time of the last checkpoint.
       [Figure: timeline of transactions T1 to T5 relative to the time of the last checkpoint and the time of the crash]
  10.  - Transaction T1 was completed before the last checkpoint, so it is not listed in the checkpoint log record and has no records in the log subsequent to the last checkpoint.
       - Transaction T2 was active at the time of the last checkpoint, so it is listed in the checkpoint record; it also has a COMMIT or ABORT record in the log subsequent to the last checkpoint.
       - Transaction T3 is also listed in the checkpoint record, but it had not completed by the time of the failure, so it has no COMMIT or ABORT record in the log.
       - Transaction T4 was executed fully between the time of the last checkpoint and the crash, so it has both a BEGIN TRANSACTION and a COMMIT or ABORT record in the log subsequent to the last checkpoint record.
       - Transaction T5 was begun after the checkpoint but not completed. It therefore has a BEGIN TRANSACTION record, but no COMMIT or ABORT record, in the log subsequent to the last checkpoint.
       Therefore, at the time of the crash, the effects of transactions T3 and T5 have to be undone, since they are incomplete transactions. Transactions of type T1 present no problem, since they are known to have completed and their updates are known to have been consolidated in the database at the time of the last checkpoint. Transactions of types T2 and T4 have completed, but it is not known whether all their updates have been carried out on the database (some changed pages may still have been in the buffers and consequently been lost). Thus the system must check how each of these transactions ended: if it did not commit, all of its updates are undone; if it committed, all of its updates are redone.
       In short, recovery means redoing the effects of transactions that committed before the crash but after the last checkpoint, and undoing the effects of transactions that were incomplete at the point of the crash.
     • A recovery manager, which allows the DBMS to restore the database to a correct condition and restart processing transactions.
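The undo/redo decision for transactions T1 to T5 can be sketched as a small function over the log. The log layout below (a list of `(action, transaction)` pairs plus the list of transactions recorded at the checkpoint) and the name `plan_recovery` are assumptions made for illustration; real DBMS logs carry far more detail, such as before- and after-images.

```python
def plan_recovery(active_at_checkpoint, log_after_checkpoint):
    """Return (undo, redo) lists for a restart after a crash.

    active_at_checkpoint: transactions listed in the checkpoint record.
    log_after_checkpoint: (action, txn) pairs written after the checkpoint,
                          where action is "BEGIN", "COMMIT" or "ABORT".
    """
    seen = set(active_at_checkpoint)
    committed = set()
    for action, txn in log_after_checkpoint:
        if action == "BEGIN":
            seen.add(txn)
        elif action == "COMMIT":
            committed.add(txn)
        # An ABORT record leaves the transaction uncommitted, so it is undone.
    redo = sorted(committed)
    undo = sorted(seen - committed)
    return undo, redo

# The example from the text. T1 finished before the checkpoint, so it appears
# neither in the checkpoint record nor in the log that follows it.
checkpoint_record = ["T2", "T3"]
log = [
    ("BEGIN", "T4"),
    ("COMMIT", "T2"),
    ("BEGIN", "T5"),
    ("COMMIT", "T4"),
    # crash happens here: T3 and T5 have no COMMIT or ABORT record
]

undo, redo = plan_recovery(checkpoint_record, log)
print("undo:", undo)  # undo: ['T3', 'T5']
print("redo:", redo)  # redo: ['T2', 'T4']
```

T1 is absent from both lists, which matches the text: its updates were already consolidated in the database at the last checkpoint.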
  11. 6.4 Database Security
     Database security is defined as protection of the database against accidental or intentional loss, destruction or misuse. Data administration uses several facilities provided by data management software in carrying out these functions. These include:
     • Views or subschemas, which help to restrict user views of the database. For example:
         CREATE VIEW ITEM_ORDER AS
           SELECT ITEM_NAME, ITEM.ORDER_NO
           FROM ITEM, "ORDER"
           WHERE ITEM.ORDER_NO = "ORDER".ORDER_NO;
     • Authorization rules, which identify users and restrict the actions they may take against the database. For example, the use of passwords.
     • User-defined procedures, which define additional constraints or limitations on using the database. For example, users may implement their own password log-in procedure on their own PC.
     • Encryption procedures, which encode data in an unrecognizable form. For example, in electronic funds transfer systems. The encryption procedures should also include a decoding facility.
     • Authentication schemes, which positively identify a person attempting to gain access to a database.
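How a view restricts what a user can see may be easier to follow with a runnable sketch. The example below uses Python's built-in `sqlite3` module; the tables `item` and `orders`, their columns and the sample rows are hypothetical. SQLite has no user accounts or GRANT statement, so this only shows the column-restricting effect of a view, not the authorization rules a multi-user DBMS would add on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base tables; credit_card and unit_cost are the sensitive columns.
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer TEXT, credit_card TEXT);
    CREATE TABLE item (item_name TEXT, unit_cost REAL, order_no INTEGER);
    INSERT INTO orders VALUES (1, 'Cruz', '4111-XXXX');
    INSERT INTO orders VALUES (2, 'Reyes', '5500-XXXX');
    INSERT INTO item VALUES ('Keyboard', 12.5, 1);
    INSERT INTO item VALUES ('Monitor', 90.0, 2);

    -- The view exposes only the item name and order number.
    CREATE VIEW item_order AS
        SELECT item.item_name, item.order_no
        FROM item JOIN orders ON item.order_no = orders.order_no;
""")

cursor = conn.execute("SELECT * FROM item_order ORDER BY order_no")
view_columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(view_columns)  # ['item_name', 'order_no']
print(rows)          # [('Keyboard', 1), ('Monitor', 2)]
```

A user who is allowed to query only `item_order` never sees the customer, credit card or unit cost columns of the underlying tables.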
  12. 6.5 Review Questions
     1. Contrast the following terms:
        a. data administration vs database administration
        b. deadlock prevention vs deadlock resolution
        c. optimistic concurrency control vs pessimistic concurrency control
        d. shared locks vs exclusive locks
     2. Describe the DBMS facilities that are required for database backup and recovery.
     3. For each of the situations described below, indicate which of the following security measures is most appropriate:
        i. authorization rules
        ii. encryption
        iii. authentication schemes
        a. A national brokerage firm uses a simple password system to protect its database but finds it needs a more comprehensive system to grant different privileges (such as read versus create or update) to different users.
        b. A manufacturing firm uses a simple password system to protect its database but finds it needs a more comprehensive system to grant different privileges (such as read versus create or update) to different users.
        c. A university has experienced considerable difficulty with unauthorized users who access files and databases by appropriating passwords from legitimate users.
  13. Prof. Erwin M. Globio, MSIT
     Senior IT Trainer
     Mobile Numbers: 09393741359 or 09323956678
     Email Add:
     Skype Id: erwinglobio