Section D
Online transaction processing 
Online transaction processing, or OLTP, is a class of information systems that facilitate 
and manage transaction-oriented applications, typically for data entry and 
retrieval transaction processing. The term is somewhat ambiguous; some understand 
a "transaction" in the context of computer or database transactions, while others 
(such as the Transaction Processing Performance Council) define it in terms of 
business or commercial transactions. OLTP has also been used to refer to processing 
in which the system responds immediately to user requests. An automated teller 
machine (ATM) for a bank is an example of a commercial transaction processing 
application. 
Benefits 
Online Transaction Processing has two key benefits: simplicity and efficiency. 
Reduced paper trails and the faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses.
Online transaction processing Requirements 
Online transaction processing increasingly requires support for transactions that span a 
network and may include more than one company. For this reason, new online 
transaction processing software uses client/server processing and brokering 
software that allows transactions to run on different computer platforms in a network. 
 In large applications, efficient OLTP may depend on sophisticated transaction 
management software (such as CICS) and/or database optimization tactics to facilitate 
the processing of large numbers of concurrent updates to an OLTP-oriented database 
 For even more demanding decentralized database systems, OLTP brokering programs 
can distribute transaction processing among multiple computers on a network. OLTP is 
often integrated into service-oriented architecture (SOA) and Web services. 
 Online Transaction Processing (OLTP) involves gathering input information, processing 
the information and updating existing information to reflect the gathered and 
processed information. As of today, most organizations use a database management 
system to support OLTP. OLTP is typically carried out in a client-server system.
Online analytical processing 
 In computing, online analytical processing, or OLAP , is an approach to 
answering multi-dimensional analytical (MDA) queries swiftly. 
 OLAP is part of the broader category of business intelligence, which also 
encompasses relational databases, report writing and data mining. 
 Typical applications of OLAP include business reporting for sales, marketing, 
management reporting, business process management (BPM), budgeting and 
forecasting, financial reporting and similar areas, with new applications coming 
up, such as agriculture. 
The term OLAP was created as a slight modification of the traditional database 
term OLTP (Online Transaction Processing).
Online analytical processing 
 OLAP tools enable users to analyze multidimensional data interactively from multiple 
perspectives. 
 OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, 
and slicing and dicing. 
 Consolidation involves the aggregation of data that can be accumulated and computed 
in one or more dimensions. 
For example, all sales offices are rolled up to the sales department or sales division to 
anticipate sales trends. By contrast, the drill-down is a technique that allows users to 
navigate through the details. For instance, users can view the sales by individual products 
that make up a region’s sales. Slicing and dicing is a feature whereby users can take out 
(slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different 
viewpoints.
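In SQL terms, these operations map naturally onto GROUP BY and WHERE clauses. The sketch below assumes a hypothetical sales table with region, product, sale_date and amount columns (names are illustrative only): 

-- Roll-up (consolidation): aggregate detailed sales up to the region level 
SELECT region, SUM(amount) AS total_sales 
FROM sales 
GROUP BY region; 

-- Drill-down: break one region's sales out by individual product 
SELECT product, SUM(amount) AS product_sales 
FROM sales 
WHERE region = 'East' 
GROUP BY product; 

-- Slice and dice: fix the time dimension and view the remaining dimensions 
SELECT region, product, SUM(amount) AS total_sales 
FROM sales 
WHERE sale_date BETWEEN DATE '1998-01-01' AND DATE '1998-03-31' 
GROUP BY region, product;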
Online analytical processing 
• Databases configured for OLAP use a multidimensional data model, allowing for complex 
analytical and ad hoc queries with a rapid execution time. They borrow aspects 
of navigational databases, hierarchical databases and relational databases. 
Overview of OLAP systems 
The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or 
a hypercube). It consists of numeric facts called measures which are categorized by dimensions. 
The measures are placed at the intersections of the hypercube, which is spanned by the 
dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix 
interface like Pivot tables in a spreadsheet program, which performs projection operations along 
the dimensions, such as aggregation or averaging. 
• The cube metadata is typically created from a star schema or snowflake schema or fact 
constellation of tables in a relational database. Measures are derived from the records in the fact 
table and dimensions are derived from the dimension tables. 
• Each measure can be thought of as having a set of labels, or meta-data associated with it. 
A dimension is what describes these labels; it provides information about the measure. 
• A simple example would be a cube that contains a store's sales as a measure, and Date/Time as 
a dimension. Each Sale has a Date/Time label that describes more about that sale. 
• Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by 
adding a foreign key column to the fact table. This allows an analyst to view the measures along any 
combination of the dimensions.
• For example:
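A minimal star schema for such a sales cube might look like the following sketch (all table and column names are illustrative, not taken from the slides): 

CREATE TABLE date_dim ( 
  date_key      INTEGER PRIMARY KEY, 
  calendar_date DATE, 
  month_name    VARCHAR(20), 
  calendar_year INTEGER 
); 

CREATE TABLE store_dim ( 
  store_key  INTEGER PRIMARY KEY, 
  store_name VARCHAR(50), 
  region     VARCHAR(30) 
); 

-- Fact table: one measure (sales_amount) plus one foreign key per dimension 
CREATE TABLE sales_fact ( 
  date_key     INTEGER REFERENCES date_dim (date_key), 
  store_key    INTEGER REFERENCES store_dim (store_key), 
  sales_amount NUMERIC(10,2) 
); 

-- Adding a Cashier or Customer dimension means one more dimension table 
-- and one more foreign key column on sales_fact.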
Transaction processing system 
• Transaction processing is a style of computing that divides work into individual, indivisible 
operations, called transactions. A transaction processing system (TPS) or transaction server is a 
software system, or software/hardware combination, that supports transaction processing. 
Processing types 
Transaction processing is distinct from other computer processing models: batch processing, time-sharing, and real-time processing. 
Batch processing 
Batch processing is the execution of a series of programs (jobs) on a computer without manual intervention. Several transactions, called a batch, are collected and processed at the same time. The results of each transaction are not immediately available when the transaction is being entered; there is a time delay. 
Real-time processing 
Real-time systems attempt to guarantee an appropriate response to a stimulus or request quickly enough to affect the conditions that caused the stimulus. Each transaction in real-time processing is 
unique; it is not part of a group of transactions.
Transaction processing system 
Time-sharing 
Time sharing is the sharing of a computer system among multiple users, usually giving each 
user the illusion that they have exclusive control of the system. The users may be working on 
the same project or different projects, but there are usually few restrictions on the type of 
work each user is doing. 
Transaction processing 
Transaction processing systems also attempt to provide predictable response times to requests, although this is not as critical as for real-time systems. Rather than allowing the user to run arbitrary programs as in time-sharing, transaction processing allows only predefined, structured transactions. Each transaction is usually of short duration, and the processing activity for each transaction is programmed in advance.
Transaction processing system features 
The following features are considered important in evaluating transaction processing systems: 
• Performance 
Fast performance with a rapid response time is critical. Transaction processing systems are usually 
measured by the number of transactions they can process in a given period of time. 
• Continuous availability 
The system must be available during the time period when the users are entering transactions. Many 
organizations rely heavily on their TPS; a breakdown will disrupt operations or even stop the business. 
• Data integrity 
The system must be able to handle hardware or software problems without corrupting data. Multiple 
users must be protected from attempting to change the same piece of data at the same time; for 
example, two operators cannot sell the same seat on an airplane. 
• Ease of use 
Often users of transaction processing systems are casual users. The system should be simple for them 
to understand, protect them from data-entry errors as much as possible, and allow them to easily 
correct their errors. 
• Modular growth 
The system should be capable of growth at incremental costs, rather than requiring a complete 
replacement. It should be possible to add, replace, or update hardware and software components 
without shutting down the system.
Concurrency: Definition 
 Concurrency is the ability of a database to allow multiple users to affect multiple 
transactions. This is one of the main properties that separates a database from other 
forms of data storage like spreadsheets. 
 The ability to offer concurrency is unique to databases. Spreadsheets or other flat file 
means of storage are often compared to databases, but they differ in this one important 
regard. Spreadsheets cannot offer several users the ability to view and work on different data in the same file, because once the first user opens the file it is locked to other users; they can read the file, but may not edit the data. 
 The problems caused by concurrency are just as important to address as the ability to support concurrent transactions. For example, when one user is changing data but has not yet saved (committed) it, the database should not allow other users who query the same data to see the changed, unsaved values. Instead, they should see only the original data.
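Most SQL databases enforce this through transaction isolation levels. A minimal sketch, assuming a hypothetical account table (exact transaction-control syntax varies by DBMS): 

-- Session 1: changes data but has not committed yet 
START TRANSACTION; 
UPDATE account SET balance = balance - 100 WHERE acct_no = 42; 
-- (no COMMIT yet) 

-- Session 2: running at READ COMMITTED (the default in many systems), 
-- it sees only the original, committed balance 
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; 
SELECT balance FROM account WHERE acct_no = 42; 

-- Session 1 can later COMMIT or ROLLBACK; either way, session 2 never 
-- saw the uncommitted value.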
Concurrency Problems 
Concurrency problems include: 
• Lost or buried updates. 
• Uncommitted dependency (dirty read). 
• Inconsistent analysis (non-repeatable read). 
• Phantom reads.
Concurrency Problems 
• Lost Updates 
Lost updates occur when two or more transactions select the same row and then update the row based on 
the value originally selected. Each transaction is unaware of other transactions. The last update overwrites 
updates made by the other transactions, which results in lost data. 
For example: 
Two editors make an electronic copy of the same document. Each editor changes the copy independently 
and then saves the changed copy, thereby overwriting the original document. The editor who saves the 
changed copy last overwrites changes made by the first editor. This problem could be avoided if the second 
editor could not make changes until the first editor had finished. 
• Uncommitted Dependency (Dirty Read) 
Uncommitted dependency occurs when a second transaction selects a row that is being updated by another 
transaction. The second transaction is reading data that has not been committed yet and may be changed by 
the transaction updating the row. 
• For example, 
An editor is making changes to an electronic document. During the changes, a second editor takes a copy of 
the document that includes all the changes made so far, and distributes the document to the intended 
audience. The first editor then decides the changes made so far are wrong and removes the edits and saves 
the document. The distributed document contains edits that no longer exist, and should be treated as if they 
never existed. This problem could be avoided if no one could read the changed document until the first 
editor determined that the changes were final.
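One common way to avoid a lost update is to lock the row when it is first read, so the second writer must wait. A minimal SQL sketch (the document table and doc_id column are illustrative; SELECT ... FOR UPDATE is widely supported, e.g. by Oracle, MySQL and PostgreSQL): 

-- Editor 1 
START TRANSACTION; 
SELECT body FROM document WHERE doc_id = 7 FOR UPDATE;   -- row is now locked 
UPDATE document SET body = 'edited text' WHERE doc_id = 7; 
COMMIT; 

-- Editor 2 issues the same SELECT ... FOR UPDATE while editor 1 is working: 
-- it blocks until editor 1 commits, then reads the committed row, so 
-- editor 1's changes are not silently overwritten.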
Concurrency Problems 
• Inconsistent Analysis (Non-repeatable Read) 
Inconsistent analysis occurs when a second transaction accesses the same row several times and reads 
different data each time. Inconsistent analysis is similar to uncommitted dependency in that another 
transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, 
the data read by the second transaction was committed by the transaction that made the change. Also, 
inconsistent analysis involves multiple reads (two or more) of the same row and each time the 
information is changed by another transaction; thus, the term non-repeatable read. 
• For example, an editor reads the same document twice, but between each reading, the writer 
rewrites the document. When the editor reads the document for the second time, it has changed. 
The original read was not repeatable. This problem could be avoided if the editor could read the 
document only after the writer has finished writing it. 
• Phantom Reads 
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range 
of rows being read by a transaction. The transaction's first read of the range of rows shows a row that 
no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. 
Similarly, as the result of an insert by a different transaction, the transaction's second or succeeding 
read shows a row that did not exist in the original read. 
• For example, an editor makes changes to a document submitted by a writer, but when the changes 
are incorporated into the master copy of the document by the production department, they find that 
new unedited material has been added to the document by the author. This problem could be 
avoided if no one could add new material to the document until the editor and production 
department finish working with the original document.
Concurrency Control 
• Lock-Based Protocols 
• Timestamp-Based Protocols 
• Validation-Based Protocols 
• Multiple Granularity 
• Multiversion Schemes 
• Insert and Delete Operations
Lock-Based Protocols 
• A lock is a mechanism to control concurrent access to a data item 
• Data items can be locked in two modes: 
1. exclusive (X) mode. Data item can be both read as well as 
written. X-lock is requested using lock-X instruction. 
2. shared (S) mode. Data item can only be read. S-lock is 
requested using lock-S instruction. 
• Lock requests are made to concurrency-control manager. 
Transaction can proceed only after request is granted.
Lock-Based Protocols (Cont.) 
• Lock-compatibility matrix 
A transaction may be granted a lock on an item if the requested lock is compatible with 
locks already held on the item by other transactions 
Any number of transactions can hold shared locks on an item, but if any transaction 
holds an exclusive lock on the item, no other transaction may hold any lock on the item. 
If a lock cannot be granted, the requesting transaction is made to wait till all 
incompatible locks held by other transactions have been released. The lock is then 
granted.
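As a rough SQL-level illustration of this compatibility rule (the employee table is illustrative; the LOCK TABLE syntax shown is the form used by Oracle and PostgreSQL): 

-- Transaction 1: takes a shared (S) table lock 
LOCK TABLE employee IN SHARE MODE; 

-- Transaction 2: also succeeds, because shared locks are compatible 
LOCK TABLE employee IN SHARE MODE; 

-- Transaction 3: requests an exclusive (X) lock; exclusive mode conflicts 
-- with share mode, so this waits until the share locks above are released 
LOCK TABLE employee IN EXCLUSIVE MODE;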
Lock-Based Protocols (Cont.) 
• Example of a transaction performing locking: 
T2: lock-S(A); 
read (A); 
unlock(A); 
lock-S(B); 
read (B); 
unlock(B); 
display(A+B) 
• Locking as above is not sufficient to guarantee serializability — if 
A and B get updated in-between the read of A and B, the 
displayed sum would be wrong. 
• A locking protocol is a set of rules followed by all transactions 
while requesting and releasing locks. Locking protocols restrict 
the set of possible schedules.
Pitfalls of Lock-Based Protocols 
• Consider the partial schedule 
Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 
to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to 
release its lock on A. 
Such a situation is called a deadlock. 
To handle a deadlock one of T3 or T4 must be rolled back 
and its locks released.
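A deadlock of this kind is easy to reproduce with two sessions that update the same two rows in opposite order. A sketch (the account table and row ids are illustrative; most database systems detect the cycle and roll one transaction back automatically): 

-- Session A 
START TRANSACTION; 
UPDATE account SET balance = balance - 10 WHERE acct_no = 1;   -- A locks row 1 

-- Session B 
START TRANSACTION; 
UPDATE account SET balance = balance - 10 WHERE acct_no = 2;   -- B locks row 2 

-- Session A 
UPDATE account SET balance = balance + 10 WHERE acct_no = 2;   -- waits for B 

-- Session B 
UPDATE account SET balance = balance + 10 WHERE acct_no = 1;   -- waits for A: deadlock 
-- The DBMS detects the cycle and rolls back one session (the victim); 
-- the surviving session can then continue and commit.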
Pitfalls of Lock-Based Protocols (Cont.) 
• The potential for deadlock exists in most locking protocols. 
Deadlocks are a necessary evil. 
• Starvation is also possible if the concurrency-control manager is 
badly designed. For example: 
• A transaction may be waiting for an X-lock on an item, while a 
sequence of other transactions request and are granted an S-lock 
on the same item. 
• The same transaction is repeatedly rolled back due to deadlocks. 
• The concurrency-control manager can be designed to prevent 
starvation.
The Two-Phase Locking Protocol 
• This is a protocol which ensures conflict-serializable schedules. 
• Phase 1: Growing Phase 
• transaction may obtain locks 
• transaction may not release locks 
• Phase 2: Shrinking Phase 
• transaction may release locks 
• transaction may not obtain locks 
The protocol assures serializability. It can be proved that the 
transactions can be serialized in the order of their lock points (i.e. 
the point where a transaction acquired its final lock).
The Two-Phase Locking Protocol (Cont.) 
• Two-phase locking does not ensure freedom from deadlocks 
• Cascading roll-back is possible under two-phase locking. To avoid this, 
follow a modified protocol called strict two-phase locking. Here a 
transaction must hold all its exclusive locks till it commits/aborts. 
• Rigorous two-phase locking is even stricter: here all locks are held till 
commit/abort. In this protocol transactions can be serialized in the 
order in which they commit 
• There can be conflict serializable schedules that cannot be obtained if 
two-phase locking is used. 
• However, in the absence of extra information (e.g., ordering of access 
to data), two-phase locking is needed for conflict serializability in the 
following sense: 
Given a transaction Ti that does not follow two-phase locking, we 
can find a transaction Tj that uses two-phase locking, and a schedule 
for Ti and Tj that is not conflict serializable.
Multiple Granularity 
• Granularity is the extent to which a system is broken down into small 
parts, either the system itself or its description or observation. It is the 
extent to which a larger entity is subdivided. For example, a yard 
broken into inches has finer granularity than a yard broken into feet. 
• Allow data items to be of various sizes and define a hierarchy of data 
granularities, where the small granularities are nested within larger 
ones 
• Can be represented graphically as a tree (but don't confuse with tree-locking 
protocol) 
• When a transaction locks a node in the tree explicitly, it implicitly locks 
all the node's descendants in the same mode.
Example of Granularity Hierarchy 
The levels, starting from the coarsest (top) level, are: 
database 
area 
file 
record
Intention Lock Modes 
• In addition to S and X lock modes, there are three additional lock 
modes with multiple granularity: 
• intention-shared (IS): indicates explicit locking at a lower level of the tree 
but only with shared locks. 
• intention-exclusive (IX): indicates explicit locking at a lower level with 
exclusive or shared locks 
• shared and intention-exclusive (SIX): the subtree rooted by that node is 
locked explicitly in shared mode and explicit locking is being done at a 
lower level with exclusive-mode locks. 
• Intention locks allow a higher-level node to be locked in S or X mode 
without having to check all descendant nodes.
Deadlock Handling 
• A deadlock is a situation in which two or more competing actions are each 
waiting for the other to finish, and thus neither ever does. 
• In an operating system, a deadlock is a situation which occurs when 
a process or thread enters a waiting state because a resource requested is 
being held by another waiting process, which in turn is waiting for another 
resource. If a process is unable to change its state indefinitely because the 
resources requested by it are being used by another waiting process, then 
the system is said to be in a deadlock. 
System is deadlocked if there is a set of transactions such that every transaction in 
the set is waiting for another transaction in the set.
Deadlock Handling 
• Deadlock prevention protocols ensure that the system will never 
enter into a deadlock state. 
• Some prevention strategies : 
• Require that each transaction locks all its data items before it begins 
execution (pre declaration). 
• Impose partial ordering of all data items and require that a transaction 
can lock data items only in the order specified by the partial order 
(graph-based protocol). 
• Timeout-Based Schemes : 
• a transaction waits for a lock only for a specified amount of time. After 
that, the wait times out and the transaction is rolled back. 
• thus deadlocks are not possible 
• simple to implement; but starvation* is possible. Also difficult to 
determine good value of the timeout interval. 
* Starvation here means that a transaction may be repeatedly timed out and rolled back without ever getting the chance to complete.
Deadlock Detection 
• After a deadlock is detected, it can be corrected by using one of 
the following methods: 
• Process Termination: One or more processes involved in the 
deadlock may be aborted. We can choose to abort all processes 
involved in the deadlock. This ensures that deadlock is resolved 
with certainty and speed. But the expense is high as partial 
computations will be lost. Or, we can choose to abort one 
process at a time until the deadlock is resolved. This approach 
has high overheads because after each abort an algorithm must 
determine whether the system is still in deadlock. Several 
factors must be considered while choosing a candidate for 
termination, such as priority and age of the process. 
• Resource Preemption: Resources allocated to various processes 
may be successively preempted and allocated to other 
processes until the deadlock is broken.
Deadlock Detection (Cont.) 
(Figures: a wait-for graph without a cycle, and a wait-for graph with a cycle.)
Deadlock Recovery 
• When deadlock is detected : 
• Some transaction will have to be rolled back (made a victim) to 
break the deadlock. Select as victim the transaction that will incur 
minimum cost. 
• Rollback -- determine how far to roll back transaction 
• Total rollback: Abort the transaction and then restart it. 
• More effective to roll back transaction only as far as necessary to 
break deadlock. 
• Starvation happens if same transaction is always chosen as 
victim. Include the number of rollbacks in the cost factor to avoid 
starvation
Other Approaches to Concurrency Control 
• Timestamp-Based Protocols 
• Each transaction is issued a timestamp when it enters the 
system. If an old transaction Ti has time-stamp TS(Ti), a new 
transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) 
<TS(Tj). 
• The protocol manages concurrent execution such that the 
time-stamps determine the serializability order. 
• In order to assure such behavior, the protocol maintains for 
each data Q two timestamp values: 
• W-timestamp(Q) is the largest time-stamp of any transaction that 
executed write(Q) successfully. 
• R-timestamp(Q) is the largest time-stamp of any transaction that 
executed read(Q) successfully.
Timestamp-Based Protocols 
• The timestamp ordering protocol ensures that any conflicting read and write 
operations are executed in timestamp order. 
• Suppose a transaction Ti issues a read(Q) 
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was 
already overwritten. 
 Hence, the read operation is rejected, and Ti is rolled back. 
2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and 
R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)). 
• Suppose that transaction Ti issues write(Q). 
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed 
previously, and the system assumed that that value would never be produced. 
 Hence, the write operation is rejected, and Ti is rolled back. 
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. 
 Hence, this write operation is rejected, and Ti is rolled back. 
3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
Correctness of Timestamp-Ordering Protocol 
• The timestamp-ordering protocol guarantees serializability since all 
the arcs in the precedence graph go from a transaction with a smaller timestamp to a transaction with a larger timestamp. 
Thus, there will be no cycles in the precedence graph. 
• Timestamp protocol ensures freedom from deadlock as no 
transaction ever waits. 
• But the schedule may not be cascade-free, and may not even be 
recoverable.
DCL commands are used to enforce database security in a multiple-user database 
environment. The two types of DCL commands are GRANT and REVOKE. Only database 
administrators or owners of a database object can grant or remove privileges on a 
database object. 
• SQL GRANT Command 
SQL GRANT is a command used to provide access or privileges on the database objects 
to the users 
The Syntax for the GRANT command is:
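A commonly documented general form is shown below; privilege_name, object_name, user_name and role_name are placeholders to be replaced with actual names. 

GRANT privilege_name 
ON object_name 
TO {user_name | PUBLIC | role_name} 
[WITH GRANT OPTION];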
• SQL GRANT Command 
• privilege_name is the access right or privilege granted to the user. Some of 
the access rights are ALL, EXECUTE, and SELECT. 
• object_name is the name of a database object like TABLE, VIEW, STORED 
PROC and SEQUENCE. 
• user_name is the name of the user to whom an access right is being granted. 
• PUBLIC is used to grant access rights to all users. 
• ROLES are a set of privileges grouped together. 
• WITH GRANT OPTION - allows a user to grant access rights to other users.
SQL GRANT Command 
• For Example: 
GRANT SELECT ON employee TO user1; 
This command grants SELECT permission on the employee table to user1. You should use the WITH GRANT OPTION carefully because, for example, if you GRANT SELECT privilege on the employee table to user1 using WITH GRANT OPTION, then user1 can GRANT SELECT privilege on the employee table to another user, such as user2. Later, if you REVOKE the SELECT privilege on employee from user1, user2 will still have SELECT privilege on the employee table.
SQL REVOKE Command : 
• The REVOKE command removes user access rights or privileges to the database objects. 
• The Syntax for the REVOKE command is: 
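A commonly documented general form is shown below; as with GRANT, the lowercase names are placeholders. 

REVOKE privilege_name 
ON object_name 
FROM {user_name | PUBLIC | role_name};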
For Example: 
REVOKE SELECT ON employee FROM user1; 
This command will REVOKE the SELECT privilege on the employee table from user1. When you REVOKE the SELECT privilege on a table from a user, the user will not be able to SELECT data from that table anymore. However, if the user has received SELECT privileges on that table from more than one user, he/she can SELECT from that table until everyone who granted the permission revokes it. You cannot REVOKE privileges if they were not initially granted by you.
Privileges and Roles: 
Privileges: A privilege defines the access rights provided to a user on a database 
object. There are two types of privileges. 
1) System privileges - This allows the user to CREATE, ALTER, or DROP database 
objects. 
2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or 
DELETE data from database objects to which the privileges apply. 
A few CREATE system privileges are listed below: 
CREATE object: allows users to create the specified object in their own schema. 
CREATE ANY object: allows users to create the specified object in any schema.
The above rules also apply for ALTER and DROP system privileges. 
A few of the object privileges are listed below: 
INSERT: allows users to insert rows into a table. 
SELECT: allows users to select data from a database object. 
UPDATE: allows users to update data in a table. 
EXECUTE: allows users to execute a stored procedure or a function.
Roles: 
Roles are a collection of privileges or access rights. When there are many users in 
a database it becomes difficult to grant or revoke privileges user by user. Therefore, if 
you define roles, you can grant or revoke privileges to a role, and every user granted 
that role automatically gains or loses those privileges. You can either create roles or 
use the system roles pre-defined by Oracle. 
Some of the privileges granted to the system roles are as given below: 
CONNECT: CREATE TABLE, CREATE VIEW, CREATE SYNONYM, CREATE SEQUENCE, CREATE SESSION, etc. 
RESOURCE: CREATE PROCEDURE, CREATE SEQUENCE, CREATE TABLE, CREATE TRIGGER, etc. The primary usage of the RESOURCE role is to restrict access to database objects. 
DBA: ALL SYSTEM PRIVILEGES
• Creating Roles: 
The Syntax to create a role is: 
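A commonly used form is shown below; role_name and password are placeholders, and the usage example that follows uses illustrative names. 

CREATE ROLE role_name 
[IDENTIFIED BY password]; 

-- Example (illustrative names): create a role, grant privileges to the role, 
-- and then grant the role to a user 
CREATE ROLE testing; 
GRANT CREATE TABLE, CREATE VIEW TO testing; 
GRANT testing TO user1;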
It's easier to GRANT or REVOKE privileges to users through a role rather 
than assigning a privilege directly to every user. If a role is identified by a 
password, then, when you GRANT or REVOKE privileges to the role, you 
have to identify it with the password.
Database Recovery 
There are many situations in which a transaction may not reach a commit or abort point: 
• An operating system crash can terminate the DBMS processes 
• The DBMS can crash 
• The process issuing the transaction can abnormally terminate 
• The system might lose power 
• A disk may fail or other hardware may fail. 
• Human error can result in deletion of critical data. 
• In any of these situations, data in the database may become inconsistent or 
lost.
Two general classes of failures: 
• Process or Processor failures in which no data is lost but the 
database must be restarted and recovered. 
• Storage failure wherein data on disk is lost or corrupted. 
In such cases, data must be restored from backup and then the database must be restarted and 
recovered. 
Database Recovery is the process of restoring the database and the data to a 
consistent state. This may include restoring lost data up to the point of the event (e.g. 
system crash). 
There are many different approaches to recover a database. Here we discuss Manual 
Reprocessing, and a variety of Automated recovery techniques.
Manual Reprocessing 
• In a Manual Reprocessing approach, the database is periodically backed 
up (a database save) and all transactions applied since the last save are 
recorded 
• If the system crashes, the latest database save is restored and all of the 
transactions are re-applied (by users) to bring the database back up to 
the point just before the crash. 
• Several shortcomings: 
• Time required to re-apply transactions 
• Transactions might have other (physical) consequences 
• Re-applying concurrent transactions is not straightforward.
Automated Recovery 
There are several types of Automated Recovery techniques including: Deferred 
update and immediate update, shadow paging, etc. 
Recall, a transaction can be in one of the following states: 
• Active - when the transaction just begins 
• Partially Committed - after the last operation has completed (but before the 
commit point is reached) 
• Failed - Normal operation is prevented (e.g., in the event of a system crash) 
• Aborted - Transaction is rolled back. That is, all of its effects are undone 
• Committed - Transaction completes all operations and moves the database to 
the next consistent state
Deferred Update Recovery 
• Also called NO-UNDO/REDO. During a transaction, only record the changes to data items in the log. When the transaction commits, actually update the data items on disk. Two main rules: 
• A transaction cannot change any items in the database until it commits. 
• A transaction may not commit until all of the write operations are successfully recorded in the log. This means 
that we must check to see that the log is actually written to disk. 
For example, suppose the log records four transactions T1 through T4. Since T1 and T4 committed, their changes were written to disk. 
However, T2 and T3 did not commit, hence their changes were not written to disk. 
To recover, we simply ignore those transactions that did not commit. 
Advantages: Recovery is made easier: 
Any transaction that reached the commit point (from the log) has its writes applied to the database (REDO). 
All other transactions are ignored. 
Cascading rollback does not occur because no other transaction sees the work of another until it is committed 
(no stale reads).
Immediate Update Recovery 
• Immediate Update applies the write operations to the database as the 
transaction is executing (as opposed to at the end as with deferred 
update). 
Writes are still saved in the log before being applied to the database - a 
write-ahead log (WAL) 
• Maintain two logs: 
• REDO log: A record of each new data item in the database. 
• UNDO log: A record of each updated data item (old values). 
• Two rules: 
• Transaction T may not update the database until all UNDO entries have 
been written to the UNDO log. 
• Transaction T is not allowed to commit until all REDO and UNDO log entries 
are written (forced-written to disk).
• To Recover: 
• Begin at the end of the log and read backwards to the last checkpoint. 
Create two lists: 
C - transactions that have committed 
NC - transactions that did not commit 
• For each transaction T in C, make a third list of transactions, RB, that have 
read items that were updated by NC transactions during T. 
• UNDO: Undo the effects of the transactions in NC and RB. 
• REDO: Apply the effects of all of the transactions that did commit (all T in 
list C) 
Advantages: 
• Immediate update allows higher concurrency because transactions write 
continuously to the database rather than waiting until the commit point. 
Disadvantages: 
• Step 2 above can lead to cascading rollbacks - time consuming and may be 
problematic.
OLAP (online analytical processing) 
• OLAP (online analytical processing) is computer processing that enables a user to easily and 
selectively extract and view data from different points of view. 
For example, a user can request that data be analyzed to display a spreadsheet showing all of 
a company's beach ball products sold in Florida in the month of July, compare revenue 
figures with those for the same products in September, and then see a comparison of other 
product sales in Florida in the same time period. 
To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas 
a relational database can be thought of as two-dimensional, a multidimensional database 
considers each data attribute (such as product, geographic sales region, and time period) as a 
separate "dimension." OLAP software can locate the intersection of dimensions (all products 
sold in the Eastern region above a certain price during a certain time period) and display them. 
Attributes such as time periods can be broken down into sub-attributes.
OLAP (online analytical processing) 
• OLAP can be used for data mining or the discovery of previously undiscerned 
relationships between data items. An OLAP database does not need to be as 
large as a data warehouse, since not all transactional data is needed for trend 
analysis. Using Open Database Connectivity (ODBC), data can be imported from 
existing relational databases to create a multidimensional database for OLAP. 
• Online Analytical Processing, a category of software tools that provides analysis 
of data stored in a database. OLAP tools enable users to analyze different 
dimensions of multidimensional data. For example, it provides time series and 
trend analysis views. OLAP often is used in data mining. 
• The chief component of OLAP is the OLAP server, which sits between a client and 
a database management system (DBMS). The OLAP server understands how 
data is organized in the database and has special functions for analyzing the data. 
There are OLAP servers available for nearly all the major database systems.
OLTP vs. OLAP 
• OLTP (On-line Transaction Processing) is characterized by a large number of short on-line 
transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is put on very 
fast query processing, maintaining data integrity in multi-access environments and an 
effectiveness measured by the number of transactions per second. An OLTP database holds detailed and current data, and the schema used to store transactional data is the entity model (usually 3NF). 
• OLAP (On-line Analytical Processing) is characterized by relatively low volume of 
transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
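To make the contrast concrete, here are two illustrative queries against a hypothetical sales schema: an OLTP-style lookup of one current row versus an OLAP-style aggregation over history (table and column names are assumptions): 

-- Typical OLTP query: touch one current row, very fast 
SELECT status 
FROM orders 
WHERE order_id = 100045; 

-- Typical OLAP query: aggregate years of history across dimensions 
SELECT d.calendar_year, p.category, SUM(f.sales_amount) AS total_sales 
FROM sales_fact f 
JOIN date_dim d    ON f.date_key = d.date_key 
JOIN product_dim p ON f.product_key = p.product_key 
GROUP BY d.calendar_year, p.category;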
The following table summarizes the main differences between OLTP and OLAP. 

OLTP System: Online Transaction Processing (Operational System) 
OLAP System: Online Analytical Processing (Data Warehouse) 

Source of data 
OLTP: Operational data; OLTP systems are the original source of the data. 
OLAP: Consolidated data; OLAP data comes from the various OLTP databases. 

Purpose of data 
OLTP: To control and run fundamental business tasks. 
OLAP: To help with planning, problem solving, and decision support. 

What the data reveals 
OLTP: A snapshot of ongoing business processes. 
OLAP: Multi-dimensional views of various kinds of business activities. 

Inserts and updates 
OLTP: Short and fast inserts and updates initiated by end users. 
OLAP: Periodic long-running batch jobs refresh the data. 

Queries 
OLTP: Relatively standardized and simple queries returning relatively few records. 
OLAP: Often complex queries involving aggregations. 

Processing speed 
OLTP: Typically very fast. 
OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes. 

Space requirements 
OLTP: Can be relatively small if historical data is archived. 
OLAP: Larger due to the existence of aggregation structures and historical data; requires more indexes than OLTP. 

Database design 
OLTP: Highly normalized with many tables. 
OLAP: Typically de-normalized with fewer tables; uses star and/or snowflake schemas. 

Backup and recovery 
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. 
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
What Is a Data Warehouse? 
• A data warehouse is a relational or multidimensional database 
that is designed for query and analysis. Data warehouses are not 
optimized for transaction processing, which is the domain of OLTP 
systems. Data warehouses usually consolidate historical and 
analytic data derived from multiple sources. Data warehouses 
separate analysis workload from transaction workload and enable 
an organization to consolidate data from several sources. 
• A data warehouse usually stores many months or years of data to 
support historical analysis. The data in a data warehouse is 
typically loaded through an extraction, transformation, and 
loading (ETL) process from one or more data sources such as OLTP 
applications, mainframe applications, or external data providers. 
• Users of the data warehouse perform data analyses that are often 
time-related. Examples include consolidation of last year's sales 
figures, inventory analysis, and profit by product and by customer
The Key Characteristics of a Data Warehouse 
• The key characteristics of a data warehouse are as follows: 
• Some data is denormalized for simplification and to improve 
performance 
• Large amounts of historical data are used 
• Queries often retrieve large amounts of data 
• Both planned and ad hoc queries are common 
• The data load is controlled 
• In general, fast query performance with high data throughput is the key 
to a successful data warehouse.
Common Oracle Data Warehousing Tasks 
• As an Oracle data warehousing administrator or designer, you can expect to 
be involved in the following tasks: 
• Configuring an Oracle database for use as a data warehouse 
• Designing data warehouses 
• Performing upgrades of the database and data warehousing software to new 
releases 
• Managing schema objects, such as tables, indexes, and materialized views 
• Managing users and security 
• Developing routines used for the extraction, transformation, and loading 
(ETL) processes 
• Creating reports based on the data in the data warehouse 
• Backing up the data warehouse and performing recovery when necessary 
• Monitoring the data warehouse's performance and taking preventive or 
corrective action as required 
• In a small-to-midsize data warehouse environment, you might be the sole 
person performing these tasks. In large, enterprise environments, the job is 
often divided among several DBAs and designers, each with their own 
specialty, such as database security or database tuning.
Characteristics of Data Warehouses 
• Data warehouses have several distinguishing characteristics. 
• These systems combine data from multiple sources. Operational systems 
such as ERP systems provide production data, financial systems supply 
revenue and expense data, and human resource systems present 
employee data. 
• The data copied into a data warehouse does not change (except to 
correct errors). The data warehouse is a historical record of the state of an 
organization. The frequent changes of the source OLTP systems are 
reflected in the data warehouse by adding new data, not by changing 
existing data. 
• Data warehouses are subject oriented, that is, they focus on measuring 
entities, such as sales, inventory, and quality. OLTP systems, by contrast, 
are function oriented and focus on operations such as order fulfillment. 
• In data warehouses, data from distinct function-oriented systems is 
integrated to provide a single view of an operational entity. 
• Data warehouses are designed for business users, not database 
programmers, so they are easy to understand and query. 
• Data warehouses are designed for speed. Users interact with data 
warehouses—running queries, analyzing data, formulating new queries 
based on information gleaned from earlier queries—so the systems must 
be responsive.
Metadata 
• Metadata is "data about data". The term is ambiguous, as it is used for two 
fundamentally different concepts (types). Structural metadata is about the 
design and specification of data structures and is more properly called "data 
about the containers of data"; descriptive metadata, on the other hand, is about 
individual instances of application data, the data content. 
Metadata (metacontent) are defined as the data providing information about one 
or more aspects of the data, such as: 
• Means of creation of the data 
• Purpose of the data 
• Time and date of creation 
• Creator or author of the data 
• Location on a computer network where the data were created 
• Standards used
Metadata types 
While the application of metadata is manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata: 
 Structural metadata are used to describe the structure of database objects such as tables, columns, 
keys and indexes. 
 Guide metadata are used to help humans find specific items and are usually expressed as a set of 
keywords in a natural language 
 Technical metadata correspond to internal metadata, and business metadata to external metadata. Kimball adds a third category, process metadata. 
 NISO, on the other hand, distinguishes among three types of metadata: descriptive, structural and administrative. 
 Descriptive metadata are the information used to search and locate an object such as title, author, 
subjects, keywords, publisher; 
 Structural metadata give a description of how the components of the object are organised 
 Administrative metadata refer to the technical information including file type. Two sub-types of 
administrative metadata are rights management metadata and preservation metadata.
OLAP Metadata Model 
• The basic data model in a relational database is a table composed of one 
or more columns of data. All of the data is stored in columns. In contrast, 
the basic data model for multidimensional analysis is a cube, which is 
composed of Measures, Dimensions, and Attributes. 
• The OLAP Catalog metadata informs applications about the data that is 
available within the database. The application can then define 
multidimensional objects to represent that data. When the application 
runs, it instantiates these objects and populates them with data.
Mapping Data Objects to Metadata Objects 
The objects comprising a data warehouse and Oracle OLAP metadata use different data 
structures. The data objects in your data warehouse are represented to the OLAP 
metadata catalog in the following relational objects, regardless of whether the data is 
actually stored in relational tables or workspace variables: 
• Fact Tables or Views 
• Level-based dimension Tables or Views 
Oracle OLAP metadata catalog maps the data warehouse schema to these 
multidimensional data objects: 
• Measures 
• Dimensions 
• Dimension attributes 
• Levels 
• Level attributes 
• Hierarchies 
• Cubes 
• Measure folders
Measures 
 Measures are the same as facts. The term "fact" is typically used in 
relational databases, and the term "measure" is typically used in 
multidimensional applications. 
 Measures are thus located in fact tables. A fact table has columns that 
store measures (or facts) and foreign key columns that create the 
association with dimension tables. 
 Measures contain the data that you wish to analyze, such as Sales or 
Cost. OLAP Catalog metadata requires that a column have a numerical or 
a date data type to be identified as a measure. Most frequently, a 
measure is numerical and additive.
Dimensions 
• Dimensions identify and categorize your data. Dimension members are stored in a 
dimension table. Each column represents a particular level in a hierarchy. In a star 
schema, the columns are all in the same table; in a snowflake schema, the columns 
are in separate tables for each level. 
• Because measures are typically multidimensional, a single value in a measure must 
be qualified by a member of each dimension to be meaningful. 
For example, the unit_cost measure has two dimensions: 
products_dim and times_dim. 
A value of unit_cost (21.60) is only meaningful when it is qualified by a specific 
product code (1575) and a time period (28-jan-1998).
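In a relational star schema, this qualification is simply a join of the fact table with its dimension tables. A sketch (the costs fact table and its column names are assumptions; the dimension names follow the example above): 

-- Retrieve unit_cost for product code 1575 on 28-Jan-1998 
SELECT c.unit_cost 
FROM costs c 
JOIN products_dim p ON c.product_id = p.product_id 
JOIN times_dim t    ON c.time_id    = t.time_id 
WHERE p.product_code = 1575 
  AND t.calendar_date = DATE '1998-01-28';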
Attributes 
Attributes provide descriptive information about the data and are typically 
used for display. 
Level Attributes 
Level attributes provide supplementary information about the dimension 
members at a particular level of a dimension hierarchy. The dimension 
members themselves may be meaningless, such as a value of "1296" for a 
time period. These cryptic values for dimension members are used internally 
for selecting and sorting quickly, but are meaningless to users. 
For example- 
You might have columns for employee number (ENUM), last name 
(LAST_NAME), first name (FIRST_NAME), and telephone extension 
(TELNO). ENUM is the best choice for a level column, since it is a key column 
and its values uniquely identify the employees. ENUM also has 
a NUMBER data type, which makes it more efficient than a text column for the 
creation of indexes. LAST_NAME, FIRST_NAME, and TELNO are attributes. 
Even though they are dimensioned by ENUM, they do not make suitable 
measures because they are descriptive text rather than business 
measurements
Dimension Attributes 
• Dimension attributes specify groupings of level attributes for a 
specific dimension. Whereas level attributes map to specific data 
values, dimension attributes are purely logical metadata objects. 
• An example of a dimension attribute is end date, which is 
required for time dimensions. If a time dimension has month, 
quarter, and year levels, end date identifies the last date of each 
month, each quarter, and each year. Within a relational schema, 
the three level attributes that make up the end date dimension 
attribute would be stored in columns with names 
like month_end_date, quarter_end_date, and year_end_date.
Measure Folders 
• Measures can be organized within measure folders, which 
facilitate the browsing of data by business area. Measure folders 
are also known as catalogs. 
• Whereas dimensions and measures are associated with the 
schemas that contain their source data, measure folders are 
schema independent. Each OLAP client can view all measure 
folders defined within the Oracle instance.
ETL -Extract, Transform, Load 
• ETL is short for extract, transform, load: three database functions that are combined into one tool to pull 
data out of one database and place it into another database. 
• Extract is the process of reading data from a database. 
• Transform is the process of converting the extracted data from 
its previous form into the form it needs to be in so that it can be 
placed into another database. Transformation occurs by using 
rules or lookup tables or by combining the data with other data. 
• Load is the process of writing the data into the target database. 
• ETL is used to migrate data from one database to another, to 
form data marts and data warehouses and also to convert 
databases from one format or type to another.
ETL - Extract, Transform, Load 
• In managing databases, extract, transform, load (ETL) refers to 
three separate functions combined into a single programming 
tool. First, the extract function reads data from a specified 
source database and extracts a desired subset of data. Next, the 
transform function works with the acquired data - using rules or 
lookup tables, or creating combinations with other data - to 
convert it to the desired state. Finally, the load function is used to 
write the resulting data (either all of the subset or just the 
changes) to a target database, which may or may not previously 
exist. 
• ETL can be used to acquire a temporary subset of data for reports 
or other purposes, or a more permanent data set may be acquired 
for other purposes such as: the population of a data mart or data 
warehouse; conversion from one database type to another; and 
the migration of data from one database or platform to another
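When both source and target are relational, the transform and load steps are often expressed directly in SQL. A minimal sketch, assuming a staging table stg_orders extracted from the source system and a warehouse fact table sales_fact (all names are illustrative): 

-- Transform + load: cleanse and convert staged rows, then write them 
-- into the warehouse fact table 
INSERT INTO sales_fact (date_key, product_key, sales_amount) 
SELECT d.date_key, 
       p.product_key, 
       s.qty * s.unit_price                               -- derive the measure 
FROM stg_orders s 
JOIN date_dim d    ON d.calendar_date = s.order_date      -- lookups supply the 
JOIN product_dim p ON p.product_code  = s.product_code    -- surrogate keys 
WHERE s.qty > 0;                                           -- simple data-quality rule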
Dimensional modeling 
• Dimensional modeling (DM) is the name of a set of techniques and concepts used 
in data warehouse design. It is considered to be different from entity-relationship 
modeling (ER). Dimensional Modeling does not necessarily involve a relational database. 
The same modeling approach, at the logical level, can be used for any physical form, 
such as multidimensional database or even flat files. 
• DM is a design technique for databases intended to support end-user queries in a data 
warehouse. It is oriented around understandability and performance 
• While ER is very useful for transaction capture, it should be avoided for end-user delivery. 
• Dimensional modeling always uses the concepts of facts (measures), and dimensions 
(context). Facts are typically (but not always) numeric values that can be aggregated, and 
dimensions are groups of hierarchies and descriptors that define the facts. For example, 
sales amount is a fact; timestamp, product, register#, store#, etc. are elements of 
dimensions. 
• Dimensional models are built by business process area, e.g. store sales, inventory, 
claims, etc. Because the different business process areas share some but not all 
dimensions, efficiency in design, operation, and consistency, is achieved 
using conformed dimensions, i.e. using one copy of the shared dimension across subject 
areas
Dimensional modeling process 
• The dimensional model is built on a star-like schema, with dimensions 
surrounding the fact table. 
To build the schema, the following design model is used: 
• Choose the business process 
• Declare the grain 
• Identify the dimensions 
• Identify the fact 
Choose the business process – 
The process of dimensional modeling builds on a 4-step design method that 
helps to ensure the usability of the dimensional model and the use of the data 
warehouse. The basics in the design build on the actual business process which 
the data warehouse should cover. Therefore the first step in the model is to 
describe the business process which the model builds on. This could for instance 
be a sales situation in a retail store. To describe the business process, one can 
choose to do this in plain text or use basic Business Process Modeling Notation 
(BPMN) or other design guides like the Unified Modeling Language (UML).
Dimensional modeling process 
• Declare the grain 
After describing the Business Process, the next step in the design is to declare the 
grain of the model. The grain of the model is the exact description of what the 
dimensional model should be focusing on. This could for instance be “An individual 
line item on a customer slip from a retail store”. To clarify what the grain means, 
you should pick the central process and describe it with one sentence. 
Furthermore the grain (sentence) is what you are going to build your dimensions 
and fact table from. You might find it necessary to go back to this step to alter the 
grain due to new information gained on what your model is supposed to be able 
to deliver. 
• Identify the dimensions 
The third step in the design process is to define the dimensions of the model. The 
dimensions must be defined within the grain from the second step of the 4-step 
process. Dimensions are the foundation of the fact table, and are where the data for 
the fact table is collected. Typically dimensions are nouns like date, store, 
inventory etc. These dimensions are where all the data is stored. For example, the 
date dimension could contain data such as year, month and weekday.
Dimensional modeling process 
• Identify the facts 
After defining the dimensions, the next step in the process is to make keys for 
the fact table. This step is to identify the numeric facts that will populate each 
fact table row. This step is closely related to the business users of the system, 
since this is where they get access to data stored in the data warehouse. 
Therefore most of the fact table rows are numerical, additive figures such as 
quantity or cost per unit, etc.
Dimensional modeling process 
Benefits of the dimensional modeling are following: 
• Understandability - Compared to the normalized model, the dimensional model is easier to 
understand and more intuitive. In dimensional models, information is grouped into coherent 
business categories or dimensions, making it easier to read and interpret. Simplicity also allows 
software to navigate databases efficiently. In normalized models, data is divided into many discrete 
entities and even a simple business process might result in dozens of tables joined together in a 
complex way. 
• Query performance - Dimensional models are more denormalized and optimized for data 
querying, while normalized models seek to eliminate data redundancies and are optimized for 
transaction loading and updating. The predictable framework of a dimensional model allows the 
database to make strong assumptions about the data that aid in performance. Each dimension is an 
equivalent entry point into the fact table, and this symmetrical structure allows effective handling 
of complex queries. Query optimization for star join databases is simple, predictable, and 
controllable. 
• Extensibility - Dimensional models are extensible and easily accommodate unexpected new 
data. Existing tables can be changed in place either by simply adding new data rows into the table 
or executing SQL alter table commands. No queries or other applications that sit on top of the 
Warehouse need to be reprogrammed to accommodate changes. Old queries and applications 
continue to run without yielding different results. But in normalized models each modification 
should be considered carefully, because of the complex dependencies between database tables.
Fact Tables, Dimension Tables 
• In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is 
located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where 
multiple fact tables are used, these are arranged as a fact constellation schema. 
• A fact table typically has two types of columns: those that contain facts and those that are foreign keys to 
dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its 
foreign keys. Fact tables contain the content of the data warehouse and store different types of measures 
like additive, non additive, and semi additive measures. 
• Fact tables provide the (usually) additive values that act as independent variables by which dimensional 
attributes are analyzed. 
• Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by 
which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by 
Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. 
Other dimensions might be members of this fact table (such as location/region) but these add nothing to 
the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the 
independent facts but generally provide insights at a higher level of aggregation (a region contains many 
stores).
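A sketch of such a SALES fact table at the grain "sales volume by day by product by store", with a composite primary key made up of its foreign keys (names are illustrative): 

CREATE TABLE daily_sales_fact ( 
  date_key     INTEGER NOT NULL REFERENCES date_dim (date_key), 
  product_key  INTEGER NOT NULL REFERENCES product_dim (product_key), 
  store_key    INTEGER NOT NULL REFERENCES store_dim (store_key), 
  sales_volume INTEGER,                               -- additive measure 
  PRIMARY KEY (date_key, product_key, store_key)      -- one row per day/product/store 
);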
Types of fact tables 
There are basically three fundamental measurement events, which characterize all fact tables. 
• Transactional 
A transactional table is the most basic and fundamental. The grain associated with a transactional fact table is 
usually specified as "one row per line in a transaction", e.g., every line on a receipt. Typically a transactional 
fact table holds data of the most detailed level, causing it to have a great number of dimensions associated 
with it. 
• Periodic snapshots 
The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any 
defined period of time, e.g. a performance summary of a salesman over the previous month. A periodic 
snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional 
fact table in order to deliver the chosen performance output. 
• Accumulating snapshots 
This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., 
the processing of an order. An order moves through specific steps until it is fully processed. As steps towards 
fulfilling the order are completed, the associated row in the fact table is updated. An accumulating snapshot 
table often has multiple date columns, each representing a milestone in the process. Therefore, it's important 
to have an entry in the associated date dimension that represents an unknown date, as many of the milestone 
dates are unknown at the time of the creation of the row.
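A hedged sketch of an accumulating snapshot for order processing (hypothetical names): one row per order, one date key per milestone, and milestone keys that point at a reserved "unknown date" member (here -1) until the step actually occurs. The row is updated in place as the order progresses: 

    CREATE TABLE fact_order_fulfilment (
        order_number       VARCHAR(20) PRIMARY KEY,  -- one row per order
        order_date_key     INTEGER NOT NULL,         -- known when the row is created
        shipped_date_key   INTEGER NOT NULL,         -- -1 = 'unknown date' until shipment happens
        delivered_date_key INTEGER NOT NULL,         -- -1 = 'unknown date' until delivery happens
        order_amount       DECIMAL(12,2)
    );

    INSERT INTO fact_order_fulfilment VALUES ('SO-1001', 20240115, -1, -1, 149.90);

    -- When the shipment milestone completes, the existing row is updated in place.
    UPDATE fact_order_fulfilment
    SET    shipped_date_key = 20240117
    WHERE  order_number = 'SO-1001';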
Dimension table 
In data warehousing, a dimension table is one of the set of companion 
tables to a fact table. 
The fact table contains business facts (or measures), and foreign keys which 
refer to candidate keys (normally primary keys) in the dimension tables. 
In contrast to fact tables, dimension tables contain descriptive attributes (or 
fields) that are typically textual fields (or discrete numbers that behave like 
text). These attributes serve two critical purposes: query constraining and/or 
filtering, and query result set labeling (illustrated in the sketch after the list below). 
Dimension attributes should be: 
• Verbose (labels consisting of full words) 
• Descriptive 
• Complete (having no missing values) 
• Discretely valued (having only one value per dimension table row) 
• Quality assured (having no misspellings or impossible values)
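As a small illustration of those two purposes (reusing the hypothetical star schema sketched in the fact table section), dimension attributes constrain the query in the WHERE clause and label the result set in the SELECT and GROUP BY clauses, while the fact table supplies the numbers being summed: 

    SELECT d.fiscal_year,                        -- labeling attribute
           s.region,                             -- labeling attribute
           SUM(f.sales_amount) AS total_sales    -- additive fact
    FROM   fact_sales f
    JOIN   dim_date  d ON d.date_key  = f.date_key
    JOIN   dim_store s ON s.store_key = f.store_key
    WHERE  s.region = 'West'                     -- constraining / filtering attribute
    GROUP  BY d.fiscal_year, s.region;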
• Dimension table rows are uniquely identified by a single key field. It is 
recommended that this key be a simple integer surrogate, because the key 
value itself is meaningless and is used only to join the fact and 
dimension tables. 
The use of surrogate dimension keys brings several advantages, 
including 
• Performance. Join processing is made much more efficient by using a 
single field (the surrogate key) 
• Buffering from operational key management practices. This prevents 
situations where removed data rows might reappear when their 
natural keys get reused or reassigned after a long period of dormancy 
• Mapping to integrate disparate sources 
• Handling unknown or not-applicable connections 
• Tracking changes in dimension attribute values
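The sketch below (hypothetical names) shows the pattern these points describe: a meaningless integer surrogate key held alongside the operational natural key, a reserved row for unknown or not-applicable connections, and effective/expiration dates that allow changes in attribute values to be tracked as new rows: 

    CREATE TABLE dim_customer (
        customer_key     INTEGER PRIMARY KEY,  -- surrogate key: a meaningless integer
        customer_id      VARCHAR(20),          -- natural / operational key from the source system
        customer_name    VARCHAR(100),
        customer_segment VARCHAR(50),
        row_effective    DATE,                 -- used to track attribute changes over time
        row_expiration   DATE
    );

    -- Reserved member for facts whose customer is unknown or not applicable.
    INSERT INTO dim_customer VALUES
        (-1, 'N/A', 'Unknown customer', 'Unknown', '1900-01-01', '9999-12-31');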
The goal of a dimension table is to create standardized, conformed 
dimensions that can be shared across the enterprise's data 
warehouse environment, and enable joining to multiple fact tables 
representing various business processes. 
Conformed dimensions are important to the enterprise nature of DW/BI systems because 
they promote: 
• Consistency - Every fact table is filtered consistently, so that query 
answers are labeled consistently. 
• Integration - Queries can drill into different process fact tables 
separately, one query per fact table, and then join the results on 
common dimension attributes (see the drill-across sketch below). 
• Reduced development time to market - The common dimensions 
are available without recreating them.
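A hedged sketch of such a drill-across query, assuming a second (hypothetical) inventory fact table that shares the conformed product dimension from the earlier star schema sketch: each business process fact table is queried separately, and the two result sets are then joined on the common dimension attribute. 

    -- Hypothetical second process fact table sharing the conformed dim_product.
    CREATE TABLE fact_inventory (
        date_key         INTEGER NOT NULL,
        product_key      INTEGER NOT NULL,
        quantity_on_hand INTEGER NOT NULL
    );

    WITH sales AS (                              -- query the sales process on its own
        SELECT p.product_name, SUM(f.sales_amount) AS total_sales
        FROM   fact_sales f
        JOIN   dim_product p ON p.product_key = f.product_key
        GROUP  BY p.product_name
    ),
    stock AS (                                   -- query the inventory process on its own
        SELECT p.product_name, SUM(i.quantity_on_hand) AS total_on_hand
        FROM   fact_inventory i
        JOIN   dim_product p ON p.product_key = i.product_key
        GROUP  BY p.product_name
    )
    SELECT s.product_name, s.total_sales, k.total_on_hand
    FROM   sales s
    JOIN   stock k ON k.product_name = s.product_name;  -- join on the conformed attribute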
Introduction to Cubes 
• Cubes are the main objects in online analytic processing (OLAP), a technology that 
provides fast access to data in a data warehouse. A cube is a set of data that is 
usually constructed from a subset of a data warehouse and is organized and 
summarized into a multidimensional structure defined by a set 
of dimensions and measures. 
• A cube provides an easy-to-use mechanism for querying data with quick and 
uniform response times. End users use client applications to connect to an Analysis 
server and query the cubes on the server. In most client applications, end users 
issue a query on a cube by manipulating the user interface controls, which 
determine the contents of the query. This spares end users from writing language-based 
queries. Precalculated summary data, called aggregations, provides the 
mechanism for rapid and uniform response times to queries. Aggregations are 
created for a cube before end users access it. The results of a query are retrieved 
from the aggregations, the cube's source data in the data warehouse, a copy of 
this data on the Analysis server, the client cache, or a combination of these 
sources. An Analysis server can support many different cubes, such as a cube for 
sales, a cube for inventory, a cube for customers, and so on.
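An Analysis server builds and manages these aggregations internally, but the underlying idea can be sketched in plain SQL (hypothetical names; a real OLAP engine decides which summaries to build and routes queries to them automatically): a summary table is computed once ahead of time, and user queries are answered from it instead of scanning the detailed fact rows. 

    -- Precompute a summary ("aggregation") during cube processing,
    -- reusing the star schema sketched earlier.
    CREATE TABLE agg_sales_by_region_year AS
    SELECT s.region, d.fiscal_year, SUM(f.sales_amount) AS sales_amount
    FROM   fact_sales f
    JOIN   dim_store s ON s.store_key = f.store_key
    JOIN   dim_date  d ON d.date_key  = f.date_key
    GROUP  BY s.region, d.fiscal_year;

    -- End-user queries read the small summary instead of the large fact table.
    SELECT region, sales_amount
    FROM   agg_sales_by_region_year
    WHERE  fiscal_year = 2005;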
Introduction to Cubes 
• Every cube has a schema, which is the set of joined tables in the data warehouse 
from which the cube draws its source data. The central table in the schema is the 
fact table, the source of the cube's measures. The other tables are dimension 
tables, the sources of the cube's dimensions. For more information about 
schemas, see Cube Structure. 
• A cube is defined by the measures and dimensions that it contains. For example, a 
cube for sales analysis includes the measures Item_Sale_Price and Item_Cost and 
the dimensions Store_Location, Product_Line, and Fiscal_Year. This cube enables 
end users to separate Item_Sale_Price and Item_Cost into various categories by 
Store_Location, Product_Line, and Fiscal_Year. 
• Each cube dimension can contain a hierarchy of levels to specify the categorical 
breakdown available to end users. For example, the Store_Location dimension 
includes the level hierarchy: Continent, Country, State_Province, City, 
Store_Number. Each level in a dimension is of finer granularity than its parent. For 
example, continents contain countries or regions, and states or provinces contain 
cities. Similarly, the hierarchy of the Fiscal_Year dimension includes the levels Year, 
Quarter, Month, and Day.
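At the relational layer underneath such a cube, each hierarchy level is typically a column of the dimension table, and rolling a measure up the hierarchy is simply a change in the grouping columns. The sketch below is hypothetical (it adds a Store_Location-style dimension joined to the earlier fact_sales sketch) and uses GROUP BY ROLLUP, which most major databases support, to return subtotals for each level in one pass: 

    -- Store_Location hierarchy stored as level columns (hypothetical names).
    CREATE TABLE dim_store_location (
        store_key      INTEGER PRIMARY KEY,
        continent      VARCHAR(50),
        country        VARCHAR(50),
        state_province VARCHAR(50),
        city           VARCHAR(50),
        store_number   VARCHAR(20)
    );

    -- Summarize sales at the Continent and Country levels, with subtotals per level.
    SELECT l.continent, l.country, SUM(f.sales_amount) AS total_sales
    FROM   fact_sales f
    JOIN   dim_store_location l ON l.store_key = f.store_key
    GROUP  BY ROLLUP (l.continent, l.country);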
OLAP cube 
• An OLAP cube is a multidimensional database that is optimized for data 
warehouse and online analytical processing (OLAP) applications. 
• An OLAP cube is a method of storing data in a multidimensional form, 
generally for reporting purposes. In OLAP cubes, data (measures) are 
categorized by dimensions. OLAP cubes are often pre-summarized across 
dimensions to drastically improve query time over relational databases. 
• Although it stores data like a traditional database does, an OLAP cube is 
structured very differently. Databases, historically, are designed according 
to the requirements of the IT systems that use them. OLAP cubes, however, 
are used by business users for advanced analytics. Thus, OLAP cubes are 
designed using business logic and understanding. They are optimized for 
analytical purposes, so that they can report on millions of records at a time. 
Business users can query OLAP cubes using plain English.
Dimension Hierarchies 
• A hierarchy defines relationships among a set of attributes that are grouped 
by levels in the dimension of a cube model. These relationships between 
levels are usually defined with a functional dependency to improve 
optimization. Multiple hierarchies can be defined for a dimension of a cube 
model. 
• Hierarchies document the relationships between levels in a dimension. When 
you define a hierarchy, DB2 Data Warehouse Edition creates a functional 
dependency by default between consecutive levels, such that the level key 
attributes from one level are functionally dependent on the level key 
attributes in the lower level. For example, in a Region hierarchy that is 
defined with the levels Region, State, and City, DB2 Data 
Warehouse Edition creates two functional dependencies: one showing that City 
determines State, and a second showing that State determines Region.
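A minimal sketch of such a Region hierarchy stored in a single dimension table (hypothetical names; DB2 Data Warehouse Edition records the functional dependencies as metadata rather than as table constraints): every City row carries the State and Region it rolls up to, so City determines State and State determines Region. 

    CREATE TABLE dim_region (
        city_key INTEGER PRIMARY KEY,
        city     VARCHAR(100) NOT NULL,   -- lowest level of the hierarchy
        state    VARCHAR(100) NOT NULL,   -- City -> State functional dependency
        region   VARCHAR(100) NOT NULL    -- State -> Region functional dependency
    );

    INSERT INTO dim_region VALUES
        (1, 'San Francisco', 'California', 'West'),
        (2, 'San Diego',     'California', 'West');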
Carefully consider whether to include an All level for the hierarchies in the following types of dimensions: 
Time dimensions - 
Aggregating your fact data across all of time is rarely useful, unless 
you know that the fact table contains data for a specific period of 
time, such as the year 2005. 
Scenario dimensions. 
A scenario dimension describes likely ways of analyzing your data. 
You might have two members, such as Actual and Budget, in your 
scenario dimension. In this case, it does not make sense to combine 
the data values for budget and actual into a combined aggregation. 
The hierarchy type describes the relationship among the levels 
within the hierarchy. The following four hierarchy types are 
supported:
Balanced 
• A hierarchy with meaningful levels and branches that have a consistent depth. 
Each level's logical parent is in the level directly above it. A balanced hierarchy can 
represent time where the meaning and depth of each level, such as Year, Quarter, 
and Month, is consistent. They are consistent because each level represents the 
same type of information, and each level is logically equivalent. Figure 1 shows an 
example of a balanced time hierarchy. 
Figure 1. Example of a balanced time hierarchy
Unbalanced 
• A hierarchy with levels that have a consistent parent-child relationship but logically 
inconsistent levels. The hierarchy branches also can have inconsistent depths. An unbalanced 
hierarchy can represent an organization chart. For example, Figure 2 shows a chief executive 
officer (CEO) on the top level of the hierarchy and at least two of the people that might 
branch off below including the chief operating officer and the executive secretary. The chief 
operating officer has more people branching off also, but the executive secretary does not. 
The parent-child relationships on both branches of the hierarchy are consistent. However, 
the levels of both branches are not logical equivalents. 
For example, an executive secretary is not the logical equivalent of a chief operating 
officer. 
Figure 2. Example of an unbalanced hierarchy
Ragged 
A hierarchy in which each level has a consistent meaning, but the branches have inconsistent 
depths because at least one member attribute in a branch level is unpopulated. A ragged 
hierarchy can represent a geographic hierarchy in which the meaning of each level such as city or 
country is used consistently, but the depth of the hierarchy varies. Figure 3 shows a geographic 
hierarchy that has Continent, Country, Province/State, and City levels defined. One branch has 
North America as the Continent, United States as the Country, California as the Province or State, 
and San Francisco as the City. However, the hierarchy becomes ragged when one member does 
not have an entry at all of the levels. For example, another branch has Europe as the Continent, 
Greece as the Country, and Athens as the City, but has no entry for the Province or State level 
because this level is not applicable to Greece for the business model in this example. In this 
example, the Greece and United States branches descend to different depths, creating a ragged hierarchy. 
Figure 3. Example of a ragged hierarchy
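One common way to keep reporting simple over a ragged geography like this one (a sketch, not the only approach) is to store NULL for the level that does not apply and fall back to a neighbouring level when labeling results: 

    CREATE TABLE dim_geography (
        geo_key        INTEGER PRIMARY KEY,
        continent      VARCHAR(50),
        country        VARCHAR(50),
        province_state VARCHAR(50),   -- NULL where the level does not apply
        city           VARCHAR(50)
    );

    INSERT INTO dim_geography VALUES
        (1, 'North America', 'United States', 'California', 'San Francisco'),
        (2, 'Europe',        'Greece',        NULL,          'Athens');

    -- Fall back to the Country label when the Province/State level is unpopulated.
    SELECT continent, country,
           COALESCE(province_state, country) AS province_or_state,
           city
    FROM   dim_geography;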
Network 
A hierarchy in which the order of levels is not specified, but in which levels do have semantic 
meaning. For example, Figure 4 shows a network hierarchy that describes product attributes 
such as Color, Size, and PackageType. Because the levels do not have an inherent parent-child 
relationship, the order of the levels is not important. A widget company might have member 
entries like white for Color, small for Size, and shrink wrap for PackageType. A second member 
entry might be red for Color, large for Size, and box for PackageType. 
Figure 4. Example of a network hierarchy
What advantages are there from developing a Data Warehouse? The three major advantages 
are: 
1. Integrating data from multiple sources; 
2. Performing new types of analyses; and 
3. Reducing cost to access historical data. 
Other benefits may include: 
1. Standardizing data across the organization, a "single version 
of the truth"; 
2. Improving turnaround time for analysis and reporting; 
3. Sharing data and allowing others to easily access data; 
4. Supporting ad hoc reporting and inquiry; 
5. Reducing the development burden on IS/IT; and 
6. Removing informational processing load from transaction-oriented 
databases.
Advantages of data marts 
It is easy to access frequently needed data from a data mart when the 
client requires it. 
Access can be granted to the specific group of users who need to view 
the data mart, and because the data set is small and focused, query 
performance is good. 
A data mart is easy to create and maintain, since it relates to a 
specific business area. 
It is also less costly to create a data mart than to build a full data 
warehouse, which requires far more storage.
Difference between Data Warehouse and Data mart 
• A data mart is a simple form of a data warehouse that is focused on a single 
subject (or functional area), such as Sales or Finance or Marketing. Data 
marts are often built and controlled by a single department within an 
organization. Given their single-subject focus, data marts usually draw data 
from only a few sources. The sources could be internal operational systems, 
a central data warehouse, or external data. 
How Is It Different from a Data Warehouse? 
• A data warehouse, in contrast, deals with multiple subject areas and is 
typically implemented and controlled by a central organizational unit such 
as the Corporate Information Technology (IT) group. Often, it is called a 
central or enterprise data warehouse. Typically, a data warehouse 
assembles data from multiple source systems. 
• Nothing in these basic definitions limits the size of a data mart or the 
complexity of the decision-support data that it contains. Nevertheless, data 
marts are typically smaller and less complex than data warehouses; hence, 
they are typically easier to build and maintain. The following table 
summarizes the basic differences between a data warehouse and a data 
mart:
Difference between Data Warehouse and Data mart 
Parameters             Data Warehouse       Data Mart 
Scope                  Corporate            Line-of-Business (LoB) 
Subjects               Multiple             Single subject 
Data sources           Many                 Few 
Size (typical)         100 GB to TB+        Less than 100 GB 
Implementation time    Months to years      Months
DATA WAREHOUSE REPORTING TOOLS 
REPORTING TOOLS 
Typical reporting tool capabilities include (a sketch of the kind of SQL they generate follows the list): 
• connecting to one or multiple databases in a single report 
• defining queries on the data warehouse database 
• loading millions of records for aggregation and analysis (big data) 
• creating simple and complex filters and sorts 
• creating pixel-perfect reports with a variety of formatting options 
• conditional formatting 
• automatic summarization of the measures: average, median, count, sum, and so forth 
• displaying pivot tables, simple lists, or summarized tables 
• creating sections guided by a dimension 
• presenting data in many types of graphs
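Most of the capabilities above boil down to SQL of roughly the following shape, generated by the tool against the warehouse (a sketch reusing the hypothetical star schema from earlier sections); the tool then adds formatting, sectioning, and charting on top of the result set: 

    -- Filter, sort, and automatic summarization behind a typical report.
    SELECT s.region,
           d.fiscal_year,
           COUNT(*)            AS sale_count,
           SUM(f.sales_amount) AS total_sales,
           AVG(f.sales_amount) AS average_sale
    FROM   fact_sales f
    JOIN   dim_store s ON s.store_key = f.store_key
    JOIN   dim_date  d ON d.date_key  = f.date_key
    WHERE  d.fiscal_year >= 2004               -- report filter
    GROUP  BY s.region, d.fiscal_year
    ORDER  BY total_sales DESC;                -- report sort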
DATA WAREHOUSE REPORTING TOOLS 
ANALYSIS TOOLS 
• clicking on data items to drill down into the details 
• one-click graphs 
• slicing and dicing - switching dimensions easily 
• nesting dimensions 
• sorting and filtering 
• conditional formatting 
• extreme performance - getting answers instantly 
• automatic summarization of the measures
Query and Reporting Tools 
The data warehouse is accessed using an end-user query and reporting tool from Business 
Objects. Business Objects provides several tools to securely access the data warehouse or 
personal data files with a point-and-click interface including the following: 
• BusinessObjects (Reporter and Explorer) - a Microsoft Windows-based query and 
reporting tool. 
• InfoView - a web-based tool that allows reports to be refreshed on demand (but 
cannot create new reports). 
• InfoBurst - a web-based server tool that allows reports to be refreshed, scheduled, 
and distributed. It can be used to distribute reports and data to users or servers in 
various formats (e.g., text, Excel, PDF, HTML). 
• Data Warehouse List Upload - a web-based tool that allows lists of data to be 
uploaded into the data warehouse for use as input to queries.

  • 2. Online transaction processing Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. The term is somewhat ambiguous; some understand a "transaction" in the context of computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions . OLTP has also been used to refer to processing in which the system responds immediately to user requests. An automated teller machine (ATM) for a bank is an example of a commercial transaction processing application. Benefits Online Transaction Processing has two key benefits:  Simplicity and efficiency.  Reduced paper trails  The faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses.
  • 3. Online transaction processing Requirements  Online transaction processing increasingly requires support for transactions that span a network and may include more than one company. For this reason, new online transaction processing software uses client or server processing and brokering software that allows transactions to run on different computer platforms in a network.  In large applications, efficient OLTP may depend on sophisticated transaction management software (such as CICS) and/or database optimization tactics to facilitate the processing of large numbers of concurrent updates to an OLTP-oriented database  For even more demanding decentralized database systems, OLTP brokering programs can distribute transaction processing among multiple computers on a network. OLTP is often integrated into service-oriented architecture (SOA) andWeb services.  Online Transaction Processing (OLTP) involves gathering input information, processing the information and updating existing information to reflect the gathered and processed information. As of today, most organizations use a database management system to support OLTP. OLTP is carried in a client server system.
  • 4. Online analytical processing  In computing, online analytical processing, or OLAP , is an approach to answering multi-dimensional analytical (MDA) queries swiftly.  OLAP is part of the broader category of business intelligence, which also encompasses relational database, report writing and data mining.  Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting , financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing)
  • 5. Online analytical processing  OLAP tools enable users to analyze multidimensional data interactively from multiple perspectives.  OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing.  Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region’s sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints.
  • 6. Online analytical processing • Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time . They borrow aspects of navigational databases, hierarchical databases and relational databases. OverviewofOLAPsystems The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a Vector space. The usual interface to manipulate an OLAP cube is a matrix interface like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. • The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. • Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. • A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. • Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by adding a foreign key column to the fact table. This allows an analyst to view the measures along any combination of the dimensions.
  • 8. Transaction processing system • Transaction processing is a style of computing that divides work into individual, indivisible operations, called transactions. A transaction processing system (TPS) or transaction server is a software system, or software/hardware combination, that supports transaction processing. Processingtypes Transaction processing is distinct from other computer processing models — batch processing, time-sharing, and real-time processing Batchprocessing Batch processing is execution of a series of programs (jobs) on a computer without manual intervention. Several transactions, called a batch are collected and processed at the same time. The results of each transaction are not immediately available when the transaction is being entered ; there is a time delay. Real-timeprocessing Real time systems attempt to guarantee an appropriate response to a stimulus or request quickly enough to affect the conditions that caused the stimulus." Each transaction in real-time processing is unique; it is not part of a group of transactions.
  • 9. Transaction processing system Time-sharing Time sharing is the sharing of a computer system among multiple users, usually giving each user the illusion that they have exclusive control of the system. The users may be working on the same project or different projects, but there are usually few restrictions on the type of work each user is doing. Transactionprocessing Transaction processing systems also attempt to provide predictable response times to requests, although this is not as critical as for real-time systems. Rather than allowing the user to run arbitrary programs as time-sharing, transaction processing allows only predefined, structured transactions. Each transaction is usually short duration and the processing activity for each transaction is programmed in advance.
  • 10. Transaction processing system features The following features are considered important in evaluating transaction processing systems • Performance Fast performance with a rapid response time is critical. Transaction processing systems are usually measured by the number of transactions they can process in a given period of time. • Continuous availability The system must be available during the time period when the users are entering transactions. Many organizations rely heavily on their TPS; a breakdown will disrupt operations or even stop the business. • Data integrity The system must be able to handle hardware or software problems without corrupting data. Multiple users must be protected from attempting to change the same piece of data at the same time, for example two operators cannot sell the same seat on an airplane. • Ease of use Often users of transaction processing systems are casual users. The system should be simple for them to understand, protect them from data-entry errors as much as possible, and allow them to easily correct their errors. • Modular growth The system should be capable of growth at incremental costs, rather than requiring a complete replacement. It should be possible to add, replace, or update hardware and software components without shutting down the system.
  • 11. Concurrency: Definition  Concurrency is the ability of a database to allow multiple users to affect multiple transactions. This is one of the main properties that separates a database from other forms of data storage like spreadsheets.  The ability to offer concurrency is unique to databases. Spreadsheets or other flat file means of storage are often compared to databases, but they differ in this one important regard. Spreadsheets cannot offer several users the ability to view and work on the different data in the same file, because once the first user opens the file it is locked to other users. Other users can read the file, but may not edit data.  The problems caused by concurrency are even more important than the ability to support concurrent transactions. For example, when one user is changing data but has not yet saved (committed) that data, then the database should not allow other users who query the same data to view the changed, unsaved data. Instead the user should only view the original data.
  • 12. Concurrency Problems Concurrency problems include: • Lost or buried updates. • Uncommitted dependency (dirty read). • Inconsistent analysis (non-repeatable read). • Phantom reads.
  • 13. Concurrency Problems • Lost Updates Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data. For example,- Two editors make an electronic copy of the same document. Each editor changes the copy independently and then saves the changed copy, thereby overwriting the original document. The editor who saves the changed copy last overwrites changes made by the first editor. This problem could be avoided if the second editor could not make changes until the first editor had finished. • Uncommitted Dependency (Dirty Read) Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row. • For example, An editor is making changes to an electronic document. During the changes, a second editor takes a copy of the document that includes all the changes made so far, and distributes the document to the intended audience. The first editor then decides the changes made so far are wrong and removes the edits and saves the document. The distributed document contains edits that no longer exist, and should be treated as if they never existed. This problem could be avoided if no one could read the changed document until the first editor determined that the changes were final.
  • 14. Concurrency Problems • Inconsistent Analysis (Non-repeatable Read) Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time. Inconsistent analysis is similar to uncommitted dependency in that another transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, the data read by the second transaction was committed by the transaction that made the change. Also, inconsistent analysis involves multiple reads (two or more) of the same row and each time the information is changed by another transaction; thus, the term non-repeatable read. • For example, an editor reads the same document twice, but between each reading, the writer rewrites the document. When the editor reads the document for the second time, it has changed. The original read was not repeatable. This problem could be avoided if the editor could read the document only after the writer has finished writing it. • Phantom Reads Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. Similarly, as the result of an insert by a different transaction, the transaction's second or succeeding read shows a row that did not exist in the original read. • For example, an editor makes changes to a document submitted by a writer, but when the changes are incorporated into the master copy of the document by the production department, they find that new unedited material has been added to the document by the author. This problem could be avoided if no one could add new material to the document until the editor and production department finish working with the original document.
  • 15. Concurrency Control • Lock-Based Protocols • Timestamp-Based Protocols • Validation-Based Protocols • Multiple Granularity • Multiversion Schemes • Insert and Delete Operations
  • 16. Lock-Based Protocols • A lock is a mechanism to control concurrent access to a data item • Data items can be locked in two modes : 1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction. 2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction. • Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted.
  • 17. Lock-Based Protocols (Cont.) • Lock-compatibility matrix A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive on the item no other transaction may hold any lock on the item. If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.
  • 18. Lock-Based Protocols (Cont.) • Example of a transaction performing locking: T2: lock-S(A); read (A); unlock(A); lock-S(B); read (B); unlock(B); display(A+B) • Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong. • A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
  • 19. Pitfalls of Lock-Based Protocols • Consider the partial schedule Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A. Such a situation is called a deadlock. To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
  • 20. Pitfalls of Lock-Based Protocols (Cont.) • The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil. • Starvation is also possible if concurrency control manager is badly designed. For example: • A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item. • The same transaction is repeatedly rolled back due to deadlocks. • Concurrency control manager can be designed to prevent starvation
  • 21. The Two-Phase Locking Protocol • This is a protocol which ensures conflict-serializable schedules. • Phase 1: Growing Phase • transaction may obtain locks • transaction may not release locks • Phase 2: Shrinking Phase • transaction may release locks • transaction may not obtain locks The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).
  • 22. The Two-Phase Locking Protocol (Cont.) • Two-phase locking does not ensure freedom from deadlocks • Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts. • Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit • There can be conflict serializable schedules that cannot be obtained if two-phase locking is used. • However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense: Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
  • 23. Multiple Granularity • Granularity is the extent to which a system is broken down into small parts, either the system itself or its description or observation. It is the extent to which a larger entity is subdivided. For example, a yard broken into inches has finer granularity than a yard broken into feet. • Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones • Can be represented graphically as a tree (but don't confuse with tree-locking protocol) • When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendents in the same mode.
  • 24. Example of Granularity Hierarchy The levels, starting from the coarsest (top) level are database area file record
  • 25. Intention Lock Modes • In addition to S and X lock modes, there are three additional lock modes with multiple granularity: • intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks. • intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks • shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive-mode locks. • intention locks allow a higher level node to be locked in S or X mode without having to check all descendent nodes.
  • 26. Deadlock Handling • A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. • In an operating system, a deadlock is a situation which occurs when a process or threadenters a waiting state because a resource requested is being held by another waiting process, which in turn is waiting for another resource. If a process is unable to change its state indefinitely because the resources requested by it are being used by another waiting process, then the system is said to be in a deadlock System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
  • 27. Deadlock Handling • Deadlock prevention protocols ensure that the system will never enter into a deadlock state. • Some prevention strategies : • Require that each transaction locks all its data items before it begins execution (pre declaration). • Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol). • Timeout-Based Schemes : • a transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back. • thus deadlocks are not possible • simple to implement; but starvation* is possible. Also difficult to determine good value of the timeout interval. * Starvation is a severe deficiency in caloric energy, nutrient, and vitamin intake. It is the most extreme form of malnutrition
  • 28. Deadlock Detection • After a deadlock is detected, it can be corrected by using one of the following methods: • Process Termination: One or more process involved in the deadlock may be aborted. We can choose to abort all processes involved in the deadlock. This ensures that deadlock is resolved with certainty and speed. But the expense is high as partial computations will be lost. Or, we can choose to abort one process at a time until the deadlock is resolved. This approach has high overheads because after each abort an algorithm must determine whether the system is still in deadlock. Several factors must be considered while choosing a candidate for termination, such as priority and age of the process. • Resource Preemption: Resources allocated to various processes may be successively preempted and allocated to other processes until the deadlock is broken.
  • 29. Deadlock Detection (Cont.) Wait-for graph without a cycle Wait-for graph with a cycle
  • 30. Deadlock Recovery • When deadlock is detected : • Some transaction will have to rolled back (made a victim) to break deadlock. Select that transaction as victim that will incur minimum cost. • Rollback -- determine how far to roll back transaction • Total rollback: Abort the transaction and then restart it. • More effective to roll back transaction only as far as necessary to break deadlock. • Starvation happens if same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation
  • 31. Other Approaches to Concurrency Control • Timestamp-Based Protocols • Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) <TS(Tj). • The protocol manages concurrent execution such that the time-stamps determine the serializability order. • In order to assure such behavior, the protocol maintains for each data Q two timestamp values: • W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully. • R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
  • 32. Timestamp-Based Protocols • The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. • Suppose a transaction Ti issues a read(Q) 1. If TS(Ti)  W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten.  Hence, the read operation is rejected, and Ti is rolled back. 2. If TS(Ti)W-timestamp(Q), then the read operation is executed, and R-timestamp( Q) is set to max(R-timestamp(Q), TS(Ti)). • Suppose that transaction Ti issues write(Q). 1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced.  Hence, the write operation is rejected, and Ti is rolled back. 2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.  Hence, this write operation is rejected, and Ti is rolled back. 3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
  • 33. Correctness of Timestamp-Ordering Protocol • The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form: Thus, there will be no cycles in the precedence graph • Timestamp protocol ensures freedom from deadlock as no transaction ever waits. • But the schedule may not be cascade-free, and may not even be recoverable.
  • 34. DCL commands are used to enforce database security in a multiple user database environment. Two types of DCL commands are GRANT and REVOKE. Only Database Administrator's or owner's of the database object can provide/remove privileges on a databaseobject • SQL GRANT Command SQL GRANT is a command used to provide access or privileges on the database objects to the users The Syntax for the GRANT command is:
  • 35. • SQL GRANT Command • privilege_name is the access right or privilege granted to the user. Some of the access rights are ALL, EXECUTE, and SELECT. • object_name is the name of an database object like TABLE, VIEW, STORED PROC and SEQUENCE. • user_name is the name of the user to whom an access right is being granted. • user_name is the name of the user to whom an access right is being granted. • PUBLIC is used to grant access rights to all users. • ROLES are a set of privileges grouped together. • WITH GRANT OPTION - allows a user to grant access rights to other users.
  • 36. SQL GRANT Command • For Example: GRANT SELECT ON employee TO user1;This command grants a SELECT permission on employee table to user1.You should use the WITH GRANT option carefully because for example if you GRANT SELECT privilege on employee table to user1 using the WITH GRANT option, then user1 can GRANT SELECT privilege on employee table to another user, such as user2 etc. Later, if you REVOKE the SELECT privilege on employee from user1, still user2 will have SELECT privilege on employee table.
  • 37. SQL REVOKE Command : • The REVOKE command removes user access rights or privileges to the database objects. • The Syntax for the REVOKE command is: For Example: REVOKE SELECT ON employee FROM user1;This command will REVOKE a SELECT privilege on employee table from user1.When you REVOKE SELECT privilege on a table from a user, the user will not be able to SELECT data from that table anymore. However, if the user has received SELECT privileges on that table from more than one users, he/she can SELECT from that table until everyone who granted the permission revokes it. You cannot REVOKE privileges if they were not initially granted by you.
  • 38. Privileges and Roles: Privileges: Privileges defines the access rights provided to a user on a database object. There are two types of privileges. 1) System privileges - This allows the user to CREATE, ALTER, or DROP database objects. 2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or DELETE data from database objects to which the privileges apply. Few CREATE system privileges are listed below: System Privileges Description CREATE object allows users to create the specified object in their own schema. CREATE ANY object allows users to create the specified object in any schema.
  • 39. The above rules also apply for ALTER and DROP system privileges. Few of the object privileges are listed below: Object Privileges Description INSERT allows users to insert rows into a table. SELECT allows users to select data from a database object. UPDATE allows user to update data in a table. EXECUTE allows user to execute a stored procedure or a function.
  • 40. Roles: Roles are a collection of privileges or access rights. When there are many users in a database it becomes difficult to grant or revoke privileges to users. Therefore, if you define roles, you can grant or revoke privileges to users, thereby automatically granting or revoking privileges. You can either create Roles or use the system roles pre-defined by oracle. Some of the privileges granted to the system roles are as given below: System Role Privileges Granted to the Role CONNECT CREATE TABLE, CREATE VIEW, CREATE SYNONYM, CREATE SEQUENCE, CREATE SESSION etc. RESOURCE CREATE PROCEDURE, CREATE SEQUENCE, CREATE TABLE, CREATE TRIGGER etc. The primary usage of the RESOURCE role is to restrict access to database objects. DBA ALL SYSTEM PRIVILEGES
  • 41. • Creating Roles: The Syntax to create a role is: It's easier to GRANT or REVOKE privileges to the users through a role rather than assigning a privilege directly to every user. If a role is identified by a password, then, when you GRANT or REVOKE privileges to the role, you definitely have to identify it with the password.
  • 42. Database Recovery There are many situations in which a transaction may not reach a commit or abort point. An operating system crash can terminate the DBMS processes • The DBMS can crash • The process issuing the transaction can abnormally terminate • The system might lose power • A disk may fail or other hardware may fail. • Human error can result in deletion of critical data. • In any of these situations, data in the database may become inconsistent or lost.
  • 43. Two general classes of failures: • Process or Processor failures in which no data is lost but the database must be restarted and recovered. • Storage failure wherein data on disk is lost or corrupted. In such cases, data must be restored from backup and then the database must be restarted and recovered. Database Recovery is the process of restoring the database and the data to a consistent state. This may include restoring lost data up to the point of the event (e.g. system crash). There are many different approaches to recover a database. Here we discuss Manual Reprocessing, and a variety of Automated recovery techniques.
  • 44. Manual Reprocessing • In a Manual Reprocessing approach, the database is periodically backed up (a database save) and all transactions applied since the last save are recorded • If the system crashes, the latest database save is restored and all of the transactions are re-applied (by users) to bring the database back up to the point just before the crash. • Several shortcomings: • Time required to re-apply transactions • Transactions might have other (physical) consequences • Re-applying concurrent transactions is not straight forward.
  • 45. Automated Recovery There are several types of Automated Recovery techniques including: Deferred update and immediate update, shadow paging, etc. Recall, a transaction can be in one of the following states: • Active - when the transaction just begins • Partially Committed - after the last operation has completed (but before the commit point is reached) • Failed - Normal operation is prevented (e.g., in the event of a system crash) • Aborted - Transaction is rolled back. That is, all of its effects are undone • Committed - Transaction completes all operations and moved the database to the next consistent state
  • 46. Deferred Update Recovery • Also called NO-UNDO/REDO During a transaction, only record the changes to data items in the log. When the transaction commits, actually update the data items on disk. Two main rules: A transaction cannot change any items in the database until it commits. • A transaction may not commit until all of the write operations are successfully recorded in the log. This means that we must check to see that the log is actually written to disk. Since T1 and T4 committed, their changes were written to disk However, T2 and T3 did not commit, hence their changes were not written to disk To recover, we simply ignore those transactions that did not commit. Advantages: Recovery is made easier: Any transaction that reached the commit point (from the log) has its writes applied to the database (REDO). All other transactions are ignored. Cascading rollback does not occur because no other transaction sees the work of another until it is committed (no stale reads).
  • 47. Immediate Update Recovery • Immediate Update applies the write operations to the database as the transaction is executing (as opposed to at the end as with deferred update). Writes are still saved in the log before being applied to the database - a write-ahead log (WAL) • Maintain two logs: • REDO log: A record of each new data item in the database. • UNDO log: A record of each updated data item (old values). • Two rules: • Transaction T may not update the database until all UNDO entries have been written to the UNDO log. • Transaction T is not allowed to commit until all REDO and UNDO log entries are written (forced-written to disk).
• 48. • To recover: • Begin at the end of the log and read backwards to the last checkpoint. Create two lists: C - transactions that have committed; NC - transactions that did not commit • For each transaction T in C, make a third list, RB, of transactions that have read items that were updated by NC transactions. • UNDO: undo the effects of the transactions in NC and RB. • REDO: apply the effects of all of the transactions that did commit (all T in list C). Advantages: • Immediate update allows higher concurrency because transactions write continuously to the database rather than waiting until the commit point. Disadvantages: • Step 2 above can lead to cascading rollbacks - time consuming and potentially problematic.
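The UNDO/REDO procedure above can be sketched in the same style. This simplified version assumes each write record carries both the old (UNDO) and new (REDO) value, and it omits checkpoints and the cascading-rollback list RB for brevity; all names are illustrative.

```python
# Minimal sketch of immediate-update (UNDO/REDO) recovery from a write-ahead log.
# Record shapes (illustrative): ("write", txn, item, old_value, new_value), ("commit", txn)

def recover_immediate(log, database):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO pass: scan the log backwards and restore the old values written by
    # transactions that never reached their commit point.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, item, old_value, _ = rec
            database[item] = old_value
    # REDO pass: scan forwards and re-apply the new values of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, _, new_value = rec
            database[item] = new_value
    return database

if __name__ == "__main__":
    log = [
        ("write", "T1", "A", 0, 50), ("commit", "T1"),
        ("write", "T2", "A", 50, 75),      # T2 updated A on disk but never committed
    ]
    db = {"A": 75}                          # disk state at the moment of the crash
    print(recover_immediate(log, db))       # {'A': 50}
```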
  • 49. OLAP (online analytical processing) • OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into sub-attributes.
• 50. OLAP (online analytical processing) • OLAP can be used for data mining or the discovery of previously undiscerned relationships between data items. An OLAP database does not need to be as large as a data warehouse, since not all transactional data is needed for trend analysis. Using Open Database Connectivity (ODBC), data can be imported from existing relational databases to create a multidimensional database for OLAP. • Online Analytical Processing is a category of software tools that provides analysis of data stored in a database. OLAP tools enable users to analyze different dimensions of multidimensional data. For example, they provide time series and trend analysis views. OLAP is often used in data mining. • The chief component of OLAP is the OLAP server, which sits between a client and a database management system (DBMS). The OLAP server understands how data is organized in the database and has special functions for analyzing the data. There are OLAP servers available for nearly all the major database systems.
• 51. OLTP vs. OLAP • OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database holds detailed, current data, and the schema used to store transactional data is the entity model (usually 3NF). • OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used in data mining. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
• 52. The following table summarizes the main differences between an OLTP system (Online Transaction Processing, an operational system) and an OLAP system (Online Analytical Processing, a data warehouse):
Source of data: OLTP holds operational data; OLTPs are the original source of the data. OLAP holds consolidated data that comes from the various OLTP databases.
Purpose of data: OLTP is used to control and run fundamental business tasks. OLAP is used to help with planning, problem solving, and decision support.
What the data reveals: OLTP gives a snapshot of ongoing business processes. OLAP gives multi-dimensional views of various kinds of business activities.
Inserts and updates: OLTP has short and fast inserts and updates initiated by end users. In OLAP, periodic long-running batch jobs refresh the data.
Queries: OLTP queries are relatively standardized and simple, returning relatively few records. OLAP queries are often complex and involve aggregations.
Processing speed: OLTP is typically very fast. OLAP speed depends on the amount of data involved; batch data refreshes and complex queries may take many hours, and query speed can be improved by creating indexes.
Space requirements: OLTP can be relatively small if historical data is archived. OLAP is larger due to the existence of aggregation structures and historical data, and requires more indexes than OLTP.
Database design: OLTP is highly normalized with many tables. OLAP is typically de-normalized with fewer tables, using star and/or snowflake schemas.
Backup and recovery: OLTP is backed up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. For OLAP, instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
  • 53. What Is a Data Warehouse? • A data warehouse is a relational or multidimensional database that is designed for query and analysis. Data warehouses are not optimized for transaction processing, which is the domain of OLTP systems. Data warehouses usually consolidate historical and analytic data derived from multiple sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources. • A data warehouse usually stores many months or years of data to support historical analysis. The data in a data warehouse is typically loaded through an extraction, transformation, and loading (ETL) process from one or more data sources such as OLTP applications, mainframe applications, or external data providers. • Users of the data warehouse perform data analyses that are often time-related. Examples include consolidation of last year's sales figures, inventory analysis, and profit by product and by customer
  • 54. The Key Characteristics of a Data Warehouse • The key characteristics of a data warehouse are as follows: • Some data is denormalized for simplification and to improve performance • Large amounts of historical data are used • Queries often retrieve large amounts of data • Both planned and ad hoc queries are common • The data load is controlled • In general, fast query performance with high data throughput is the key to a successful data warehouse.
  • 55. Common Oracle Data Warehousing Tasks • As an Oracle data warehousing administrator or designer, you can expect to be involved in the following tasks: • Configuring an Oracle database for use as a data warehouse • Designing data warehouses • Performing upgrades of the database and data warehousing software to new releases • Managing schema objects, such as tables, indexes, and materialized views • Managing users and security • Developing routines used for the extraction, transformation, and loading (ETL) processes • Creating reports based on the data in the data warehouse • Backing up the data warehouse and performing recovery when necessary • Monitoring the data warehouse's performance and taking preventive or corrective action as required • In a small-to-midsize data warehouse environment, you might be the sole person performing these tasks. In large, enterprise environments, the job is often divided among several DBAs and designers, each with their own specialty, such as database security or database tuning.
  • 56. Characteristics of Data Warehouses • Data warehouses have several distinguishing characteristics. • These systems combine data from multiple sources. Operational systems such as ERP systems provide production data, financial systems supply revenue and expense data, and human resource systems present employee data. • The data copied into a data warehouse does not change (except to correct errors). The data warehouse is a historical record of the state of an organization. The frequent changes of the source OLTP systems are reflected in the data warehouse by adding new data, not by changing existing data. • Data warehouses are subject oriented, that is, they focus on measuring entities, such as sales, inventory, and quality. OLTP systems, by contrast, are function oriented and focus on operations such as order fulfillment. • In data warehouses, data from distinct function-oriented systems is integrated to provide a single view of an operational entity. • Data warehouses are designed for business users, not database programmers, so they are easy to understand and query. • Data warehouses are designed for speed. Users interact with data warehouses—running queries, analyzing data, formulating new queries based on information gleaned from earlier queries—so the systems must be responsive.
  • 57. Metadata • Metadata is "data about data". The term is ambiguous, as it is used for two fundamentally different concepts (types). Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data"; descriptive metadata, on the other hand, is about individual instances of application data, the data content. Metadata (metacontent) are defined as the data providing information about one or more aspects of the data, such as: • Means of creation of the data • Purpose of the data • Time and date of creation • Creator or author of the data • Location on a computer network where the data were created • Standards used
• 58. Metadata types Although metadata applications are manifold, covering a large variety of fields, there are specialized and well-accepted models for specifying types of metadata:  Structural metadata are used to describe the structure of database objects such as tables, columns, keys and indexes.  Guide metadata are used to help humans find specific items and are usually expressed as a set of keywords in a natural language.  Technical metadata correspond to internal metadata, and business metadata to external metadata; Kimball adds a third category named process metadata. NISO, on the other hand, distinguishes among three types of metadata: descriptive, structural and administrative.  Descriptive metadata are the information used to search and locate an object, such as title, author, subjects, keywords, publisher.  Structural metadata give a description of how the components of the object are organised.  Administrative metadata refer to technical information, including file type. Two sub-types of administrative metadata are rights management metadata and preservation metadata.
  • 59. OLAP Metadata Model • The basic data model in a relational database is a table composed of one or more columns of data. All of the data is stored in columns. In contrast, the basic data model for multidimensional analysis is a cube, which is composed of Measures, Dimensions, and Attributes. • The OLAP Catalog metadata informs applications about the data that is available within the database. The application can then define multidimensional objects to represent that data. When the application runs, it instantiates these objects and populates them with data.
  • 60. Mapping Data Objects to Metadata Objects The objects comprising a data warehouse and Oracle OLAP metadata use different data structures. The data objects in your data warehouse are represented to the OLAP metadata catalog in the following relational objects, regardless of whether the data is actually stored in relational tables or workspace variables: • Fact Tables or Views • Level-based dimension Tables or Views Oracle OLAP metadata catalog maps the data warehouse schema to these multidimensional data objects: • Measures • Dimensions • Dimension attributes • Levels • Level attributes • Hierarchies • Cubes • Measure folders
  • 61. Measures  Measures are the same as facts. The term "fact" is typically used in relational databases, and the term "measure" is typically used in multidimensional applications.  Measures are thus located in fact tables. A fact table has columns that store measures (or facts) and foreign key columns that create the association with dimension tables.  Measures contain the data that you wish to analyze, such as Sales or Cost. OLAP Catalog metadata requires that a column have a numerical or a date data type to be identified as a measure. Most frequently, a measure is numerical and additive.
• 62. Dimensions • Dimensions identify and categorize your data. Dimension members are stored in a dimension table. Each column represents a particular level in a hierarchy. In a star schema, the columns are all in the same table; in a snowflake schema, the columns are in separate tables for each level. • Because measures are typically multidimensional, a single value in a measure must be qualified by a member of each dimension to be meaningful. For example, the unit_cost measure has two dimensions: products_dim and times_dim. A value of unit_cost (21.60) is only meaningful when it is qualified by a specific product code (1575) and a time period (28-jan-1998).
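To make this concrete, a measure can be pictured as a mapping keyed by one member from each dimension. The tiny Python sketch below reuses the product code and date from the example above and is purely illustrative.

```python
# A measure value is only meaningful once every dimension is fixed.
# Here unit_cost is keyed by (product_code, time_period): both dimensions must be supplied.
unit_cost = {
    (1575, "28-jan-1998"): 21.60,
    (1575, "29-jan-1998"): 21.75,
}

print(unit_cost[(1575, "28-jan-1998")])   # 21.6 - qualified by product AND time
```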
• 63. Attributes Attributes provide descriptive information about the data and are typically used for display. Level Attributes Level attributes provide supplementary information about the dimension members at a particular level of a dimension hierarchy. The dimension members themselves may be meaningless, such as a value of "1296" for a time period. These cryptic values for dimension members are used internally for selecting and sorting quickly, but are meaningless to users. For example, you might have columns for employee number (ENUM), last name (LAST_NAME), first name (FIRST_NAME), and telephone extension (TELNO). ENUM is the best choice for a level column, since it is a key column and its values uniquely identify the employees. ENUM also has a NUMBER data type, which makes it more efficient than a text column for the creation of indexes. LAST_NAME, FIRST_NAME, and TELNO are attributes. Even though they are dimensioned by ENUM, they do not make suitable measures because they are descriptive text rather than business measurements.
  • 64. Dimension Attributes • Dimension attributes specify groupings of level attributes for a specific dimension. Whereas level attributes map to specific data values, dimension attributes are purely logical metadata objects. • An example of a dimension attribute is end date, which is required for time dimensions. If a time dimension has month, quarter, and year levels, end date identifies the last date of each month, each quarter, and each year. Within a relational schema, the three level attributes that make up the end date dimension attribute would be stored in columns with names like month_end_date, quarter_end_date, and year_end_date.
  • 65. Measure Folders • Measures can be organized within measure folders, which facilitate the browsing of data by business area. Measure folders are also known as catalogs. • Whereas dimensions and measures are associated with the schemas that contain their source data, measure folders are schema independent. Each OLAP client can view all measure folders defined within the Oracle instance.
• 66. ETL - Extract, Transform, Load • ETL is short for extract, transform, load: three database functions that are combined into one tool to pull data out of one database and place it into another database. • Extract is the process of reading data from a database. • Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data. • Load is the process of writing the data into the target database. • ETL is used to migrate data from one database to another, to form data marts and data warehouses, and also to convert databases from one format or type to another.
  • 67. ETL - Extract, Transform, Load • In managing databases, extract, transform, load (ETL) refers to three separate functions combined into a single programming tool. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data - using rules or lookup tables, or creating combinations with other data - to convert it to the desired state. Finally, the load function is used to write the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist. • ETL can be used to acquire a temporary subset of data for reports or other purposes, or a more permanent data set may be acquired for other purposes such as: the population of a data mart or data warehouse; conversion from one database type to another; and the migration of data from one database or platform to another
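A minimal sketch of the three steps using Python and its built-in sqlite3 module is shown below. The table and column names (src_orders, dw_sales) and the transformation rule are invented for illustration and are not part of any specific ETL product.

```python
# Minimal ETL sketch: extract rows from a source table, transform them with a
# simple rule, and load them into a target table.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE src_orders (order_id INTEGER, amount_cents INTEGER, country TEXT)")
src.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                [(1, 1999, "us"), (2, 550, "de"), (3, 120000, "us")])

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE dw_sales (order_id INTEGER, amount_usd REAL, country TEXT)")

# Extract: read the desired subset from the source database.
rows = src.execute("SELECT order_id, amount_cents, country FROM src_orders").fetchall()

# Transform: convert cents to dollars and upper-case the country code (a simple rule).
transformed = [(oid, cents / 100.0, country.upper()) for oid, cents, country in rows]

# Load: write the resulting rows into the target database.
tgt.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", transformed)
tgt.commit()
print(tgt.execute("SELECT * FROM dw_sales").fetchall())
```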
• 68. Dimensional modeling • Dimensional modeling (DM) is the name of a set of techniques and concepts used in data warehouse design. It is considered to be different from entity-relationship (ER) modeling. Dimensional modeling does not necessarily involve a relational database; the same modeling approach, at the logical level, can be used for any physical form, such as a multidimensional database or even flat files. • DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance. • While ER modeling is very useful for transaction capture, it should be avoided for end-user delivery. • Dimensional modeling always uses the concepts of facts (measures) and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. • Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency is achieved using conformed dimensions, i.e. using one copy of the shared dimension across subject areas.
• 69. Dimensional modeling process • The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used: • Choose the business process • Declare the grain • Identify the dimensions • Identify the facts Choose the business process – The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The basics of the design build on the actual business process which the data warehouse should cover. Therefore the first step in the model is to describe the business process which the model builds on. This could for instance be a sales situation in a retail store. To describe the business process, one can choose to do this in plain text or use basic Business Process Modeling Notation (BPMN) or other design guides like the Unified Modeling Language (UML).
• 70. Dimensional modeling process • Declare the grain After describing the business process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore, the grain (sentence) is what you are going to build your dimensions and fact table from. You might find it necessary to go back to this step to alter the grain due to new information gained about what your model is supposed to be able to deliver. • Identify the dimensions The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table, and are where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month and weekday.
  • 71. Dimensional modeling process • Identify the facts After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the fact table rows are numerical, additive figures such as quantity or cost per unit, etc.
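As a quick recap of the four steps, the retail example running through this section might be written down as the following design sketch. Every name in it is a hypothetical placeholder for illustration, not a prescribed notation.

```python
# The four dimensional-modeling decisions for the retail example, captured as data.
# Every name here is a hypothetical placeholder used only for illustration.
design = {
    "business_process": "retail store sales",
    "grain": "one row per individual line item on a customer receipt",
    "dimensions": ["date", "product", "store", "cashier"],
    "facts": ["quantity_sold", "extended_sales_amount"],   # numeric, additive
}

for step, decision in design.items():
    print(f"{step}: {decision}")
```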
  • 72. Dimensional modeling process Benefits of the dimensional modeling are following: • Understandability - Compared to the normalized model, the dimensional model is easier to understand and more intuitive. In dimensional models, information is grouped into coherent business categories or dimensions, making it easier to read and interpret. Simplicity also allows software to navigate databases efficiently. In normalized models, data is divided into many discrete entities and even a simple business process might result in dozens of tables joined together in a complex way. • Query performance - Dimensional models are more denormalized and optimized for data querying, while normalized models seek to eliminate data redundancies and are optimized for transaction loading and updating. The predictable framework of a dimensional model allows the database to make strong assumptions about the data that aid in performance. Each dimension is an equivalent entry point into the fact table, and this symmetrical structure allows effective handling of complex queries. Query optimization for star join databases is simple, predictable, and controllable. • Extensibility - Dimensional models are extensible and easily accommodate unexpected new data. Existing tables can be changed in place either by simply adding new data rows into the table or executing SQL alter table commands. No queries or other applications that sit on top of the Warehouse need to be reprogrammed to accommodate changes. Old queries and applications continue to run without yielding different results. But in normalized models each modification should be considered carefully, because of the complex dependencies between database tables.
• 73. Fact Tables, Dimension Tables • In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. • A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures like additive, non-additive, and semi-additive measures. • Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. • Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. Other dimensions might be members of this fact table (such as location/region) but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (a region contains many stores).
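The sketch below, using Python's built-in sqlite3 module, shows a tiny fact table at the grain "Sales volume by Day by Product by Store" with foreign keys to its dimension tables, plus a star-join query that aggregates the additive measures. All table and column names are invented for illustration.

```python
# Sketch of a tiny star schema and a star-join query at the declared grain.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
-- Fact table: foreign keys to each dimension plus additive measures;
-- the primary key is the composite of the foreign keys (the grain).
CREATE TABLE fact_sales (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    units_sold INTEGER, sales_amount REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);
""")
db.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
               [(1, "2024-01-01", "2024-01"), (2, "2024-01-02", "2024-01")])
db.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Beach Ball"), (2, "Umbrella")])
db.executemany("INSERT INTO dim_store VALUES (?, ?, ?)", [(10, "Store 10", "East")])
db.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
               [(1, 1, 10, 5, 49.95), (2, 1, 10, 3, 29.97), (2, 2, 10, 1, 15.00)])

# Star join: constrain/label with dimension attributes, aggregate the fact measures.
for row in db.execute("""
    SELECT d.month, p.product_name, s.region, SUM(f.units_sold), SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_store s   ON f.store_key = s.store_key
    GROUP BY d.month, p.product_name, s.region
"""):
    print(row)
```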
• 74. Types of fact tables There are basically three fundamental measurement events, which characterize all fact tables. • Transactional A transactional fact table is the most basic and fundamental. The grain associated with a transactional fact table is usually specified as "one row per line in a transaction", e.g., every line on a receipt. Typically a transactional fact table holds data of the most detailed level, causing it to have a great number of dimensions associated with it. • Periodic snapshots The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any defined period of time, e.g. a performance summary of a salesman over the previous month. A periodic snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional fact table in order to deliver the chosen performance output. • Accumulating snapshots This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., the processing of an order. An order moves through specific steps until it is fully processed. As steps towards fulfilling the order are completed, the associated row in the fact table is updated. An accumulating snapshot table often has multiple date columns, each representing a milestone in the process. Therefore, it's important to have an entry in the associated date dimension that represents an unknown date, as many of the milestone dates are unknown at the time of the creation of the row.
  • 75. Dimension table In data warehousing, a dimension table is one of the set of companion tables to a fact table. The fact table contains business facts (or measures), and foreign keys which refer to candidate keys (normally primary keys) in the dimension tables. Contrary to fact tables, dimension tables contain descriptive attributes (or fields) that are typically textual fields (or discrete numbers that behave like text). These attributes are designed to serve two critical purposes: query constraining and/or filtering, and query result set labeling. Dimension attributes should be: • Verbose (labels consisting of full words) • Descriptive • Complete (having no missing values) • Discretely valued (having only one value per dimension table row) • Quality assured (having no misspellings or impossible values)
  • 76. • Dimension table rows are uniquely identified by a single key field. It is recommended that the key field be a simple integer because a key value is meaningless, used only for joining fields between the fact and dimension tables. The use of surrogate dimension keys brings several advantages, including • Performance. Join processing is made much more efficient by using a single field (the surrogate key) • Buffering from operational key management practices. This prevents situations where removed data rows might reappear when their natural keys get reused or reassigned after a long period of dormancy • Mapping to integrate disparate sources • Handling unknown or not-applicable connections • Tracking changes in dimension attribute values
  • 77. The goal of a dimension table is to create standardized, conformed dimensions that can be shared across the enterprise's data warehouse environment, and enable joining to multiple fact tables representing various business processes. Conformed dimensions are important to the enterprise nature of DW/BI systems because they promote: • Consistency- Every fact table is filtered consistently, so that query answers are labeled consistently. • Integration-Queries can drill into different process fact tables separately for each individual fact table, then join the results on common dimension attributes. • Reduced development time to market. The common dimensions are available without recreating them
• 78. Introduction to Cubes • Cubes are the main objects in online analytic processing (OLAP), a technology that provides fast access to data in a data warehouse. A cube is a set of data that is usually constructed from a subset of a data warehouse and is organized and summarized into a multidimensional structure defined by a set of dimensions and measures. • A cube provides an easy-to-use mechanism for querying data with quick and uniform response times. End users use client applications to connect to an Analysis server and query the cubes on the server. In most client applications, end users issue a query on a cube by manipulating the user interface controls, which determine the contents of the query. This spares end users from writing language-based queries. Precalculated summary data called aggregations provides the mechanism for rapid and uniform response times to queries. Aggregations are created for a cube before end users access it. The results of a query are retrieved from the aggregations, the cube's source data in the data warehouse, a copy of this data on the Analysis server, the client cache, or a combination of these sources. An Analysis server can support many different cubes, such as a cube for sales, a cube for inventory, a cube for customers, and so on.
  • 79. Introduction to Cubes • Every cube has a schema, which is the set of joined tables in the data warehouse from which the cube draws its source data. The central table in the schema is the fact table, the source of the cube's measures. The other tables are dimension tables, the sources of the cube's dimensions. For more information about schemas, see Cube Structure. • A cube is defined by the measures and dimensions that it contains. For example, a cube for sales analysis includes the measures Item_Sale_Price and Item_Cost and the dimensions Store_Location, Product_Line, and Fiscal_Year. This cube enables end users to separate Item_Sale_Price and Item_Cost into various categories by Store_Location, Product_Line, and Fiscal_Year. • Each cube dimension can contain a hierarchy of levels to specify the categorical breakdown available to end users. For example, the Store_Location dimension includes the level hierarchy: Continent, Country, State_Province, City, Store_Number. Each level in a dimension is of finer granularity than its parent. For example, continents contain countries or regions, and states or provinces contain cities. Similarly, the hierarchy of the Fiscal_Year dimension includes the levels Year, Quarter, Month, and Day.
  • 80. OLAP cube • An OLAP cube is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications. • An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions. OLAP cubes are often pre-summarized across dimensions to drastically improve query time over relational databases. • Although it stores data like a traditional database does, an OLAP cube is structured very differently. Databases, historically, are designed according to the requirements of the IT systems that use them. OLAP cubes, however, are used by business users for advanced analytics. Thus, OLAP cubes are designed using business logic and understanding. They are optimized for analytical purposes, so that they can report on millions of records at a time. Business users can query OLAP cubes using plain English.
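To illustrate pre-summarization, the following plain-Python sketch computes an aggregate for every combination of a small set of dimensions, so that later "queries" become dictionary lookups instead of scans of the detail rows. The dimension names and figures are made up for the example.

```python
# Sketch of cube-style pre-aggregation: summarize a measure over every combination
# of dimensions so later queries read the summary, not the detail rows.
from collections import defaultdict
from itertools import combinations

# Detail rows: (product, region, year, sales)
detail = [
    ("Beach Ball", "East", 2023, 100.0),
    ("Beach Ball", "West", 2023, 80.0),
    ("Umbrella",   "East", 2024, 60.0),
]

dims = ("product", "region", "year")
aggregations = defaultdict(float)

for product, region, year, sales in detail:
    values = {"product": product, "region": region, "year": year}
    # Aggregate at every "group-by subset" of the dimensions (the cube cells).
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            key = (subset, tuple(values[d] for d in subset))
            aggregations[key] += sales

# "Total sales of Beach Ball across all regions and years" is now a single lookup.
print(aggregations[(("product",), ("Beach Ball",))])   # 180.0
```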
• 81. Dimension Hierarchies • A hierarchy defines relationships among a set of attributes that are grouped by levels in the dimension of a cube model. These relationships between levels are usually defined with a functional dependency to improve optimization. Multiple hierarchies can be defined for a dimension of a cube model. • Hierarchies document the relationships between levels in a dimension. When you define a hierarchy, DB2 Data Warehouse Edition creates a functional dependency by default between consecutive levels, such that the level key attributes from one level are functionally dependent on the level key attributes in the lower level. For example, in a Region hierarchy that is defined with the following levels: Region, State, and City, DB2 Data Warehouse Edition creates two functional dependencies: one functional dependency to show that City determines State, and a second functional dependency to show that State determines Region.
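A small Python sketch of a roll-up along such a Region hierarchy is shown below. The city-to-state and state-to-region mappings and the sales figures are invented; the point is simply that the functional dependencies (each city maps to exactly one state, each state to one region) make the roll-up unambiguous.

```python
# Roll-up along a City -> State -> Region hierarchy.
from collections import defaultdict

city_to_state = {"San Francisco": "California", "Los Angeles": "California", "Austin": "Texas"}
state_to_region = {"California": "West", "Texas": "South"}

sales_by_city = {"San Francisco": 120.0, "Los Angeles": 200.0, "Austin": 90.0}

sales_by_state = defaultdict(float)
for city, amount in sales_by_city.items():
    sales_by_state[city_to_state[city]] += amount        # roll up City -> State

sales_by_region = defaultdict(float)
for state, amount in sales_by_state.items():
    sales_by_region[state_to_region[state]] += amount    # roll up State -> Region

print(dict(sales_by_state))    # {'California': 320.0, 'Texas': 90.0}
print(dict(sales_by_region))   # {'West': 320.0, 'South': 90.0}
```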
• 82. You might not want to define an All level for the hierarchies in the following types of dimensions: Time dimensions - Aggregating your fact data across all of time is rarely useful, unless you know that the fact table contains data for a specific period of time, such as the year 2005. Scenario dimensions - A scenario dimension describes likely ways of analyzing your data. You might have two members, such as Actual and Budget, in your scenario dimension. In this case, it does not make sense to combine the data values for budget and actual into a combined aggregation. The hierarchy type describes the relationship among the levels within the hierarchy. The following four hierarchy types are supported:
• 83. Balanced • A hierarchy with meaningful levels and branches that have a consistent depth. Each level's logical parent is in the level directly above it. A balanced hierarchy can represent time, where the meaning and depth of each level, such as Year, Quarter, and Month, is consistent. They are consistent because each level represents the same type of information, and each level is logically equivalent. Figure 1 shows an example of a balanced time hierarchy. • Figure 1. Balanced hierarchy.
• 84. Unbalanced • A hierarchy with levels that have a consistent parent-child relationship but logically inconsistent levels. The hierarchy branches also can have inconsistent depths. An unbalanced hierarchy can represent an organization chart. For example, Figure 2 shows a chief executive officer (CEO) on the top level of the hierarchy and at least two of the people that might branch off below, including the chief operating officer and the executive secretary. The chief operating officer also has more people branching off, but the executive secretary does not. The parent-child relationships on both branches of the hierarchy are consistent. However, the levels of both branches are not logical equivalents. For example, an executive secretary is not the logical equivalent of a chief operating officer. Figure 2. Unbalanced hierarchy.
• 85. Ragged A hierarchy in which each level has a consistent meaning, but the branches have inconsistent depths because at least one member attribute in a branch level is unpopulated. A ragged hierarchy can represent a geographic hierarchy in which the meaning of each level such as city or country is used consistently, but the depth of the hierarchy varies. Figure 3 shows a geographic hierarchy that has Continent, Country, Province/State, and City levels defined. One branch has North America as the Continent, United States as the Country, California as the Province or State, and San Francisco as the City. However, the hierarchy becomes ragged when one member does not have an entry at all of the levels. For example, another branch has Europe as the Continent, Greece as the Country, and Athens as the City, but has no entry for the Province or State level because this level is not applicable to Greece for the business model in this example. In this example, the Greece and United States branches descend to different depths, creating a ragged hierarchy. Figure 3. Ragged hierarchy.
• 86. Network A hierarchy in which the order of levels is not specified, but in which levels do have semantic meaning. For example, Figure 4 shows a network hierarchy that describes product attributes such as Color, Size, and PackageType. Because the levels do not have an inherent parent-child relationship, the order of the levels is not important. A widget company might have member entries like white for Color, small for Size, and shrink wrap for PackageType. A second member entry might be red for Color, large for Size, and box for PackageType. Figure 4. Network hierarchy.
• 87. What advantages are there from developing a Data Warehouse? The three major advantages are: 1. Integrating data from multiple sources; 2. Performing new types of analyses; and 3. Reducing the cost to access historical data. Other benefits may include: 1. Standardizing data across the organization, a "single version of the truth"; 2. Improving turnaround time for analysis and reporting; 3. Sharing data and allowing others to easily access data; 4. Supporting ad hoc reporting and inquiry; 5. Reducing the development burden on IS/IT; and 6. Removing informational processing load from transaction-oriented databases.
• 88. Advantages of data marts. It is easy to access frequently needed data from the database when required by the client. We can give a group of users access to view the data mart when it is required, and performance will be good. A data mart is easy to create and maintain, and it is related to a specific business area. It is also lower in cost to create a data mart than to create a data warehouse with a huge amount of space.
• 89. Difference between Data Warehouse and Data Mart • A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales, Finance or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse, or external data. How Is It Different from a Data Warehouse? • A data warehouse, in contrast, deals with multiple subject areas and is typically implemented and controlled by a central organizational unit such as the Corporate Information Technology (IT) group. Often, it is called a central or enterprise data warehouse. Typically, a data warehouse assembles data from multiple source systems. • Nothing in these basic definitions limits the size of a data mart or the complexity of the decision-support data that it contains. Nevertheless, data marts are typically smaller and less complex than data warehouses; hence, they are typically easier to build and maintain. The following table summarizes the basic differences between a data warehouse and a data mart:
• 90. Difference between Data Warehouse and Data Mart
Scope: Data Warehouse - corporate; Data Mart - Line-of-Business (LoB)
Subjects: Data Warehouse - multiple; Data Mart - single subject
Data sources: Data Warehouse - many; Data Mart - few
Size (typical): Data Warehouse - 100 GB to TB+; Data Mart - less than 100 GB
Implementation time: Data Warehouse - months to years; Data Mart - months
• 91. DATA WAREHOUSE REPORTING TOOLS REPORTING TOOLS • connecting to one or multiple databases in a single report • defining queries on the data warehouse database • loading millions of records for aggregation and analysis (big data) • creating simple and complex filters and sorts • creating a pixel-perfect report including a variety of formatting options • conditional formatting • automatic summarization of the measures: average, median, count, sum and so forth • displaying pivot tables, simple lists or summarized tables • creating sections guided by a dimension • presenting data in many types of graphs
• 92. DATA WAREHOUSE REPORTING TOOLS ANALYSIS TOOLS • clicking on data items to drill down into the details • one-click graphs • slicing and dicing – switching dimensions easily • nesting dimensions • sorting and filtering • conditional formatting • extreme performance, getting answers instantly • automatic summarization of the measures
• 93. Query and Reporting Tools The data warehouse is accessed using an end-user query and reporting tool from Business Objects. Business Objects provides several tools to securely access the data warehouse or personal data files with a point-and-click interface, including the following: • BusinessObjects (Reporter and Explorer) - a Microsoft Windows-based query and reporting tool. • InfoView - a web-based tool that allows reports to be refreshed on demand (but cannot create new reports). • InfoBurst - a web-based server tool that allows reports to be refreshed, scheduled and distributed. It can be used to distribute reports and data to users or servers in various formats (e.g. text, Excel, PDF, HTML, etc.) • Data Warehouse List Upload - a web-based tool that allows lists of data to be uploaded into the data warehouse for use as input to queries.