Section D
Online transaction processing 
Online transaction processing, or OLTP, is a class of information systems that facilitate 
and manage transaction-oriented applications, typically for data entry and 
retrieval transaction processing. The term is somewhat ambiguous; some understand 
a "transaction" in the context of computer or database transactions, while others 
(such as the Transaction Processing Performance Council) define it in terms of 
business or commercial transactions. OLTP has also been used to refer to processing 
in which the system responds immediately to user requests. An automated teller 
machine (ATM) for a bank is an example of a commercial transaction processing 
application. 
Benefits 
Online Transaction Processing has two key benefits: simplicity and efficiency. 
Reduced paper trails and the faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses.
Online transaction processing Requirements 
Online transaction processing increasingly requires support for transactions that span a 
network and may include more than one company. For this reason, new online 
transaction processing software uses client/server processing and brokering 
software that allows transactions to run on different computer platforms in a network. 
 In large applications, efficient OLTP may depend on sophisticated transaction 
management software (such as CICS) and/or database optimization tactics to facilitate 
the processing of large numbers of concurrent updates to an OLTP-oriented database 
 For even more demanding decentralized database systems, OLTP brokering programs 
can distribute transaction processing among multiple computers on a network. OLTP is 
often integrated into service-oriented architecture (SOA) and Web services. 
 Online Transaction Processing (OLTP) involves gathering input information, processing 
the information and updating existing information to reflect the gathered and 
processed information. As of today, most organizations use a database management 
system to support OLTP. OLTP is typically carried out in a client-server system.
Online analytical processing 
 In computing, online analytical processing, or OLAP , is an approach to 
answering multi-dimensional analytical (MDA) queries swiftly. 
 OLAP is part of the broader category of business intelligence, which also 
encompasses relational databases, report writing and data mining. 
 Typical applications of OLAP include business reporting for sales, marketing, 
management reporting, business process management (BPM), budgeting and 
forecasting, financial reporting and similar areas, with new applications coming 
up, such as agriculture. 
The term OLAP was created as a slight modification of the traditional database 
term OLTP (Online Transaction Processing).
Online analytical processing 
 OLAP tools enable users to analyze multidimensional data interactively from multiple 
perspectives. 
 OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, 
and slicing and dicing. 
 Consolidation involves the aggregation of data that can be accumulated and computed 
in one or more dimensions. 
For example, all sales offices are rolled up to the sales department or sales division to 
anticipate sales trends. By contrast, the drill-down is a technique that allows users to 
navigate through the details. For instance, users can view the sales by individual products 
that make up a region’s sales. Slicing and dicing is a feature whereby users can take out 
(slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different 
viewpoints.
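In SQL terms, these operations map naturally onto GROUP BY and WHERE clauses. The sketch below assumes a hypothetical sales table with region, product, sale_date and amount columns (names are illustrative only): 

-- Roll-up (consolidation): aggregate detailed sales up to the region level 
SELECT region, SUM(amount) AS total_sales 
FROM sales 
GROUP BY region; 

-- Drill-down: break one region's sales out by individual product 
SELECT product, SUM(amount) AS product_sales 
FROM sales 
WHERE region = 'East' 
GROUP BY product; 

-- Slice and dice: fix the time dimension and view the remaining dimensions 
SELECT region, product, SUM(amount) AS total_sales 
FROM sales 
WHERE sale_date BETWEEN DATE '1998-01-01' AND DATE '1998-03-31' 
GROUP BY region, product;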
Online analytical processing 
• Databases configured for OLAP use a multidimensional data model, allowing for complex 
analytical and ad hoc queries with a rapid execution time. They borrow aspects 
of navigational databases, hierarchical databases and relational databases. 
Overview of OLAP systems 
The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or 
a hypercube). It consists of numeric facts called measures which are categorized by dimensions. 
The measures are placed at the intersections of the hypercube, which is spanned by the 
dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix 
interface like Pivot tables in a spreadsheet program, which performs projection operations along 
the dimensions, such as aggregation or averaging. 
• The cube metadata is typically created from a star schema or snowflake schema or fact 
constellation of tables in a relational database. Measures are derived from the records in the fact 
table and dimensions are derived from the dimension tables. 
• Each measure can be thought of as having a set of labels, or meta-data associated with it. 
A dimension is what describes these labels; it provides information about the measure. 
• A simple example would be a cube that contains a store's sales as a measure, and Date/Time as 
a dimension. Each Sale has a Date/Time label that describes more about that sale. 
• Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by 
adding a foreign key column to the fact table. This allows an analyst to view the measures along any 
combination of the dimensions.
• For example:
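A minimal star schema for such a sales cube might look like the following sketch (all table and column names are illustrative, not taken from the slides): 

CREATE TABLE date_dim ( 
  date_key      INTEGER PRIMARY KEY, 
  calendar_date DATE, 
  month_name    VARCHAR(20), 
  calendar_year INTEGER 
); 

CREATE TABLE store_dim ( 
  store_key  INTEGER PRIMARY KEY, 
  store_name VARCHAR(50), 
  region     VARCHAR(30) 
); 

-- Fact table: one measure (sales_amount) plus one foreign key per dimension 
CREATE TABLE sales_fact ( 
  date_key     INTEGER REFERENCES date_dim (date_key), 
  store_key    INTEGER REFERENCES store_dim (store_key), 
  sales_amount NUMERIC(10,2) 
); 

-- Adding a Cashier or Customer dimension means one more dimension table 
-- and one more foreign key column on sales_fact.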
Transaction processing system 
• Transaction processing is a style of computing that divides work into individual, indivisible 
operations, called transactions. A transaction processing system (TPS) or transaction server is a 
software system, or software/hardware combination, that supports transaction processing. 
Processing types 
Transaction processing is distinct from other computer processing models: batch processing, time-sharing, and real-time processing. 
Batch processing 
Batch processing is the execution of a series of programs (jobs) on a computer without manual intervention. Several transactions, called a batch, are collected and processed at the same time. The results of each transaction are not immediately available when the transaction is being entered; there is a time delay. 
Real-time processing 
Real-time systems attempt to guarantee an appropriate response to a stimulus or request quickly enough to affect the conditions that caused the stimulus. Each transaction in real-time processing is 
unique; it is not part of a group of transactions.
Transaction processing system 
Time-sharing 
Time sharing is the sharing of a computer system among multiple users, usually giving each 
user the illusion that they have exclusive control of the system. The users may be working on 
the same project or different projects, but there are usually few restrictions on the type of 
work each user is doing. 
Transaction processing 
Transaction processing systems also attempt to provide predictable response times to requests, although this is not as critical as for real-time systems. Rather than allowing the user to run arbitrary programs as in time-sharing, transaction processing allows only predefined, structured transactions. Each transaction is usually of short duration, and the processing activity for each transaction is programmed in advance.
Transaction processing system features 
The following features are considered important in evaluating transaction processing systems: 
• Performance 
Fast performance with a rapid response time is critical. Transaction processing systems are usually 
measured by the number of transactions they can process in a given period of time. 
• Continuous availability 
The system must be available during the time period when the users are entering transactions. Many 
organizations rely heavily on their TPS; a breakdown will disrupt operations or even stop the business. 
• Data integrity 
The system must be able to handle hardware or software problems without corrupting data. Multiple 
users must be protected from attempting to change the same piece of data at the same time; for 
example, two operators cannot sell the same seat on an airplane. 
• Ease of use 
Often users of transaction processing systems are casual users. The system should be simple for them 
to understand, protect them from data-entry errors as much as possible, and allow them to easily 
correct their errors. 
• Modular growth 
The system should be capable of growth at incremental costs, rather than requiring a complete 
replacement. It should be possible to add, replace, or update hardware and software components 
without shutting down the system.
Concurrency: Definition 
 Concurrency is the ability of a database to allow multiple users to affect multiple 
transactions. This is one of the main properties that separates a database from other 
forms of data storage like spreadsheets. 
 The ability to offer concurrency is unique to databases. Spreadsheets or other flat file 
means of storage are often compared to databases, but they differ in this one important 
regard. Spreadsheets cannot offer several users the ability to view and work on different data in the same file, because once the first user opens the file it is locked to other users; they can read the file, but may not edit the data. 
 The problems caused by concurrency are just as important to address as the ability to support concurrent transactions. For example, when one user is changing data but has not yet saved (committed) it, the database should not allow other users who query the same data to see the changed, unsaved values. Instead, they should see only the original data.
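Most SQL databases enforce this through transaction isolation levels. A minimal sketch, assuming a hypothetical account table (exact transaction-control syntax varies by DBMS): 

-- Session 1: changes data but has not committed yet 
START TRANSACTION; 
UPDATE account SET balance = balance - 100 WHERE acct_no = 42; 
-- (no COMMIT yet) 

-- Session 2: running at READ COMMITTED (the default in many systems), 
-- it sees only the original, committed balance 
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; 
SELECT balance FROM account WHERE acct_no = 42; 

-- Session 1 can later COMMIT or ROLLBACK; either way, session 2 never 
-- saw the uncommitted value.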
Concurrency Problems 
Concurrency problems include: 
• Lost or buried updates. 
• Uncommitted dependency (dirty read). 
• Inconsistent analysis (non-repeatable read). 
• Phantom reads.
Concurrency Problems 
• Lost Updates 
Lost updates occur when two or more transactions select the same row and then update the row based on 
the value originally selected. Each transaction is unaware of other transactions. The last update overwrites 
updates made by the other transactions, which results in lost data. 
For example: 
Two editors make an electronic copy of the same document. Each editor changes the copy independently 
and then saves the changed copy, thereby overwriting the original document. The editor who saves the 
changed copy last overwrites changes made by the first editor. This problem could be avoided if the second 
editor could not make changes until the first editor had finished. 
• Uncommitted Dependency (Dirty Read) 
Uncommitted dependency occurs when a second transaction selects a row that is being updated by another 
transaction. The second transaction is reading data that has not been committed yet and may be changed by 
the transaction updating the row. 
• For example, 
An editor is making changes to an electronic document. During the changes, a second editor takes a copy of 
the document that includes all the changes made so far, and distributes the document to the intended 
audience. The first editor then decides the changes made so far are wrong and removes the edits and saves 
the document. The distributed document contains edits that no longer exist, and should be treated as if they 
never existed. This problem could be avoided if no one could read the changed document until the first 
editor determined that the changes were final.
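One common way to avoid a lost update is to lock the row when it is first read, so the second writer must wait. A minimal SQL sketch (the document table and doc_id column are illustrative; SELECT ... FOR UPDATE is widely supported, e.g. by Oracle, MySQL and PostgreSQL): 

-- Editor 1 
START TRANSACTION; 
SELECT body FROM document WHERE doc_id = 7 FOR UPDATE;   -- row is now locked 
UPDATE document SET body = 'edited text' WHERE doc_id = 7; 
COMMIT; 

-- Editor 2 issues the same SELECT ... FOR UPDATE while editor 1 is working: 
-- it blocks until editor 1 commits, then reads the committed row, so 
-- editor 1's changes are not silently overwritten.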
Concurrency Problems 
• Inconsistent Analysis (Non-repeatable Read) 
Inconsistent analysis occurs when a second transaction accesses the same row several times and reads 
different data each time. Inconsistent analysis is similar to uncommitted dependency in that another 
transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, 
the data read by the second transaction was committed by the transaction that made the change. Also, 
inconsistent analysis involves multiple reads (two or more) of the same row and each time the 
information is changed by another transaction; thus, the term non-repeatable read. 
• For example, an editor reads the same document twice, but between each reading, the writer 
rewrites the document. When the editor reads the document for the second time, it has changed. 
The original read was not repeatable. This problem could be avoided if the editor could read the 
document only after the writer has finished writing it. 
• Phantom Reads 
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range 
of rows being read by a transaction. The transaction's first read of the range of rows shows a row that 
no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. 
Similarly, as the result of an insert by a different transaction, the transaction's second or succeeding 
read shows a row that did not exist in the original read. 
• For example, an editor makes changes to a document submitted by a writer, but when the changes 
are incorporated into the master copy of the document by the production department, they find that 
new unedited material has been added to the document by the author. This problem could be 
avoided if no one could add new material to the document until the editor and production 
department finish working with the original document.
Concurrency Control 
• Lock-Based Protocols 
• Timestamp-Based Protocols 
• Validation-Based Protocols 
• Multiple Granularity 
• Multiversion Schemes 
• Insert and Delete Operations
Lock-Based Protocols 
• A lock is a mechanism to control concurrent access to a data item 
• Data items can be locked in two modes: 
1. exclusive (X) mode. Data item can be both read as well as 
written. X-lock is requested using lock-X instruction. 
2. shared (S) mode. Data item can only be read. S-lock is 
requested using lock-S instruction. 
• Lock requests are made to concurrency-control manager. 
Transaction can proceed only after request is granted.
Lock-Based Protocols (Cont.) 
• Lock-compatibility matrix 
A transaction may be granted a lock on an item if the requested lock is compatible with 
locks already held on the item by other transactions 
Any number of transactions can hold shared locks on an item, but if any transaction 
holds an exclusive lock on the item, no other transaction may hold any lock on the item. 
If a lock cannot be granted, the requesting transaction is made to wait till all 
incompatible locks held by other transactions have been released. The lock is then 
granted.
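As a rough SQL-level illustration of this compatibility rule (the employee table is illustrative; the LOCK TABLE syntax shown is the form used by Oracle and PostgreSQL): 

-- Transaction 1: takes a shared (S) table lock 
LOCK TABLE employee IN SHARE MODE; 

-- Transaction 2: also succeeds, because shared locks are compatible 
LOCK TABLE employee IN SHARE MODE; 

-- Transaction 3: requests an exclusive (X) lock; exclusive mode conflicts 
-- with share mode, so this waits until the share locks above are released 
LOCK TABLE employee IN EXCLUSIVE MODE;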
Lock-Based Protocols (Cont.) 
• Example of a transaction performing locking: 
T2: lock-S(A); 
read (A); 
unlock(A); 
lock-S(B); 
read (B); 
unlock(B); 
display(A+B) 
• Locking as above is not sufficient to guarantee serializability — if 
A and B get updated in-between the read of A and B, the 
displayed sum would be wrong. 
• A locking protocol is a set of rules followed by all transactions 
while requesting and releasing locks. Locking protocols restrict 
the set of possible schedules.
Pitfalls of Lock-Based Protocols 
• Consider the partial schedule 
Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 
to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to 
release its lock on A. 
Such a situation is called a deadlock. 
To handle a deadlock one of T3 or T4 must be rolled back 
and its locks released.
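A deadlock of this kind is easy to reproduce with two sessions that update the same two rows in opposite order. A sketch (the account table and row ids are illustrative; most database systems detect the cycle and roll one transaction back automatically): 

-- Session A 
START TRANSACTION; 
UPDATE account SET balance = balance - 10 WHERE acct_no = 1;   -- A locks row 1 

-- Session B 
START TRANSACTION; 
UPDATE account SET balance = balance - 10 WHERE acct_no = 2;   -- B locks row 2 

-- Session A 
UPDATE account SET balance = balance + 10 WHERE acct_no = 2;   -- waits for B 

-- Session B 
UPDATE account SET balance = balance + 10 WHERE acct_no = 1;   -- waits for A: deadlock 
-- The DBMS detects the cycle and rolls back one session (the victim); 
-- the surviving session can then continue and commit.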
Pitfalls of Lock-Based Protocols (Cont.) 
• The potential for deadlock exists in most locking protocols. 
Deadlocks are a necessary evil. 
• Starvation is also possible if the concurrency-control manager is 
badly designed. For example: 
• A transaction may be waiting for an X-lock on an item, while a 
sequence of other transactions request and are granted an S-lock 
on the same item. 
• The same transaction is repeatedly rolled back due to deadlocks. 
• The concurrency-control manager can be designed to prevent 
starvation.
The Two-Phase Locking Protocol 
• This is a protocol which ensures conflict-serializable schedules. 
• Phase 1: Growing Phase 
• transaction may obtain locks 
• transaction may not release locks 
• Phase 2: Shrinking Phase 
• transaction may release locks 
• transaction may not obtain locks 
The protocol assures serializability. It can be proved that the 
transactions can be serialized in the order of their lock points (i.e. 
the point where a transaction acquired its final lock).
The Two-Phase Locking Protocol (Cont.) 
• Two-phase locking does not ensure freedom from deadlocks 
• Cascading roll-back is possible under two-phase locking. To avoid this, 
follow a modified protocol called strict two-phase locking. Here a 
transaction must hold all its exclusive locks till it commits/aborts. 
• Rigorous two-phase locking is even stricter: here all locks are held till 
commit/abort. In this protocol transactions can be serialized in the 
order in which they commit 
• There can be conflict serializable schedules that cannot be obtained if 
two-phase locking is used. 
• However, in the absence of extra information (e.g., ordering of access 
to data), two-phase locking is needed for conflict serializability in the 
following sense: 
Given a transaction Ti that does not follow two-phase locking, we 
can find a transaction Tj that uses two-phase locking, and a schedule 
for Ti and Tj that is not conflict serializable.
Multiple Granularity 
• Granularity is the extent to which a system is broken down into small 
parts, either the system itself or its description or observation. It is the 
extent to which a larger entity is subdivided. For example, a yard 
broken into inches has finer granularity than a yard broken into feet. 
• Allow data items to be of various sizes and define a hierarchy of data 
granularities, where the small granularities are nested within larger 
ones 
• Can be represented graphically as a tree (but don't confuse with tree-locking 
protocol) 
• When a transaction locks a node in the tree explicitly, it implicitly locks 
all the node's descendants in the same mode.
Example of Granularity Hierarchy 
The levels, starting from the coarsest (top) level, are: 
database 
area 
file 
record
Intention Lock Modes 
• In addition to S and X lock modes, there are three additional lock 
modes with multiple granularity: 
• intention-shared (IS): indicates explicit locking at a lower level of the tree 
but only with shared locks. 
• intention-exclusive (IX): indicates explicit locking at a lower level with 
exclusive or shared locks 
• shared and intention-exclusive (SIX): the subtree rooted by that node is 
locked explicitly in shared mode and explicit locking is being done at a 
lower level with exclusive-mode locks. 
• Intention locks allow a higher-level node to be locked in S or X mode 
without having to check all descendant nodes.
Deadlock Handling 
• A deadlock is a situation in which two or more competing actions are each 
waiting for the other to finish, and thus neither ever does. 
• In an operating system, a deadlock is a situation which occurs when 
a process or thread enters a waiting state because a resource requested is 
being held by another waiting process, which in turn is waiting for another 
resource. If a process is unable to change its state indefinitely because the 
resources requested by it are being used by another waiting process, then 
the system is said to be in a deadlock. 
System is deadlocked if there is a set of transactions such that every transaction in 
the set is waiting for another transaction in the set.
Deadlock Handling 
• Deadlock prevention protocols ensure that the system will never 
enter into a deadlock state. 
• Some prevention strategies : 
• Require that each transaction locks all its data items before it begins 
execution (pre declaration). 
• Impose partial ordering of all data items and require that a transaction 
can lock data items only in the order specified by the partial order 
(graph-based protocol). 
• Timeout-Based Schemes : 
• a transaction waits for a lock only for a specified amount of time. After 
that, the wait times out and the transaction is rolled back. 
• thus deadlocks are not possible 
• simple to implement; but starvation* is possible. Also difficult to 
determine good value of the timeout interval. 
* Starvation here means that a transaction may be repeatedly timed out and rolled back without ever getting the chance to complete.
Deadlock Detection 
• After a deadlock is detected, it can be corrected by using one of 
the following methods: 
• Process Termination: One or more processes involved in the 
deadlock may be aborted. We can choose to abort all processes 
involved in the deadlock. This ensures that deadlock is resolved 
with certainty and speed. But the expense is high as partial 
computations will be lost. Or, we can choose to abort one 
process at a time until the deadlock is resolved. This approach 
has high overheads because after each abort an algorithm must 
determine whether the system is still in deadlock. Several 
factors must be considered while choosing a candidate for 
termination, such as priority and age of the process. 
• Resource Preemption: Resources allocated to various processes 
may be successively preempted and allocated to other 
processes until the deadlock is broken.
Deadlock Detection (Cont.) 
(Figures: a wait-for graph without a cycle, and a wait-for graph with a cycle.)
Deadlock Recovery 
• When deadlock is detected : 
• Some transaction will have to be rolled back (made a victim) to 
break the deadlock. Select as victim the transaction that will incur 
minimum cost. 
• Rollback -- determine how far to roll back transaction 
• Total rollback: Abort the transaction and then restart it. 
• More effective to roll back transaction only as far as necessary to 
break deadlock. 
• Starvation happens if same transaction is always chosen as 
victim. Include the number of rollbacks in the cost factor to avoid 
starvation
Other Approaches to Concurrency Control 
• Timestamp-Based Protocols 
• Each transaction is issued a timestamp when it enters the 
system. If an old transaction Ti has time-stamp TS(Ti), a new 
transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) 
<TS(Tj). 
• The protocol manages concurrent execution such that the 
time-stamps determine the serializability order. 
• In order to assure such behavior, the protocol maintains for 
each data Q two timestamp values: 
• W-timestamp(Q) is the largest time-stamp of any transaction that 
executed write(Q) successfully. 
• R-timestamp(Q) is the largest time-stamp of any transaction that 
executed read(Q) successfully.
Timestamp-Based Protocols 
• The timestamp ordering protocol ensures that any conflicting read and write 
operations are executed in timestamp order. 
• Suppose a transaction Ti issues a read(Q) 
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was 
already overwritten. 
 Hence, the read operation is rejected, and Ti is rolled back. 
2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and 
R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)). 
• Suppose that transaction Ti issues write(Q). 
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed 
previously, and the system assumed that that value would never be produced. 
 Hence, the write operation is rejected, and Ti is rolled back. 
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. 
 Hence, this write operation is rejected, and Ti is rolled back. 
3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
Correctness of Timestamp-Ordering Protocol 
• The timestamp-ordering protocol guarantees serializability since all 
the arcs in the precedence graph go from a transaction with a smaller timestamp to a transaction with a larger timestamp. 
Thus, there will be no cycles in the precedence graph. 
• Timestamp protocol ensures freedom from deadlock as no 
transaction ever waits. 
• But the schedule may not be cascade-free, and may not even be 
recoverable.
DCL commands are used to enforce database security in a multiple-user database 
environment. The two types of DCL commands are GRANT and REVOKE. Only database 
administrators or owners of a database object can grant or remove privileges on a 
database object. 
• SQL GRANT Command 
SQL GRANT is a command used to provide access or privileges on the database objects 
to the users 
The Syntax for the GRANT command is:
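A commonly documented general form is shown below; privilege_name, object_name, user_name and role_name are placeholders to be replaced with actual names. 

GRANT privilege_name 
ON object_name 
TO {user_name | PUBLIC | role_name} 
[WITH GRANT OPTION];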
• SQL GRANT Command 
• privilege_name is the access right or privilege granted to the user. Some of 
the access rights are ALL, EXECUTE, and SELECT. 
• object_name is the name of a database object like TABLE, VIEW, STORED 
PROC and SEQUENCE. 
• user_name is the name of the user to whom an access right is being granted. 
• PUBLIC is used to grant access rights to all users. 
• ROLES are a set of privileges grouped together. 
• WITH GRANT OPTION - allows a user to grant access rights to other users.
SQL GRANT Command 
• For Example: 
GRANT SELECT ON employee TO user1; 
This command grants SELECT permission on the employee table to user1. You should use the WITH GRANT OPTION carefully because, for example, if you GRANT SELECT privilege on the employee table to user1 using WITH GRANT OPTION, then user1 can GRANT SELECT privilege on the employee table to another user, such as user2. Later, if you REVOKE the SELECT privilege on employee from user1, user2 will still have SELECT privilege on the employee table.
SQL REVOKE Command : 
• The REVOKE command removes user access rights or privileges to the database objects. 
• The Syntax for the REVOKE command is: 
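A commonly documented general form is shown below; as with GRANT, the lowercase names are placeholders. 

REVOKE privilege_name 
ON object_name 
FROM {user_name | PUBLIC | role_name};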
For Example: 
REVOKE SELECT ON employee FROM user1; 
This command will REVOKE the SELECT privilege on the employee table from user1. When you REVOKE the SELECT privilege on a table from a user, the user will not be able to SELECT data from that table anymore. However, if the user has received SELECT privileges on that table from more than one user, he/she can SELECT from that table until everyone who granted the permission revokes it. You cannot REVOKE privileges if they were not initially granted by you.
Privileges and Roles: 
Privileges: A privilege defines the access rights provided to a user on a database 
object. There are two types of privileges. 
1) System privileges - This allows the user to CREATE, ALTER, or DROP database 
objects. 
2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or 
DELETE data from database objects to which the privileges apply. 
A few CREATE system privileges are listed below: 
CREATE object: allows users to create the specified object in their own schema. 
CREATE ANY object: allows users to create the specified object in any schema.
The above rules also apply for ALTER and DROP system privileges. 
A few of the object privileges are listed below: 
INSERT: allows users to insert rows into a table. 
SELECT: allows users to select data from a database object. 
UPDATE: allows users to update data in a table. 
EXECUTE: allows users to execute a stored procedure or a function.
Roles: 
Roles are a collection of privileges or access rights. When there are many users in 
a database it becomes difficult to grant or revoke privileges user by user. Therefore, if 
you define roles, you can grant or revoke privileges to a role, and every user granted 
that role automatically gains or loses those privileges. You can either create roles or 
use the system roles pre-defined by Oracle. 
Some of the privileges granted to the system roles are as given below: 
CONNECT: CREATE TABLE, CREATE VIEW, CREATE SYNONYM, CREATE SEQUENCE, CREATE SESSION, etc. 
RESOURCE: CREATE PROCEDURE, CREATE SEQUENCE, CREATE TABLE, CREATE TRIGGER, etc. The primary usage of the RESOURCE role is to restrict access to database objects. 
DBA: ALL SYSTEM PRIVILEGES
• Creating Roles: 
The Syntax to create a role is: 
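A commonly used form is shown below; role_name and password are placeholders, and the usage example that follows uses illustrative names. 

CREATE ROLE role_name 
[IDENTIFIED BY password]; 

-- Example (illustrative names): create a role, grant privileges to the role, 
-- and then grant the role to a user 
CREATE ROLE testing; 
GRANT CREATE TABLE, CREATE VIEW TO testing; 
GRANT testing TO user1;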
It's easier to GRANT or REVOKE privileges to users through a role rather 
than assigning a privilege directly to every user. If a role is identified by a 
password, then, when you GRANT or REVOKE privileges to the role, you 
have to identify it with the password.
Database Recovery 
There are many situations in which a transaction may not reach a commit or abort point: 
• An operating system crash can terminate the DBMS processes 
• The DBMS can crash 
• The process issuing the transaction can abnormally terminate 
• The system might lose power 
• A disk may fail or other hardware may fail. 
• Human error can result in deletion of critical data. 
• In any of these situations, data in the database may become inconsistent or 
lost.
Two general classes of failures: 
• Process or Processor failures in which no data is lost but the 
database must be restarted and recovered. 
• Storage failure wherein data on disk is lost or corrupted. 
In such cases, data must be restored from backup and then the database must be restarted and 
recovered. 
Database Recovery is the process of restoring the database and the data to a 
consistent state. This may include restoring lost data up to the point of the event (e.g. 
system crash). 
There are many different approaches to recover a database. Here we discuss Manual 
Reprocessing, and a variety of Automated recovery techniques.
Manual Reprocessing 
• In a Manual Reprocessing approach, the database is periodically backed 
up (a database save) and all transactions applied since the last save are 
recorded 
• If the system crashes, the latest database save is restored and all of the 
transactions are re-applied (by users) to bring the database back up to 
the point just before the crash. 
• Several shortcomings: 
• Time required to re-apply transactions 
• Transactions might have other (physical) consequences 
• Re-applying concurrent transactions is not straightforward.
Automated Recovery 
There are several types of Automated Recovery techniques including: Deferred 
update and immediate update, shadow paging, etc. 
Recall, a transaction can be in one of the following states: 
• Active - when the transaction just begins 
• Partially Committed - after the last operation has completed (but before the 
commit point is reached) 
• Failed - Normal operation is prevented (e.g., in the event of a system crash) 
• Aborted - Transaction is rolled back. That is, all of its effects are undone 
• Committed - Transaction completes all operations and moves the database to 
the next consistent state
Deferred Update Recovery 
• Also called NO-UNDO/REDO. During a transaction, only record the changes to data items in the log. When the transaction commits, actually update the data items on disk. Two main rules: 
• A transaction cannot change any items in the database until it commits. 
• A transaction may not commit until all of the write operations are successfully recorded in the log. This means 
that we must check to see that the log is actually written to disk. 
For example, suppose the log records four transactions T1 through T4. Since T1 and T4 committed, their changes were written to disk. 
However, T2 and T3 did not commit, hence their changes were not written to disk. 
To recover, we simply ignore those transactions that did not commit. 
Advantages: Recovery is made easier: 
Any transaction that reached the commit point (from the log) has its writes applied to the database (REDO). 
All other transactions are ignored. 
Cascading rollback does not occur because no other transaction sees the work of another until it is committed 
(no stale reads).
Immediate Update Recovery 
• Immediate Update applies the write operations to the database as the 
transaction is executing (as opposed to at the end as with deferred 
update). 
Writes are still saved in the log before being applied to the database - a 
write-ahead log (WAL) 
• Maintain two logs: 
• REDO log: A record of each new data item in the database. 
• UNDO log: A record of each updated data item (old values). 
• Two rules: 
• Transaction T may not update the database until all UNDO entries have 
been written to the UNDO log. 
• Transaction T is not allowed to commit until all REDO and UNDO log entries 
are written (forced-written to disk).
• To Recover: 
• Begin at the end of the log and read backwards to the last checkpoint. 
Create two lists: 
C - transactions that have committed 
NC - transactions that did not commit 
• For each transaction T in C, make a third list of transactions, RB, that have 
read items that were updated by NC transactions during T. 
• UNDO: Undo the effects of the transactions in NC and RB. 
• REDO: Apply the effects of all of the transactions that did commit (all T in 
list C) 
Advantages: 
• Immediate update allows higher concurrency because transactions write 
continuously to the database rather than waiting until the commit point. 
Disadvantages: 
• Step 2 above can lead to cascading rollbacks - time consuming and may be 
problematic.
OLAP (online analytical processing) 
• OLAP (online analytical processing) is computer processing that enables a user to easily and 
selectively extract and view data from different points of view. 
For example, a user can request that data be analyzed to display a spreadsheet showing all of 
a company's beach ball products sold in Florida in the month of July, compare revenue 
figures with those for the same products in September, and then see a comparison of other 
product sales in Florida in the same time period. 
To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas 
a relational database can be thought of as two-dimensional, a multidimensional database 
considers each data attribute (such as product, geographic sales region, and time period) as a 
separate "dimension." OLAP software can locate the intersection of dimensions (all products 
sold in the Eastern region above a certain price during a certain time period) and display them. 
Attributes such as time periods can be broken down into sub-attributes.
OLAP (online analytical processing) 
• OLAP can be used for data mining or the discovery of previously undiscerned 
relationships between data items. An OLAP database does not need to be as 
large as a data warehouse, since not all transactional data is needed for trend 
analysis. Using Open Database Connectivity (ODBC), data can be imported from 
existing relational databases to create a multidimensional database for OLAP. 
• Online Analytical Processing, a category of software tools that provides analysis 
of data stored in a database. OLAP tools enable users to analyze different 
dimensions of multidimensional data. For example, it provides time series and 
trend analysis views. OLAP often is used in data mining. 
• The chief component of OLAP is the OLAP server, which sits between a client and 
a database management system (DBMS). The OLAP server understands how 
data is organized in the database and has special functions for analyzing the data. 
There are OLAP servers available for nearly all the major database systems.
OLTP vs. OLAP 
• OLTP (On-line Transaction Processing) is characterized by a large number of short on-line 
transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is put on very 
fast query processing, maintaining data integrity in multi-access environments and an 
effectiveness measured by the number of transactions per second. An OLTP database holds detailed and current data, and the schema used to store transactional data is the entity model (usually 3NF). 
• OLAP (On-line Analytical Processing) is characterized by relatively low volume of 
transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
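To make the contrast concrete, here are two illustrative queries against a hypothetical sales schema: an OLTP-style lookup of one current row versus an OLAP-style aggregation over history (table and column names are assumptions): 

-- Typical OLTP query: touch one current row, very fast 
SELECT status 
FROM orders 
WHERE order_id = 100045; 

-- Typical OLAP query: aggregate years of history across dimensions 
SELECT d.calendar_year, p.category, SUM(f.sales_amount) AS total_sales 
FROM sales_fact f 
JOIN date_dim d    ON f.date_key = d.date_key 
JOIN product_dim p ON f.product_key = p.product_key 
GROUP BY d.calendar_year, p.category;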
The following table summarizes the main differences between OLTP and OLAP. 

OLTP System: Online Transaction Processing (Operational System) 
OLAP System: Online Analytical Processing (Data Warehouse) 

Source of data 
OLTP: Operational data; OLTP systems are the original source of the data. 
OLAP: Consolidated data; OLAP data comes from the various OLTP databases. 

Purpose of data 
OLTP: To control and run fundamental business tasks. 
OLAP: To help with planning, problem solving, and decision support. 

What the data reveals 
OLTP: A snapshot of ongoing business processes. 
OLAP: Multi-dimensional views of various kinds of business activities. 

Inserts and updates 
OLTP: Short and fast inserts and updates initiated by end users. 
OLAP: Periodic long-running batch jobs refresh the data. 

Queries 
OLTP: Relatively standardized and simple queries returning relatively few records. 
OLAP: Often complex queries involving aggregations. 

Processing speed 
OLTP: Typically very fast. 
OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes. 

Space requirements 
OLTP: Can be relatively small if historical data is archived. 
OLAP: Larger due to the existence of aggregation structures and historical data; requires more indexes than OLTP. 

Database design 
OLTP: Highly normalized with many tables. 
OLAP: Typically de-normalized with fewer tables; uses star and/or snowflake schemas. 

Backup and recovery 
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. 
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
What Is a Data Warehouse? 
• A data warehouse is a relational or multidimensional database 
that is designed for query and analysis. Data warehouses are not 
optimized for transaction processing, which is the domain of OLTP 
systems. Data warehouses usually consolidate historical and 
analytic data derived from multiple sources. Data warehouses 
separate analysis workload from transaction workload and enable 
an organization to consolidate data from several sources. 
• A data warehouse usually stores many months or years of data to 
support historical analysis. The data in a data warehouse is 
typically loaded through an extraction, transformation, and 
loading (ETL) process from one or more data sources such as OLTP 
applications, mainframe applications, or external data providers. 
• Users of the data warehouse perform data analyses that are often 
time-related. Examples include consolidation of last year's sales 
figures, inventory analysis, and profit by product and by customer
The Key Characteristics of a Data Warehouse 
• The key characteristics of a data warehouse are as follows: 
• Some data is denormalized for simplification and to improve 
performance 
• Large amounts of historical data are used 
• Queries often retrieve large amounts of data 
• Both planned and ad hoc queries are common 
• The data load is controlled 
• In general, fast query performance with high data throughput is the key 
to a successful data warehouse.
Common Oracle Data Warehousing Tasks 
• As an Oracle data warehousing administrator or designer, you can expect to 
be involved in the following tasks: 
• Configuring an Oracle database for use as a data warehouse 
• Designing data warehouses 
• Performing upgrades of the database and data warehousing software to new 
releases 
• Managing schema objects, such as tables, indexes, and materialized views 
• Managing users and security 
• Developing routines used for the extraction, transformation, and loading 
(ETL) processes 
• Creating reports based on the data in the data warehouse 
• Backing up the data warehouse and performing recovery when necessary 
• Monitoring the data warehouse's performance and taking preventive or 
corrective action as required 
• In a small-to-midsize data warehouse environment, you might be the sole 
person performing these tasks. In large, enterprise environments, the job is 
often divided among several DBAs and designers, each with their own 
specialty, such as database security or database tuning.
Characteristics of Data Warehouses 
• Data warehouses have several distinguishing characteristics. 
• These systems combine data from multiple sources. Operational systems 
such as ERP systems provide production data, financial systems supply 
revenue and expense data, and human resource systems present 
employee data. 
• The data copied into a data warehouse does not change (except to 
correct errors). The data warehouse is a historical record of the state of an 
organization. The frequent changes of the source OLTP systems are 
reflected in the data warehouse by adding new data, not by changing 
existing data. 
• Data warehouses are subject oriented, that is, they focus on measuring 
entities, such as sales, inventory, and quality. OLTP systems, by contrast, 
are function oriented and focus on operations such as order fulfillment. 
• In data warehouses, data from distinct function-oriented systems is 
integrated to provide a single view of an operational entity. 
• Data warehouses are designed for business users, not database 
programmers, so they are easy to understand and query. 
• Data warehouses are designed for speed. Users interact with data 
warehouses—running queries, analyzing data, formulating new queries 
based on information gleaned from earlier queries—so the systems must 
be responsive.
Metadata 
• Metadata is "data about data". The term is ambiguous, as it is used for two 
fundamentally different concepts (types). Structural metadata is about the 
design and specification of data structures and is more properly called "data 
about the containers of data"; descriptive metadata, on the other hand, is about 
individual instances of application data, the data content. 
Metadata (metacontent) are defined as the data providing information about one 
or more aspects of the data, such as: 
• Means of creation of the data 
• Purpose of the data 
• Time and date of creation 
• Creator or author of the data 
• Location on a computer network where the data were created 
• Standards used
Metadata types 
While the application of metadata is manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata: 
 Structural metadata are used to describe the structure of database objects such as tables, columns, 
keys and indexes. 
 Guide metadata are used to help humans find specific items and are usually expressed as a set of 
keywords in a natural language 
 Technical metadata correspond to internal metadata, and business metadata to external metadata. Kimball adds a third category, process metadata. 
 NISO, on the other hand, distinguishes among three types of metadata: descriptive, structural and administrative. 
 Descriptive metadata are the information used to search and locate an object such as title, author, 
subjects, keywords, publisher; 
 Structural metadata give a description of how the components of the object are organised 
 Administrative metadata refer to the technical information including file type. Two sub-types of 
administrative metadata are rights management metadata and preservation metadata.
OLAP Metadata Model 
• The basic data model in a relational database is a table composed of one 
or more columns of data. All of the data is stored in columns. In contrast, 
the basic data model for multidimensional analysis is a cube, which is 
composed of Measures, Dimensions, and Attributes. 
• The OLAP Catalog metadata informs applications about the data that is 
available within the database. The application can then define 
multidimensional objects to represent that data. When the application 
runs, it instantiates these objects and populates them with data.
Mapping Data Objects to Metadata Objects 
The objects comprising a data warehouse and Oracle OLAP metadata use different data 
structures. The data objects in your data warehouse are represented to the OLAP 
metadata catalog in the following relational objects, regardless of whether the data is 
actually stored in relational tables or workspace variables: 
• Fact Tables or Views 
• Level-based dimension Tables or Views 
Oracle OLAP metadata catalog maps the data warehouse schema to these 
multidimensional data objects: 
• Measures 
• Dimensions 
• Dimension attributes 
• Levels 
• Level attributes 
• Hierarchies 
• Cubes 
• Measure folders
Measures 
 Measures are the same as facts. The term "fact" is typically used in 
relational databases, and the term "measure" is typically used in 
multidimensional applications. 
 Measures are thus located in fact tables. A fact table has columns that 
store measures (or facts) and foreign key columns that create the 
association with dimension tables. 
 Measures contain the data that you wish to analyze, such as Sales or 
Cost. OLAP Catalog metadata requires that a column have a numerical or 
a date data type to be identified as a measure. Most frequently, a 
measure is numerical and additive.
Dimensions 
• Dimensions identify and categorize your data. Dimension members are stored in a 
dimension table. Each column represents a particular level in a hierarchy. In a star 
schema, the columns are all in the same table; in a snowflake schema, the columns 
are in separate tables for each level. 
• Because measures are typically multidimensional, a single value in a measure must 
be qualified by a member of each dimension to be meaningful. 
For example, the unit_cost measure has two dimensions: 
products_dim and times_dim. 
A value of unit_cost (21.60) is only meaningful when it is qualified by a specific 
product code (1575) and a time period (28-jan-1998).
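In a relational star schema, this qualification is simply a join of the fact table with its dimension tables. A sketch (the costs fact table and its column names are assumptions; the dimension names follow the example above): 

-- Retrieve unit_cost for product code 1575 on 28-Jan-1998 
SELECT c.unit_cost 
FROM costs c 
JOIN products_dim p ON c.product_id = p.product_id 
JOIN times_dim t    ON c.time_id    = t.time_id 
WHERE p.product_code = 1575 
  AND t.calendar_date = DATE '1998-01-28';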
Attributes 
Attributes provide descriptive information about the data and are typically 
used for display. 
Level Attributes 
Level attributes provide supplementary information about the dimension 
members at a particular level of a dimension hierarchy. The dimension 
members themselves may be meaningless, such as a value of "1296" for a 
time period. These cryptic values for dimension members are used internally 
for selecting and sorting quickly, but are meaningless to users. 
For example- 
You might have columns for employee number (ENUM), last name 
(LAST_NAME), first name (FIRST_NAME), and telephone extension 
(TELNO). ENUM is the best choice for a level column, since it is a key column 
and its values uniquely identify the employees. ENUM also has 
a NUMBER data type, which makes it more efficient than a text column for the 
creation of indexes. LAST_NAME, FIRST_NAME, and TELNO are attributes. 
Even though they are dimensioned by ENUM, they do not make suitable 
measures because they are descriptive text rather than business 
measurements
Dimension Attributes 
• Dimension attributes specify groupings of level attributes for a 
specific dimension. Whereas level attributes map to specific data 
values, dimension attributes are purely logical metadata objects. 
• An example of a dimension attribute is end date, which is 
required for time dimensions. If a time dimension has month, 
quarter, and year levels, end date identifies the last date of each 
month, each quarter, and each year. Within a relational schema, 
the three level attributes that make up the end date dimension 
attribute would be stored in columns with names 
like month_end_date, quarter_end_date, and year_end_date.
Measure Folders 
• Measures can be organized within measure folders, which 
facilitate the browsing of data by business area. Measure folders 
are also known as catalogs. 
• Whereas dimensions and measures are associated with the 
schemas that contain their source data, measure folders are 
schema independent. Each OLAP client can view all measure 
folders defined within the Oracle instance.
ETL -Extract, Transform, Load 
• ETL is short for extract, transform, load: three database functions that are combined into one tool to pull 
data out of one database and place it into another database. 
• Extract is the process of reading data from a database. 
• Transform is the process of converting the extracted data from 
its previous form into the form it needs to be in so that it can be 
placed into another database. Transformation occurs by using 
rules or lookup tables or by combining the data with other data. 
• Load is the process of writing the data into the target database. 
• ETL is used to migrate data from one database to another, to 
form data marts and data warehouses and also to convert 
databases from one format or type to another.
ETL - Extract, Transform, Load 
• In managing databases, extract, transform, load (ETL) refers to 
three separate functions combined into a single programming 
tool. First, the extract function reads data from a specified 
source database and extracts a desired subset of data. Next, the 
transform function works with the acquired data - using rules or 
lookup tables, or creating combinations with other data - to 
convert it to the desired state. Finally, the load function is used to 
write the resulting data (either all of the subset or just the 
changes) to a target database, which may or may not previously 
exist. 
• ETL can be used to acquire a temporary subset of data for reports 
or other purposes, or a more permanent data set may be acquired 
for other purposes such as: the population of a data mart or data 
warehouse; conversion from one database type to another; and 
the migration of data from one database or platform to another
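When both source and target are relational, the transform and load steps are often expressed directly in SQL. A minimal sketch, assuming a staging table stg_orders extracted from the source system and a warehouse fact table sales_fact (all names are illustrative): 

-- Transform + load: cleanse and convert staged rows, then write them 
-- into the warehouse fact table 
INSERT INTO sales_fact (date_key, product_key, sales_amount) 
SELECT d.date_key, 
       p.product_key, 
       s.qty * s.unit_price                               -- derive the measure 
FROM stg_orders s 
JOIN date_dim d    ON d.calendar_date = s.order_date      -- lookups supply the 
JOIN product_dim p ON p.product_code  = s.product_code    -- surrogate keys 
WHERE s.qty > 0;                                           -- simple data-quality rule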
Dimensional modeling 
• Dimensional modeling (DM) is the name of a set of techniques and concepts used 
in data warehouse design. It is considered to be different from entity-relationship 
modeling (ER). Dimensional Modeling does not necessarily involve a relational database. 
The same modeling approach, at the logical level, can be used for any physical form, 
such as multidimensional database or even flat files. 
• DM is a design technique for databases intended to support end-user queries in a data 
warehouse. It is oriented around understandability and performance 
• While ER is very useful for transaction capture, it should be avoided for end-user delivery. 
• Dimensional modeling always uses the concepts of facts (measures), and dimensions 
(context). Facts are typically (but not always) numeric values that can be aggregated, and 
dimensions are groups of hierarchies and descriptors that define the facts. For example, 
sales amount is a fact; timestamp, product, register#, store#, etc. are elements of 
dimensions. 
• Dimensional models are built by business process area, e.g. store sales, inventory, 
claims, etc. Because the different business process areas share some but not all 
dimensions, efficiency in design, operation, and consistency, is achieved 
using conformed dimensions, i.e. using one copy of the shared dimension across subject 
areas
Dimensional modeling process 
• The dimensional model is built on a star-like schema, with dimensions 
surrounding the fact table. 
To build the schema, the following design model is used: 
• Choose the business process 
• Declare the grain 
• Identify the dimensions 
• Identify the fact 
Choose the business process – 
The process of dimensional modeling builds on a 4-step design method that 
helps to ensure the usability of the dimensional model and the use of the data 
warehouse. The basics in the design build on the actual business process which 
the data warehouse should cover. Therefore the first step in the model is to 
describe the business process which the model builds on. This could for instance 
be a sales situation in a retail store. To describe the business process, one can 
choose to do this in plain text or use basic Business Process Modeling Notation 
(BPMN) or other design guides like the Unified Modeling Language (UML).
Dimensional modeling process 
• Declare the grain 
After describing the Business Process, the next step in the design is to declare the 
grain of the model. The grain of the model is the exact description of what the 
dimensional model should be focusing on. This could for instance be “An individual 
line item on a customer slip from a retail store”. To clarify what the grain means, 
you should pick the central process and describe it with one sentence. 
Furthermore the grain (sentence) is what you are going to build your dimensions 
and fact table from. You might find it necessary to go back to this step to alter the 
grain due to new information gained on what your model is supposed to be able 
to deliver. 
• Identify the dimensions 
The third step in the design process is to define the dimensions of the model. The 
dimensions must be defined within the grain from the second step of the 4-step 
process. Dimensions are the foundation of the fact table, and are where the data for 
the fact table is collected. Typically dimensions are nouns like date, store, 
inventory etc. These dimensions are where all the data is stored. For example, the 
date dimension could contain data such as year, month and weekday.
Dimensional modeling process 
• Identify the facts 
After defining the dimensions, the next step in the process is to make keys for 
the fact table. This step is to identify the numeric facts that will populate each 
fact table row. This step is closely related to the business users of the system, 
since this is where they get access to data stored in the data warehouse. 
Therefore most of the fact table rows are numerical, additive figures such as 
quantity or cost per unit, etc.
Dimensional modeling process 
Benefits of the dimensional modeling are following: 
• Understandability - Compared to the normalized model, the dimensional model is easier to 
understand and more intuitive. In dimensional models, information is grouped into coherent 
business categories or dimensions, making it easier to read and interpret. Simplicity also allows 
software to navigate databases efficiently. In normalized models, data is divided into many discrete 
entities and even a simple business process might result in dozens of tables joined together in a 
complex way. 
• Query performance - Dimensional models are more denormalized and optimized for data 
querying, while normalized models seek to eliminate data redundancies and are optimized for 
transaction loading and updating. The predictable framework of a dimensional model allows the 
database to make strong assumptions about the data that aid in performance. Each dimension is an 
equivalent entry point into the fact table, and this symmetrical structure allows effective handling 
of complex queries. Query optimization for star join databases is simple, predictable, and 
controllable. 
• Extensibility - Dimensional models are extensible and easily accommodate unexpected new 
data. Existing tables can be changed in place either by simply adding new data rows into the table 
or executing SQL alter table commands. No queries or other applications that sit on top of the 
Warehouse need to be reprogrammed to accommodate changes. Old queries and applications 
continue to run without yielding different results. But in normalized models each modification 
should be considered carefully, because of the complex dependencies between database tables.
Fact Tables, Dimension Tables 
• In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is 
located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where 
multiple fact tables are used, these are arranged as a fact constellation schema. 
• A fact table typically has two types of columns: those that contain facts and those that are foreign keys to 
dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its 
foreign keys. Fact tables contain the content of the data warehouse and store different types of measures 
like additive, non additive, and semi additive measures. 
• Fact tables provide the (usually) additive values that act as independent variables by which dimensional 
attributes are analyzed. 
• Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by 
which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by 
Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. 
Other dimensions might be members of this fact table (such as location/region) but these add nothing to 
the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the 
independent facts but generally provide insights at a higher level of aggregation (a region contains many 
stores).
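A sketch of such a SALES fact table at the grain "sales volume by day by product by store", with a composite primary key made up of its foreign keys (names are illustrative): 

CREATE TABLE daily_sales_fact ( 
  date_key     INTEGER NOT NULL REFERENCES date_dim (date_key), 
  product_key  INTEGER NOT NULL REFERENCES product_dim (product_key), 
  store_key    INTEGER NOT NULL REFERENCES store_dim (store_key), 
  sales_volume INTEGER,                               -- additive measure 
  PRIMARY KEY (date_key, product_key, store_key)      -- one row per day/product/store 
);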
Types of fact tables 
There are basically three fundamental measurement events, which characterize all fact tables. 
• Transactional 
A transactional table is the most basic and fundamental. The grain associated with a transactional fact table is 
usually specified as "one row per line in a transaction", e.g., every line on a receipt. Typically a transactional 
fact table holds data of the most detailed level, causing it to have a great number of dimensions associated 
with it. 
• Periodic snapshots 
The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any 
defined period of time, e.g. a performance summary of a salesman over the previous month. A periodic 
snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional 
fact table in order to deliver the chosen performance output. 
• Accumulating snapshots 
This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., 
the processing of an order. An order moves through specific steps until it is fully processed. As steps towards 
fulfilling the order are completed, the associated row in the fact table is updated. An accumulating snapshot 
table often has multiple date columns, each representing a milestone in the process. Therefore, it's important 
to have an entry in the associated date dimension that represents an unknown date, as many of the milestone 
dates are unknown at the time of the creation of the row.
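A hedged sketch of an accumulating snapshot for order processing (hypothetical names): one row per order, one date key per milestone, and milestone keys that point at a reserved "unknown date" member (here -1) until the step actually occurs. The row is updated in place as the order progresses: 

    CREATE TABLE fact_order_fulfilment (
        order_number       VARCHAR(20) PRIMARY KEY,  -- one row per order
        order_date_key     INTEGER NOT NULL,         -- known when the row is created
        shipped_date_key   INTEGER NOT NULL,         -- -1 = 'unknown date' until shipment happens
        delivered_date_key INTEGER NOT NULL,         -- -1 = 'unknown date' until delivery happens
        order_amount       DECIMAL(12,2)
    );

    INSERT INTO fact_order_fulfilment VALUES ('SO-1001', 20240115, -1, -1, 149.90);

    -- When the shipment milestone completes, the existing row is updated in place.
    UPDATE fact_order_fulfilment
    SET    shipped_date_key = 20240117
    WHERE  order_number = 'SO-1001';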
Dimension table 
In data warehousing, a dimension table is one of the set of companion 
tables to a fact table. 
The fact table contains business facts (or measures), and foreign keys which 
refer to candidate keys (normally primary keys) in the dimension tables. 
In contrast to fact tables, dimension tables contain descriptive attributes (or 
fields) that are typically textual fields (or discrete numbers that behave like 
text). These attributes serve two critical purposes: query constraining and/or 
filtering, and query result set labeling (illustrated in the sketch after the list below). 
Dimension attributes should be: 
• Verbose (labels consisting of full words) 
• Descriptive 
• Complete (having no missing values) 
• Discretely valued (having only one value per dimension table row) 
• Quality assured (having no misspellings or impossible values)
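As a small illustration of those two purposes (reusing the hypothetical star schema sketched in the fact table section), dimension attributes constrain the query in the WHERE clause and label the result set in the SELECT and GROUP BY clauses, while the fact table supplies the numbers being summed: 

    SELECT d.fiscal_year,                        -- labeling attribute
           s.region,                             -- labeling attribute
           SUM(f.sales_amount) AS total_sales    -- additive fact
    FROM   fact_sales f
    JOIN   dim_date  d ON d.date_key  = f.date_key
    JOIN   dim_store s ON s.store_key = f.store_key
    WHERE  s.region = 'West'                     -- constraining / filtering attribute
    GROUP  BY d.fiscal_year, s.region;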
• Dimension table rows are uniquely identified by a single key field. It is 
recommended that this key be a simple integer surrogate, because the key 
value itself is meaningless and is used only to join the fact and 
dimension tables. 
The use of surrogate dimension keys brings several advantages, 
including 
• Performance. Join processing is made much more efficient by using a 
single field (the surrogate key) 
• Buffering from operational key management practices. This prevents 
situations where removed data rows might reappear when their 
natural keys get reused or reassigned after a long period of dormancy 
• Mapping to integrate disparate sources 
• Handling unknown or not-applicable connections 
• Tracking changes in dimension attribute values
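The sketch below (hypothetical names) shows the pattern these points describe: a meaningless integer surrogate key held alongside the operational natural key, a reserved row for unknown or not-applicable connections, and effective/expiration dates that allow changes in attribute values to be tracked as new rows: 

    CREATE TABLE dim_customer (
        customer_key     INTEGER PRIMARY KEY,  -- surrogate key: a meaningless integer
        customer_id      VARCHAR(20),          -- natural / operational key from the source system
        customer_name    VARCHAR(100),
        customer_segment VARCHAR(50),
        row_effective    DATE,                 -- used to track attribute changes over time
        row_expiration   DATE
    );

    -- Reserved member for facts whose customer is unknown or not applicable.
    INSERT INTO dim_customer VALUES
        (-1, 'N/A', 'Unknown customer', 'Unknown', '1900-01-01', '9999-12-31');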
The goal of a dimension table is to create standardized, conformed 
dimensions that can be shared across the enterprise's data 
warehouse environment, and enable joining to multiple fact tables 
representing various business processes. 
Conformed dimensions are important to the enterprise nature of DW/BI systems because 
they promote: 
• Consistency - Every fact table is filtered consistently, so that query 
answers are labeled consistently. 
• Integration - Queries can drill into different process fact tables 
separately, one query per fact table, and then join the results on 
common dimension attributes (see the drill-across sketch below). 
• Reduced development time to market - The common dimensions 
are available without recreating them.
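A hedged sketch of such a drill-across query, assuming a second (hypothetical) inventory fact table that shares the conformed product dimension from the earlier star schema sketch: each business process fact table is queried separately, and the two result sets are then joined on the common dimension attribute. 

    -- Hypothetical second process fact table sharing the conformed dim_product.
    CREATE TABLE fact_inventory (
        date_key         INTEGER NOT NULL,
        product_key      INTEGER NOT NULL,
        quantity_on_hand INTEGER NOT NULL
    );

    WITH sales AS (                              -- query the sales process on its own
        SELECT p.product_name, SUM(f.sales_amount) AS total_sales
        FROM   fact_sales f
        JOIN   dim_product p ON p.product_key = f.product_key
        GROUP  BY p.product_name
    ),
    stock AS (                                   -- query the inventory process on its own
        SELECT p.product_name, SUM(i.quantity_on_hand) AS total_on_hand
        FROM   fact_inventory i
        JOIN   dim_product p ON p.product_key = i.product_key
        GROUP  BY p.product_name
    )
    SELECT s.product_name, s.total_sales, k.total_on_hand
    FROM   sales s
    JOIN   stock k ON k.product_name = s.product_name;  -- join on the conformed attribute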
Introduction to Cubes 
• Cubes are the main objects in online analytic processing (OLAP), a technology that 
provides fast access to data in a data warehouse. A cube is a set of data that is 
usually constructed from a subset of a data warehouse and is organized and 
summarized into a multidimensional structure defined by a set 
of dimensions and measures. 
• A cube provides an easy-to-use mechanism for querying data with quick and 
uniform response times. End users use client applications to connect to an Analysis 
server and query the cubes on the server. In most client applications, end users 
issue a query on a cube by manipulating the user interface controls, which 
determine the contents of the query. This spares end users from writing language-based 
queries. Precalculated summary data, called aggregations, provides the 
mechanism for rapid and uniform response times to queries. Aggregations are 
created for a cube before end users access it. The results of a query are retrieved 
from the aggregations, the cube's source data in the data warehouse, a copy of 
this data on the Analysis server, the client cache, or a combination of these 
sources. An Analysis server can support many different cubes, such as a cube for 
sales, a cube for inventory, a cube for customers, and so on.
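An Analysis server builds and manages these aggregations internally, but the underlying idea can be sketched in plain SQL (hypothetical names; a real OLAP engine decides which summaries to build and routes queries to them automatically): a summary table is computed once ahead of time, and user queries are answered from it instead of scanning the detailed fact rows. 

    -- Precompute a summary ("aggregation") during cube processing,
    -- reusing the star schema sketched earlier.
    CREATE TABLE agg_sales_by_region_year AS
    SELECT s.region, d.fiscal_year, SUM(f.sales_amount) AS sales_amount
    FROM   fact_sales f
    JOIN   dim_store s ON s.store_key = f.store_key
    JOIN   dim_date  d ON d.date_key  = f.date_key
    GROUP  BY s.region, d.fiscal_year;

    -- End-user queries read the small summary instead of the large fact table.
    SELECT region, sales_amount
    FROM   agg_sales_by_region_year
    WHERE  fiscal_year = 2005;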
Introduction to Cubes 
• Every cube has a schema, which is the set of joined tables in the data warehouse 
from which the cube draws its source data. The central table in the schema is the 
fact table, the source of the cube's measures. The other tables are dimension 
tables, the sources of the cube's dimensions. For more information about 
schemas, see Cube Structure. 
• A cube is defined by the measures and dimensions that it contains. For example, a 
cube for sales analysis includes the measures Item_Sale_Price and Item_Cost and 
the dimensions Store_Location, Product_Line, and Fiscal_Year. This cube enables 
end users to separate Item_Sale_Price and Item_Cost into various categories by 
Store_Location, Product_Line, and Fiscal_Year. 
• Each cube dimension can contain a hierarchy of levels to specify the categorical 
breakdown available to end users. For example, the Store_Location dimension 
includes the level hierarchy: Continent, Country, State_Province, City, 
Store_Number. Each level in a dimension is of finer granularity than its parent. For 
example, continents contain countries or regions, and states or provinces contain 
cities. Similarly, the hierarchy of the Fiscal_Year dimension includes the levels Year, 
Quarter, Month, and Day.
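At the relational layer underneath such a cube, each hierarchy level is typically a column of the dimension table, and rolling a measure up the hierarchy is simply a change in the grouping columns. The sketch below is hypothetical (it adds a Store_Location-style dimension joined to the earlier fact_sales sketch) and uses GROUP BY ROLLUP, which most major databases support, to return subtotals for each level in one pass: 

    -- Store_Location hierarchy stored as level columns (hypothetical names).
    CREATE TABLE dim_store_location (
        store_key      INTEGER PRIMARY KEY,
        continent      VARCHAR(50),
        country        VARCHAR(50),
        state_province VARCHAR(50),
        city           VARCHAR(50),
        store_number   VARCHAR(20)
    );

    -- Summarize sales at the Continent and Country levels, with subtotals per level.
    SELECT l.continent, l.country, SUM(f.sales_amount) AS total_sales
    FROM   fact_sales f
    JOIN   dim_store_location l ON l.store_key = f.store_key
    GROUP  BY ROLLUP (l.continent, l.country);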
OLAP cube 
• An OLAP cube is a multidimensional database that is optimized for data 
warehouse and online analytical processing (OLAP) applications. 
• An OLAP cube is a method of storing data in a multidimensional form, 
generally for reporting purposes. In OLAP cubes, data (measures) are 
categorized by dimensions. OLAP cubes are often pre-summarized across 
dimensions to drastically improve query time over relational databases. 
• Although it stores data like a traditional database does, an OLAP cube is 
structured very differently. Databases, historically, are designed according 
to the requirements of the IT systems that use them. OLAP cubes, however, 
are used by business users for advanced analytics. Thus, OLAP cubes are 
designed using business logic and understanding. They are optimized for 
analytical purposes, so that they can report on millions of records at a time. 
Business users can query OLAP cubes using plain English.
Dimension Hierarchies 
• A hierarchy defines relationships among a set of attributes that are grouped 
by levels in the dimension of a cube model. These relationships between 
levels are usually defined with a functional dependency to improve 
optimization. Multiple hierarchies can be defined for a dimension of a cube 
model. 
• Hierarchies document the relationships between levels in a dimension. When 
you define a hierarchy, DB2 Data Warehouse Edition creates a functional 
dependency by default between consecutive levels, such that the level key 
attributes from one level are functionally dependent on the level key 
attributes in the lower level. For example, in a Region hierarchy that is 
defined with the levels Region, State, and City, DB2 Data 
Warehouse Edition creates two functional dependencies: one showing that City 
determines State, and a second showing that State determines Region.
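A minimal sketch of such a Region hierarchy stored in a single dimension table (hypothetical names; DB2 Data Warehouse Edition records the functional dependencies as metadata rather than as table constraints): every City row carries the State and Region it rolls up to, so City determines State and State determines Region. 

    CREATE TABLE dim_region (
        city_key INTEGER PRIMARY KEY,
        city     VARCHAR(100) NOT NULL,   -- lowest level of the hierarchy
        state    VARCHAR(100) NOT NULL,   -- City -> State functional dependency
        region   VARCHAR(100) NOT NULL    -- State -> Region functional dependency
    );

    INSERT INTO dim_region VALUES
        (1, 'San Francisco', 'California', 'West'),
        (2, 'San Diego',     'California', 'West');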
Carefully consider whether to include an All level for the hierarchies in the following types of dimensions: 
Time dimensions - 
Aggregating your fact data across all of time is rarely useful, unless 
you know that the fact table contains data for a specific period of 
time, such as the year 2005. 
Scenario dimensions. 
A scenario dimension describes likely ways of analyzing your data. 
You might have two members, such as Actual and Budget, in your 
scenario dimension. In this case, it does not make sense to combine 
the data values for budget and actual into a combined aggregation. 
The hierarchy type describes the relationship among the levels 
within the hierarchy. The following four hierarchy types are 
supported:
Balanced 
• A hierarchy with meaningful levels and branches that have a consistent depth. 
Each level's logical parent is in the level directly above it. A balanced hierarchy can 
represent time where the meaning and depth of each level, such as Year, Quarter, 
and Month, is consistent. They are consistent because each level represents the 
same type of information, and each level is logically equivalent. Figure 1 shows an 
example of a balanced time hierarchy. 
Figure 1. Example of a balanced time hierarchy
Unbalanced 
• A hierarchy with levels that have a consistent parent-child relationship but logically 
inconsistent levels. The hierarchy branches also can have inconsistent depths. An unbalanced 
hierarchy can represent an organization chart. For example, Figure 2 shows a chief executive 
officer (CEO) on the top level of the hierarchy and at least two of the people that might 
branch off below including the chief operating officer and the executive secretary. The chief 
operating officer has more people branching off also, but the executive secretary does not. 
The parent-child relationships on both branches of the hierarchy are consistent. However, 
the levels of both branches are not logical equivalents. 
For example, an executive secretary is not the logical equivalent of a chief operating 
officer. 
Figure 2. Example of an unbalanced hierarchy
Ragged 
A hierarchy in which each level has a consistent meaning, but the branches have inconsistent 
depths because at least one member attribute in a branch level is unpopulated. A ragged 
hierarchy can represent a geographic hierarchy in which the meaning of each level such as city or 
country is used consistently, but the depth of the hierarchy varies. Figure 3 shows a geographic 
hierarchy that has Continent, Country, Province/State, and City levels defined. One branch has 
North America as the Continent, United States as the Country, California as the Province or State, 
and San Francisco as the City. However, the hierarchy becomes ragged when one member does 
not have an entry at all of the levels. For example, another branch has Europe as the Continent, 
Greece as the Country, and Athens as the City, but has no entry for the Province or State level 
because this level is not applicable to Greece for the business model in this example. In this 
example, the Greece and United States branches descend to different depths, creating a ragged hierarchy. 
Figure 3. Example of a ragged hierarchy
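One common way to keep reporting simple over a ragged geography like this one (a sketch, not the only approach) is to store NULL for the level that does not apply and fall back to a neighbouring level when labeling results: 

    CREATE TABLE dim_geography (
        geo_key        INTEGER PRIMARY KEY,
        continent      VARCHAR(50),
        country        VARCHAR(50),
        province_state VARCHAR(50),   -- NULL where the level does not apply
        city           VARCHAR(50)
    );

    INSERT INTO dim_geography VALUES
        (1, 'North America', 'United States', 'California', 'San Francisco'),
        (2, 'Europe',        'Greece',        NULL,          'Athens');

    -- Fall back to the Country label when the Province/State level is unpopulated.
    SELECT continent, country,
           COALESCE(province_state, country) AS province_or_state,
           city
    FROM   dim_geography;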
Network 
A hierarchy in which the order of levels is not specified, but in which levels do have semantic 
meaning. For example, Figure 4 shows a network hierarchy that describes product attributes 
such as Color, Size, and PackageType. Because the levels do not have an inherent parent-child 
relationship, the order of the levels is not important. A widget company might have member 
entries like white for Color, small for Size, and shrink wrap for PackageType. A second member 
entry might be red for Color, large for Size, and box for PackageType. 
Figure 4. Example of a network hierarchy
What advantages are there from developing a Data Warehouse? The three major advantages 
are: 
1. Integrating data from multiple sources; 
2. Performing new types of analyses; and 
3. Reducing cost to access historical data. 
Other benefits may include: 
1. Standardizing data across the organization, a "single version 
of the truth"; 
2. Improving turnaround time for analysis and reporting; 
3. Sharing data and allowing others to easily access data; 
4. Supporting ad hoc reporting and inquiry; 
5. Reducing the development burden on IS/IT; and 
6. Removing informational processing load from transaction-oriented 
databases.
Advantages of data marts 
It is easy to access frequently needed data from a data mart when the 
client requires it. 
Access can be granted to the specific group of users who need to view 
the data mart, and because the data set is small and focused, query 
performance is good. 
A data mart is easy to create and maintain, since it relates to a 
specific business area. 
It is also less costly to create a data mart than to build a full data 
warehouse, which requires far more storage.
Difference between Data Warehouse and Data mart 
• A data mart is a simple form of a data warehouse that is focused on a single 
subject (or functional area), such as Sales or Finance or Marketing. Data 
marts are often built and controlled by a single department within an 
organization. Given their single-subject focus, data marts usually draw data 
from only a few sources. The sources could be internal operational systems, 
a central data warehouse, or external data. 
How Is It Different from a Data Warehouse? 
• A data warehouse, in contrast, deals with multiple subject areas and is 
typically implemented and controlled by a central organizational unit such 
as the Corporate Information Technology (IT) group. Often, it is called a 
central or enterprise data warehouse. Typically, a data warehouse 
assembles data from multiple source systems. 
• Nothing in these basic definitions limits the size of a data mart or the 
complexity of the decision-support data that it contains. Nevertheless, data 
marts are typically smaller and less complex than data warehouses; hence, 
they are typically easier to build and maintain. The following table 
summarizes the basic differences between a data warehouse and a data 
mart:
Difference between Data Warehouse and Data mart 
Parameters             Data Warehouse       Data Mart 
Scope                  Corporate            Line-of-Business (LoB) 
Subjects               Multiple             Single subject 
Data sources           Many                 Few 
Size (typical)         100 GB to TB+        Less than 100 GB 
Implementation time    Months to years      Months
DATA WAREHOUSE REPORTING TOOLS 
REPORTING TOOLS 
Typical reporting tool capabilities include (a sketch of the kind of SQL they generate follows the list): 
• connecting to one or multiple databases in a single report 
• defining queries on the data warehouse database 
• loading millions of records for aggregation and analysis (big data) 
• creating simple and complex filters and sorts 
• creating pixel-perfect reports with a variety of formatting options 
• conditional formatting 
• automatic summarization of the measures: average, median, count, sum, and so forth 
• displaying pivot tables, simple lists, or summarized tables 
• creating sections guided by a dimension 
• presenting data in many types of graphs
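Most of the capabilities above boil down to SQL of roughly the following shape, generated by the tool against the warehouse (a sketch reusing the hypothetical star schema from earlier sections); the tool then adds formatting, sectioning, and charting on top of the result set: 

    -- Filter, sort, and automatic summarization behind a typical report.
    SELECT s.region,
           d.fiscal_year,
           COUNT(*)            AS sale_count,
           SUM(f.sales_amount) AS total_sales,
           AVG(f.sales_amount) AS average_sale
    FROM   fact_sales f
    JOIN   dim_store s ON s.store_key = f.store_key
    JOIN   dim_date  d ON d.date_key  = f.date_key
    WHERE  d.fiscal_year >= 2004               -- report filter
    GROUP  BY s.region, d.fiscal_year
    ORDER  BY total_sales DESC;                -- report sort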
DATA WAREHOUSE REPORTING TOOLS 
ANALYSIS TOOLS 
• clicking on data items to drill down into the details 
• one-click graphs 
• slicing and dicing - switching dimensions easily 
• nesting dimensions 
• sorting and filtering 
• conditional formatting 
• extreme performance - getting answers instantly 
• automatic summarization of the measures
Query and Reporting Tools 
The data warehouse is accessed using an end-user query and reporting tool from Business 
Objects. Business Objects provides several tools to securely access the data warehouse or 
personal data files with a point-and-click interface including the following: 
• BusinessObjects (Reporter and Explorer) - a Microsoft Windows-based query and 
reporting tool. 
• InfoView - a web-based tool that allows reports to be refreshed on demand (but 
cannot create new reports). 
• InfoBurst - a web-based server tool that allows reports to be refreshed, scheduled, 
and distributed. It can be used to distribute reports and data to users or servers in 
various formats (e.g., text, Excel, PDF, HTML). 
• Data Warehouse List Upload - a web-based tool that allows lists of data to be 
uploaded into the data warehouse for use as input to queries.

  • 2. Online transaction processing Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. The term is somewhat ambiguous; some understand a "transaction" in the context of computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions . OLTP has also been used to refer to processing in which the system responds immediately to user requests. An automated teller machine (ATM) for a bank is an example of a commercial transaction processing application. Benefits Online Transaction Processing has two key benefits:  Simplicity and efficiency.  Reduced paper trails  The faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses.
  • 3. Online transaction processing Requirements  Online transaction processing increasingly requires support for transactions that span a network and may include more than one company. For this reason, new online transaction processing software uses client or server processing and brokering software that allows transactions to run on different computer platforms in a network.  In large applications, efficient OLTP may depend on sophisticated transaction management software (such as CICS) and/or database optimization tactics to facilitate the processing of large numbers of concurrent updates to an OLTP-oriented database  For even more demanding decentralized database systems, OLTP brokering programs can distribute transaction processing among multiple computers on a network. OLTP is often integrated into service-oriented architecture (SOA) andWeb services.  Online Transaction Processing (OLTP) involves gathering input information, processing the information and updating existing information to reflect the gathered and processed information. As of today, most organizations use a database management system to support OLTP. OLTP is carried in a client server system.
  • 4. Online analytical processing  In computing, online analytical processing, or OLAP , is an approach to answering multi-dimensional analytical (MDA) queries swiftly.  OLAP is part of the broader category of business intelligence, which also encompasses relational database, report writing and data mining.  Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting , financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing)
  • 5. Online analytical processing  OLAP tools enable users to analyze multidimensional data interactively from multiple perspectives.  OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing.  Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region’s sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints.
  • 6. Online analytical processing • Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time . They borrow aspects of navigational databases, hierarchical databases and relational databases. OverviewofOLAPsystems The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a Vector space. The usual interface to manipulate an OLAP cube is a matrix interface like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. • The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. • Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. • A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. • Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by adding a foreign key column to the fact table. This allows an analyst to view the measures along any combination of the dimensions.
  • 8. Transaction processing system • Transaction processing is a style of computing that divides work into individual, indivisible operations, called transactions. A transaction processing system (TPS) or transaction server is a software system, or software/hardware combination, that supports transaction processing. Processingtypes Transaction processing is distinct from other computer processing models — batch processing, time-sharing, and real-time processing Batchprocessing Batch processing is execution of a series of programs (jobs) on a computer without manual intervention. Several transactions, called a batch are collected and processed at the same time. The results of each transaction are not immediately available when the transaction is being entered ; there is a time delay. Real-timeprocessing Real time systems attempt to guarantee an appropriate response to a stimulus or request quickly enough to affect the conditions that caused the stimulus." Each transaction in real-time processing is unique; it is not part of a group of transactions.
  • 9. Transaction processing system Time-sharing Time sharing is the sharing of a computer system among multiple users, usually giving each user the illusion that they have exclusive control of the system. The users may be working on the same project or different projects, but there are usually few restrictions on the type of work each user is doing. Transactionprocessing Transaction processing systems also attempt to provide predictable response times to requests, although this is not as critical as for real-time systems. Rather than allowing the user to run arbitrary programs as time-sharing, transaction processing allows only predefined, structured transactions. Each transaction is usually short duration and the processing activity for each transaction is programmed in advance.
  • 10. Transaction processing system features The following features are considered important in evaluating transaction processing systems • Performance Fast performance with a rapid response time is critical. Transaction processing systems are usually measured by the number of transactions they can process in a given period of time. • Continuous availability The system must be available during the time period when the users are entering transactions. Many organizations rely heavily on their TPS; a breakdown will disrupt operations or even stop the business. • Data integrity The system must be able to handle hardware or software problems without corrupting data. Multiple users must be protected from attempting to change the same piece of data at the same time, for example two operators cannot sell the same seat on an airplane. • Ease of use Often users of transaction processing systems are casual users. The system should be simple for them to understand, protect them from data-entry errors as much as possible, and allow them to easily correct their errors. • Modular growth The system should be capable of growth at incremental costs, rather than requiring a complete replacement. It should be possible to add, replace, or update hardware and software components without shutting down the system.
  • 11. Concurrency: Definition  Concurrency is the ability of a database to allow multiple users to affect multiple transactions. This is one of the main properties that separates a database from other forms of data storage like spreadsheets.  The ability to offer concurrency is unique to databases. Spreadsheets or other flat file means of storage are often compared to databases, but they differ in this one important regard. Spreadsheets cannot offer several users the ability to view and work on the different data in the same file, because once the first user opens the file it is locked to other users. Other users can read the file, but may not edit data.  The problems caused by concurrency are even more important than the ability to support concurrent transactions. For example, when one user is changing data but has not yet saved (committed) that data, then the database should not allow other users who query the same data to view the changed, unsaved data. Instead the user should only view the original data.
  • 12. Concurrency Problems Concurrency problems include: • Lost or buried updates. • Uncommitted dependency (dirty read). • Inconsistent analysis (non-repeatable read). • Phantom reads.
  • 13. Concurrency Problems • Lost Updates Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data. For example,- Two editors make an electronic copy of the same document. Each editor changes the copy independently and then saves the changed copy, thereby overwriting the original document. The editor who saves the changed copy last overwrites changes made by the first editor. This problem could be avoided if the second editor could not make changes until the first editor had finished. • Uncommitted Dependency (Dirty Read) Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row. • For example, An editor is making changes to an electronic document. During the changes, a second editor takes a copy of the document that includes all the changes made so far, and distributes the document to the intended audience. The first editor then decides the changes made so far are wrong and removes the edits and saves the document. The distributed document contains edits that no longer exist, and should be treated as if they never existed. This problem could be avoided if no one could read the changed document until the first editor determined that the changes were final.
  • 14. Concurrency Problems • Inconsistent Analysis (Non-repeatable Read) Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time. Inconsistent analysis is similar to uncommitted dependency in that another transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, the data read by the second transaction was committed by the transaction that made the change. Also, inconsistent analysis involves multiple reads (two or more) of the same row and each time the information is changed by another transaction; thus, the term non-repeatable read. • For example, an editor reads the same document twice, but between each reading, the writer rewrites the document. When the editor reads the document for the second time, it has changed. The original read was not repeatable. This problem could be avoided if the editor could read the document only after the writer has finished writing it. • Phantom Reads Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. Similarly, as the result of an insert by a different transaction, the transaction's second or succeeding read shows a row that did not exist in the original read. • For example, an editor makes changes to a document submitted by a writer, but when the changes are incorporated into the master copy of the document by the production department, they find that new unedited material has been added to the document by the author. This problem could be avoided if no one could add new material to the document until the editor and production department finish working with the original document.
  • 15. Concurrency Control • Lock-Based Protocols • Timestamp-Based Protocols • Validation-Based Protocols • Multiple Granularity • Multiversion Schemes • Insert and Delete Operations
  • 16. Lock-Based Protocols • A lock is a mechanism to control concurrent access to a data item • Data items can be locked in two modes : 1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction. 2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction. • Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted.
  • 17. Lock-Based Protocols (Cont.) • Lock-compatibility matrix A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive on the item no other transaction may hold any lock on the item. If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.
  • 18. Lock-Based Protocols (Cont.) • Example of a transaction performing locking: T2: lock-S(A); read (A); unlock(A); lock-S(B); read (B); unlock(B); display(A+B) • Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong. • A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
  • 19. Pitfalls of Lock-Based Protocols • Consider the partial schedule Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A. Such a situation is called a deadlock. To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
  • 20. Pitfalls of Lock-Based Protocols (Cont.) • The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil. • Starvation is also possible if concurrency control manager is badly designed. For example: • A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item. • The same transaction is repeatedly rolled back due to deadlocks. • Concurrency control manager can be designed to prevent starvation
  • 21. The Two-Phase Locking Protocol • This is a protocol which ensures conflict-serializable schedules. • Phase 1: Growing Phase • transaction may obtain locks • transaction may not release locks • Phase 2: Shrinking Phase • transaction may release locks • transaction may not obtain locks The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).
  • 22. The Two-Phase Locking Protocol (Cont.) • Two-phase locking does not ensure freedom from deadlocks • Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts. • Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit • There can be conflict serializable schedules that cannot be obtained if two-phase locking is used. • However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense: Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
  • 23. Multiple Granularity • Granularity is the extent to which a system is broken down into small parts, either the system itself or its description or observation. It is the extent to which a larger entity is subdivided. For example, a yard broken into inches has finer granularity than a yard broken into feet. • Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones • Can be represented graphically as a tree (but don't confuse with tree-locking protocol) • When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendents in the same mode.
  • 24. Example of Granularity Hierarchy The levels, starting from the coarsest (top) level are database area file record
  • 25. Intention Lock Modes • In addition to S and X lock modes, there are three additional lock modes with multiple granularity: • intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks. • intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks • shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive-mode locks. • intention locks allow a higher level node to be locked in S or X mode without having to check all descendent nodes.
  • 26. Deadlock Handling • A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. • In an operating system, a deadlock is a situation which occurs when a process or threadenters a waiting state because a resource requested is being held by another waiting process, which in turn is waiting for another resource. If a process is unable to change its state indefinitely because the resources requested by it are being used by another waiting process, then the system is said to be in a deadlock System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
  • 27. Deadlock Handling • Deadlock prevention protocols ensure that the system will never enter into a deadlock state. • Some prevention strategies : • Require that each transaction locks all its data items before it begins execution (pre declaration). • Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol). • Timeout-Based Schemes : • a transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back. • thus deadlocks are not possible • simple to implement; but starvation* is possible. Also difficult to determine good value of the timeout interval. * Starvation is a severe deficiency in caloric energy, nutrient, and vitamin intake. It is the most extreme form of malnutrition
  • 28. Deadlock Detection • After a deadlock is detected, it can be corrected by using one of the following methods: • Process Termination: One or more process involved in the deadlock may be aborted. We can choose to abort all processes involved in the deadlock. This ensures that deadlock is resolved with certainty and speed. But the expense is high as partial computations will be lost. Or, we can choose to abort one process at a time until the deadlock is resolved. This approach has high overheads because after each abort an algorithm must determine whether the system is still in deadlock. Several factors must be considered while choosing a candidate for termination, such as priority and age of the process. • Resource Preemption: Resources allocated to various processes may be successively preempted and allocated to other processes until the deadlock is broken.
  • 29. Deadlock Detection (Cont.) Wait-for graph without a cycle Wait-for graph with a cycle
  • 30. Deadlock Recovery • When deadlock is detected : • Some transaction will have to rolled back (made a victim) to break deadlock. Select that transaction as victim that will incur minimum cost. • Rollback -- determine how far to roll back transaction • Total rollback: Abort the transaction and then restart it. • More effective to roll back transaction only as far as necessary to break deadlock. • Starvation happens if same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation
  • 31. Other Approaches to Concurrency Control • Timestamp-Based Protocols • Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) <TS(Tj). • The protocol manages concurrent execution such that the time-stamps determine the serializability order. • In order to assure such behavior, the protocol maintains for each data Q two timestamp values: • W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully. • R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
  • 32. Timestamp-Based Protocols • The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. • Suppose a transaction Ti issues a read(Q) 1. If TS(Ti)  W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten.  Hence, the read operation is rejected, and Ti is rolled back. 2. If TS(Ti)W-timestamp(Q), then the read operation is executed, and R-timestamp( Q) is set to max(R-timestamp(Q), TS(Ti)). • Suppose that transaction Ti issues write(Q). 1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced.  Hence, the write operation is rejected, and Ti is rolled back. 2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.  Hence, this write operation is rejected, and Ti is rolled back. 3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
  • 33. Correctness of Timestamp-Ordering Protocol • The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form: Thus, there will be no cycles in the precedence graph • Timestamp protocol ensures freedom from deadlock as no transaction ever waits. • But the schedule may not be cascade-free, and may not even be recoverable.
  • 34. DCL commands are used to enforce database security in a multiple user database environment. Two types of DCL commands are GRANT and REVOKE. Only Database Administrator's or owner's of the database object can provide/remove privileges on a databaseobject • SQL GRANT Command SQL GRANT is a command used to provide access or privileges on the database objects to the users The Syntax for the GRANT command is:
  • 35. • SQL GRANT Command • privilege_name is the access right or privilege granted to the user. Some of the access rights are ALL, EXECUTE, and SELECT. • object_name is the name of an database object like TABLE, VIEW, STORED PROC and SEQUENCE. • user_name is the name of the user to whom an access right is being granted. • user_name is the name of the user to whom an access right is being granted. • PUBLIC is used to grant access rights to all users. • ROLES are a set of privileges grouped together. • WITH GRANT OPTION - allows a user to grant access rights to other users.
  • 36. SQL GRANT Command • For Example: GRANT SELECT ON employee TO user1;This command grants a SELECT permission on employee table to user1.You should use the WITH GRANT option carefully because for example if you GRANT SELECT privilege on employee table to user1 using the WITH GRANT option, then user1 can GRANT SELECT privilege on employee table to another user, such as user2 etc. Later, if you REVOKE the SELECT privilege on employee from user1, still user2 will have SELECT privilege on employee table.
  • 37. SQL REVOKE Command : • The REVOKE command removes user access rights or privileges to the database objects. • The Syntax for the REVOKE command is: For Example: REVOKE SELECT ON employee FROM user1;This command will REVOKE a SELECT privilege on employee table from user1.When you REVOKE SELECT privilege on a table from a user, the user will not be able to SELECT data from that table anymore. However, if the user has received SELECT privileges on that table from more than one users, he/she can SELECT from that table until everyone who granted the permission revokes it. You cannot REVOKE privileges if they were not initially granted by you.
  • 38. Privileges and Roles: Privileges: Privileges defines the access rights provided to a user on a database object. There are two types of privileges. 1) System privileges - This allows the user to CREATE, ALTER, or DROP database objects. 2) Object privileges - This allows the user to EXECUTE, SELECT, INSERT, UPDATE, or DELETE data from database objects to which the privileges apply. Few CREATE system privileges are listed below: System Privileges Description CREATE object allows users to create the specified object in their own schema. CREATE ANY object allows users to create the specified object in any schema.
  • 39. The above rules also apply for ALTER and DROP system privileges. Few of the object privileges are listed below: Object Privileges Description INSERT allows users to insert rows into a table. SELECT allows users to select data from a database object. UPDATE allows user to update data in a table. EXECUTE allows user to execute a stored procedure or a function.
  • 40. Roles: Roles are a collection of privileges or access rights. When there are many users in a database it becomes difficult to grant or revoke privileges to users. Therefore, if you define roles, you can grant or revoke privileges to users, thereby automatically granting or revoking privileges. You can either create Roles or use the system roles pre-defined by oracle. Some of the privileges granted to the system roles are as given below: System Role Privileges Granted to the Role CONNECT CREATE TABLE, CREATE VIEW, CREATE SYNONYM, CREATE SEQUENCE, CREATE SESSION etc. RESOURCE CREATE PROCEDURE, CREATE SEQUENCE, CREATE TABLE, CREATE TRIGGER etc. The primary usage of the RESOURCE role is to restrict access to database objects. DBA ALL SYSTEM PRIVILEGES
  • 41. • Creating Roles: The Syntax to create a role is: It's easier to GRANT or REVOKE privileges to the users through a role rather than assigning a privilege directly to every user. If a role is identified by a password, then, when you GRANT or REVOKE privileges to the role, you definitely have to identify it with the password.
  • 42. Database Recovery There are many situations in which a transaction may not reach a commit or abort point. An operating system crash can terminate the DBMS processes • The DBMS can crash • The process issuing the transaction can abnormally terminate • The system might lose power • A disk may fail or other hardware may fail. • Human error can result in deletion of critical data. • In any of these situations, data in the database may become inconsistent or lost.
  • 43. Two general classes of failures: • Process or Processor failures in which no data is lost but the database must be restarted and recovered. • Storage failure wherein data on disk is lost or corrupted. In such cases, data must be restored from backup and then the database must be restarted and recovered. Database Recovery is the process of restoring the database and the data to a consistent state. This may include restoring lost data up to the point of the event (e.g. system crash). There are many different approaches to recover a database. Here we discuss Manual Reprocessing, and a variety of Automated recovery techniques.
  • 44. Manual Reprocessing • In a Manual Reprocessing approach, the database is periodically backed up (a database save) and all transactions applied since the last save are recorded • If the system crashes, the latest database save is restored and all of the transactions are re-applied (by users) to bring the database back up to the point just before the crash. • Several shortcomings: • Time required to re-apply transactions • Transactions might have other (physical) consequences • Re-applying concurrent transactions is not straight forward.
  • 45. Automated Recovery There are several types of Automated Recovery techniques including: Deferred update and immediate update, shadow paging, etc. Recall, a transaction can be in one of the following states: • Active - when the transaction just begins • Partially Committed - after the last operation has completed (but before the commit point is reached) • Failed - Normal operation is prevented (e.g., in the event of a system crash) • Aborted - Transaction is rolled back. That is, all of its effects are undone • Committed - Transaction completes all operations and moved the database to the next consistent state
  • 46. Deferred Update Recovery • Also called NO-UNDO/REDO During a transaction, only record the changes to data items in the log. When the transaction commits, actually update the data items on disk. Two main rules: A transaction cannot change any items in the database until it commits. • A transaction may not commit until all of the write operations are successfully recorded in the log. This means that we must check to see that the log is actually written to disk. Since T1 and T4 committed, their changes were written to disk However, T2 and T3 did not commit, hence their changes were not written to disk To recover, we simply ignore those transactions that did not commit. Advantages: Recovery is made easier: Any transaction that reached the commit point (from the log) has its writes applied to the database (REDO). All other transactions are ignored. Cascading rollback does not occur because no other transaction sees the work of another until it is committed (no stale reads).
  • 47. Immediate Update Recovery • Immediate Update applies the write operations to the database as the transaction is executing (as opposed to at the end as with deferred update). Writes are still saved in the log before being applied to the database - a write-ahead log (WAL) • Maintain two logs: • REDO log: A record of each new data item in the database. • UNDO log: A record of each updated data item (old values). • Two rules: • Transaction T may not update the database until all UNDO entries have been written to the UNDO log. • Transaction T is not allowed to commit until all REDO and UNDO log entries are written (forced-written to disk).
• 48. • To recover: • Begin at the end of the log and read backwards to the last checkpoint. Create two lists: C - transactions that have committed; NC - transactions that did not commit • For each transaction T in C, make a third list, RB, of transactions that have read items that were updated by NC transactions. • UNDO: undo the effects of the transactions in NC and RB. • REDO: apply the effects of all of the transactions that did commit (all T in list C). Advantages: • Immediate update allows higher concurrency because transactions write continuously to the database rather than waiting until the commit point. Disadvantages: • Step 2 above can lead to cascading rollbacks - time consuming and potentially problematic.
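The UNDO/REDO procedure above can be sketched in the same style. This simplified version assumes each write record carries both the old (UNDO) and new (REDO) value, and it omits checkpoints and the cascading-rollback list RB for brevity; all names are illustrative.

```python
# Minimal sketch of immediate-update (UNDO/REDO) recovery from a write-ahead log.
# Record shapes (illustrative): ("write", txn, item, old_value, new_value), ("commit", txn)

def recover_immediate(log, database):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO pass: scan the log backwards and restore the old values written by
    # transactions that never reached their commit point.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, item, old_value, _ = rec
            database[item] = old_value
    # REDO pass: scan forwards and re-apply the new values of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, _, new_value = rec
            database[item] = new_value
    return database

if __name__ == "__main__":
    log = [
        ("write", "T1", "A", 0, 50), ("commit", "T1"),
        ("write", "T2", "A", 50, 75),      # T2 updated A on disk but never committed
    ]
    db = {"A": 75}                          # disk state at the moment of the crash
    print(recover_immediate(log, db))       # {'A': 50}
```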
  • 49. OLAP (online analytical processing) • OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into sub-attributes.
• 50. OLAP (online analytical processing) • OLAP can be used for data mining or the discovery of previously undiscerned relationships between data items. An OLAP database does not need to be as large as a data warehouse, since not all transactional data is needed for trend analysis. Using Open Database Connectivity (ODBC), data can be imported from existing relational databases to create a multidimensional database for OLAP. • Online Analytical Processing is a category of software tools that provides analysis of data stored in a database. OLAP tools enable users to analyze different dimensions of multidimensional data. For example, they provide time series and trend analysis views. OLAP is often used in data mining. • The chief component of OLAP is the OLAP server, which sits between a client and a database management system (DBMS). The OLAP server understands how data is organized in the database and has special functions for analyzing the data. There are OLAP servers available for nearly all the major database systems.
• 51. OLTP vs. OLAP • OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database holds detailed, current data, and the schema used to store transactional data is the entity model (usually 3NF). • OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used in data mining. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
• 52. The following table summarizes the main differences between an OLTP system (Online Transaction Processing, an operational system) and an OLAP system (Online Analytical Processing, a data warehouse):
Source of data: OLTP holds operational data; OLTPs are the original source of the data. OLAP holds consolidated data that comes from the various OLTP databases.
Purpose of data: OLTP is used to control and run fundamental business tasks. OLAP is used to help with planning, problem solving, and decision support.
What the data reveals: OLTP gives a snapshot of ongoing business processes. OLAP gives multi-dimensional views of various kinds of business activities.
Inserts and updates: OLTP has short and fast inserts and updates initiated by end users. In OLAP, periodic long-running batch jobs refresh the data.
Queries: OLTP queries are relatively standardized and simple, returning relatively few records. OLAP queries are often complex and involve aggregations.
Processing speed: OLTP is typically very fast. OLAP speed depends on the amount of data involved; batch data refreshes and complex queries may take many hours, and query speed can be improved by creating indexes.
Space requirements: OLTP can be relatively small if historical data is archived. OLAP is larger due to the existence of aggregation structures and historical data, and requires more indexes than OLTP.
Database design: OLTP is highly normalized with many tables. OLAP is typically de-normalized with fewer tables, using star and/or snowflake schemas.
Backup and recovery: OLTP is backed up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. For OLAP, instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
  • 53. What Is a Data Warehouse? • A data warehouse is a relational or multidimensional database that is designed for query and analysis. Data warehouses are not optimized for transaction processing, which is the domain of OLTP systems. Data warehouses usually consolidate historical and analytic data derived from multiple sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources. • A data warehouse usually stores many months or years of data to support historical analysis. The data in a data warehouse is typically loaded through an extraction, transformation, and loading (ETL) process from one or more data sources such as OLTP applications, mainframe applications, or external data providers. • Users of the data warehouse perform data analyses that are often time-related. Examples include consolidation of last year's sales figures, inventory analysis, and profit by product and by customer
  • 54. The Key Characteristics of a Data Warehouse • The key characteristics of a data warehouse are as follows: • Some data is denormalized for simplification and to improve performance • Large amounts of historical data are used • Queries often retrieve large amounts of data • Both planned and ad hoc queries are common • The data load is controlled • In general, fast query performance with high data throughput is the key to a successful data warehouse.
  • 55. Common Oracle Data Warehousing Tasks • As an Oracle data warehousing administrator or designer, you can expect to be involved in the following tasks: • Configuring an Oracle database for use as a data warehouse • Designing data warehouses • Performing upgrades of the database and data warehousing software to new releases • Managing schema objects, such as tables, indexes, and materialized views • Managing users and security • Developing routines used for the extraction, transformation, and loading (ETL) processes • Creating reports based on the data in the data warehouse • Backing up the data warehouse and performing recovery when necessary • Monitoring the data warehouse's performance and taking preventive or corrective action as required • In a small-to-midsize data warehouse environment, you might be the sole person performing these tasks. In large, enterprise environments, the job is often divided among several DBAs and designers, each with their own specialty, such as database security or database tuning.
  • 56. Characteristics of Data Warehouses • Data warehouses have several distinguishing characteristics. • These systems combine data from multiple sources. Operational systems such as ERP systems provide production data, financial systems supply revenue and expense data, and human resource systems present employee data. • The data copied into a data warehouse does not change (except to correct errors). The data warehouse is a historical record of the state of an organization. The frequent changes of the source OLTP systems are reflected in the data warehouse by adding new data, not by changing existing data. • Data warehouses are subject oriented, that is, they focus on measuring entities, such as sales, inventory, and quality. OLTP systems, by contrast, are function oriented and focus on operations such as order fulfillment. • In data warehouses, data from distinct function-oriented systems is integrated to provide a single view of an operational entity. • Data warehouses are designed for business users, not database programmers, so they are easy to understand and query. • Data warehouses are designed for speed. Users interact with data warehouses—running queries, analyzing data, formulating new queries based on information gleaned from earlier queries—so the systems must be responsive.
  • 57. Metadata • Metadata is "data about data". The term is ambiguous, as it is used for two fundamentally different concepts (types). Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data"; descriptive metadata, on the other hand, is about individual instances of application data, the data content. Metadata (metacontent) are defined as the data providing information about one or more aspects of the data, such as: • Means of creation of the data • Purpose of the data • Time and date of creation • Creator or author of the data • Location on a computer network where the data were created • Standards used
• 58. Metadata types Although metadata applications are manifold, covering a large variety of fields, there are specialized and well-accepted models for specifying types of metadata:  Structural metadata are used to describe the structure of database objects such as tables, columns, keys and indexes.  Guide metadata are used to help humans find specific items and are usually expressed as a set of keywords in a natural language.  Technical metadata correspond to internal metadata, and business metadata to external metadata; Kimball adds a third category named process metadata. NISO, on the other hand, distinguishes among three types of metadata: descriptive, structural and administrative.  Descriptive metadata are the information used to search and locate an object, such as title, author, subjects, keywords, publisher.  Structural metadata give a description of how the components of the object are organised.  Administrative metadata refer to technical information, including file type. Two sub-types of administrative metadata are rights management metadata and preservation metadata.
  • 59. OLAP Metadata Model • The basic data model in a relational database is a table composed of one or more columns of data. All of the data is stored in columns. In contrast, the basic data model for multidimensional analysis is a cube, which is composed of Measures, Dimensions, and Attributes. • The OLAP Catalog metadata informs applications about the data that is available within the database. The application can then define multidimensional objects to represent that data. When the application runs, it instantiates these objects and populates them with data.
  • 60. Mapping Data Objects to Metadata Objects The objects comprising a data warehouse and Oracle OLAP metadata use different data structures. The data objects in your data warehouse are represented to the OLAP metadata catalog in the following relational objects, regardless of whether the data is actually stored in relational tables or workspace variables: • Fact Tables or Views • Level-based dimension Tables or Views Oracle OLAP metadata catalog maps the data warehouse schema to these multidimensional data objects: • Measures • Dimensions • Dimension attributes • Levels • Level attributes • Hierarchies • Cubes • Measure folders
  • 61. Measures  Measures are the same as facts. The term "fact" is typically used in relational databases, and the term "measure" is typically used in multidimensional applications.  Measures are thus located in fact tables. A fact table has columns that store measures (or facts) and foreign key columns that create the association with dimension tables.  Measures contain the data that you wish to analyze, such as Sales or Cost. OLAP Catalog metadata requires that a column have a numerical or a date data type to be identified as a measure. Most frequently, a measure is numerical and additive.
• 62. Dimensions • Dimensions identify and categorize your data. Dimension members are stored in a dimension table. Each column represents a particular level in a hierarchy. In a star schema, the columns are all in the same table; in a snowflake schema, the columns are in separate tables for each level. • Because measures are typically multidimensional, a single value in a measure must be qualified by a member of each dimension to be meaningful. For example, the unit_cost measure has two dimensions: products_dim and times_dim. A value of unit_cost (21.60) is only meaningful when it is qualified by a specific product code (1575) and a time period (28-jan-1998).
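To make this concrete, a measure can be pictured as a mapping keyed by one member from each dimension. The tiny Python sketch below reuses the product code and date from the example above and is purely illustrative.

```python
# A measure value is only meaningful once every dimension is fixed.
# Here unit_cost is keyed by (product_code, time_period): both dimensions must be supplied.
unit_cost = {
    (1575, "28-jan-1998"): 21.60,
    (1575, "29-jan-1998"): 21.75,
}

print(unit_cost[(1575, "28-jan-1998")])   # 21.6 - qualified by product AND time
```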
• 63. Attributes Attributes provide descriptive information about the data and are typically used for display. Level Attributes Level attributes provide supplementary information about the dimension members at a particular level of a dimension hierarchy. The dimension members themselves may be meaningless, such as a value of "1296" for a time period. These cryptic values for dimension members are used internally for selecting and sorting quickly, but are meaningless to users. For example, you might have columns for employee number (ENUM), last name (LAST_NAME), first name (FIRST_NAME), and telephone extension (TELNO). ENUM is the best choice for a level column, since it is a key column and its values uniquely identify the employees. ENUM also has a NUMBER data type, which makes it more efficient than a text column for the creation of indexes. LAST_NAME, FIRST_NAME, and TELNO are attributes. Even though they are dimensioned by ENUM, they do not make suitable measures because they are descriptive text rather than business measurements.
  • 64. Dimension Attributes • Dimension attributes specify groupings of level attributes for a specific dimension. Whereas level attributes map to specific data values, dimension attributes are purely logical metadata objects. • An example of a dimension attribute is end date, which is required for time dimensions. If a time dimension has month, quarter, and year levels, end date identifies the last date of each month, each quarter, and each year. Within a relational schema, the three level attributes that make up the end date dimension attribute would be stored in columns with names like month_end_date, quarter_end_date, and year_end_date.
  • 65. Measure Folders • Measures can be organized within measure folders, which facilitate the browsing of data by business area. Measure folders are also known as catalogs. • Whereas dimensions and measures are associated with the schemas that contain their source data, measure folders are schema independent. Each OLAP client can view all measure folders defined within the Oracle instance.
• 66. ETL - Extract, Transform, Load • ETL is short for extract, transform, load: three database functions that are combined into one tool to pull data out of one database and place it into another database. • Extract is the process of reading data from a database. • Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data. • Load is the process of writing the data into the target database. • ETL is used to migrate data from one database to another, to form data marts and data warehouses, and also to convert databases from one format or type to another.
  • 67. ETL - Extract, Transform, Load • In managing databases, extract, transform, load (ETL) refers to three separate functions combined into a single programming tool. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data - using rules or lookup tables, or creating combinations with other data - to convert it to the desired state. Finally, the load function is used to write the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist. • ETL can be used to acquire a temporary subset of data for reports or other purposes, or a more permanent data set may be acquired for other purposes such as: the population of a data mart or data warehouse; conversion from one database type to another; and the migration of data from one database or platform to another
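A minimal sketch of the three steps using Python and its built-in sqlite3 module is shown below. The table and column names (src_orders, dw_sales) and the transformation rule are invented for illustration and are not part of any specific ETL product.

```python
# Minimal ETL sketch: extract rows from a source table, transform them with a
# simple rule, and load them into a target table.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE src_orders (order_id INTEGER, amount_cents INTEGER, country TEXT)")
src.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                [(1, 1999, "us"), (2, 550, "de"), (3, 120000, "us")])

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE dw_sales (order_id INTEGER, amount_usd REAL, country TEXT)")

# Extract: read the desired subset from the source database.
rows = src.execute("SELECT order_id, amount_cents, country FROM src_orders").fetchall()

# Transform: convert cents to dollars and upper-case the country code (a simple rule).
transformed = [(oid, cents / 100.0, country.upper()) for oid, cents, country in rows]

# Load: write the resulting rows into the target database.
tgt.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", transformed)
tgt.commit()
print(tgt.execute("SELECT * FROM dw_sales").fetchall())
```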
• 68. Dimensional modeling • Dimensional modeling (DM) is the name of a set of techniques and concepts used in data warehouse design. It is considered to be different from entity-relationship (ER) modeling. Dimensional modeling does not necessarily involve a relational database; the same modeling approach, at the logical level, can be used for any physical form, such as a multidimensional database or even flat files. • DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance. • While ER modeling is very useful for transaction capture, it should be avoided for end-user delivery. • Dimensional modeling always uses the concepts of facts (measures) and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. • Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency is achieved using conformed dimensions, i.e. using one copy of the shared dimension across subject areas.
• 69. Dimensional modeling process • The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used: • Choose the business process • Declare the grain • Identify the dimensions • Identify the facts Choose the business process – The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The basics of the design build on the actual business process which the data warehouse should cover. Therefore the first step in the model is to describe the business process which the model builds on. This could for instance be a sales situation in a retail store. To describe the business process, one can choose to do this in plain text or use basic Business Process Modeling Notation (BPMN) or other design guides like the Unified Modeling Language (UML).
• 70. Dimensional modeling process • Declare the grain After describing the business process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore, the grain (sentence) is what you are going to build your dimensions and fact table from. You might find it necessary to go back to this step to alter the grain due to new information gained about what your model is supposed to be able to deliver. • Identify the dimensions The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table, and are where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month and weekday.
  • 71. Dimensional modeling process • Identify the facts After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the fact table rows are numerical, additive figures such as quantity or cost per unit, etc.
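As a quick recap of the four steps, the retail example running through this section might be written down as the following design sketch. Every name in it is a hypothetical placeholder for illustration, not a prescribed notation.

```python
# The four dimensional-modeling decisions for the retail example, captured as data.
# Every name here is a hypothetical placeholder used only for illustration.
design = {
    "business_process": "retail store sales",
    "grain": "one row per individual line item on a customer receipt",
    "dimensions": ["date", "product", "store", "cashier"],
    "facts": ["quantity_sold", "extended_sales_amount"],   # numeric, additive
}

for step, decision in design.items():
    print(f"{step}: {decision}")
```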
  • 72. Dimensional modeling process Benefits of the dimensional modeling are following: • Understandability - Compared to the normalized model, the dimensional model is easier to understand and more intuitive. In dimensional models, information is grouped into coherent business categories or dimensions, making it easier to read and interpret. Simplicity also allows software to navigate databases efficiently. In normalized models, data is divided into many discrete entities and even a simple business process might result in dozens of tables joined together in a complex way. • Query performance - Dimensional models are more denormalized and optimized for data querying, while normalized models seek to eliminate data redundancies and are optimized for transaction loading and updating. The predictable framework of a dimensional model allows the database to make strong assumptions about the data that aid in performance. Each dimension is an equivalent entry point into the fact table, and this symmetrical structure allows effective handling of complex queries. Query optimization for star join databases is simple, predictable, and controllable. • Extensibility - Dimensional models are extensible and easily accommodate unexpected new data. Existing tables can be changed in place either by simply adding new data rows into the table or executing SQL alter table commands. No queries or other applications that sit on top of the Warehouse need to be reprogrammed to accommodate changes. Old queries and applications continue to run without yielding different results. But in normalized models each modification should be considered carefully, because of the complex dependencies between database tables.
• 73. Fact Tables, Dimension Tables • In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. • A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures like additive, non-additive, and semi-additive measures. • Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. • Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. Other dimensions might be members of this fact table (such as location/region) but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (a region contains many stores).
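The sketch below, using Python's built-in sqlite3 module, shows a tiny fact table at the grain "Sales volume by Day by Product by Store" with foreign keys to its dimension tables, plus a star-join query that aggregates the additive measures. All table and column names are invented for illustration.

```python
# Sketch of a tiny star schema and a star-join query at the declared grain.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
-- Fact table: foreign keys to each dimension plus additive measures;
-- the primary key is the composite of the foreign keys (the grain).
CREATE TABLE fact_sales (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    units_sold INTEGER, sales_amount REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);
""")
db.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
               [(1, "2024-01-01", "2024-01"), (2, "2024-01-02", "2024-01")])
db.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Beach Ball"), (2, "Umbrella")])
db.executemany("INSERT INTO dim_store VALUES (?, ?, ?)", [(10, "Store 10", "East")])
db.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
               [(1, 1, 10, 5, 49.95), (2, 1, 10, 3, 29.97), (2, 2, 10, 1, 15.00)])

# Star join: constrain/label with dimension attributes, aggregate the fact measures.
for row in db.execute("""
    SELECT d.month, p.product_name, s.region, SUM(f.units_sold), SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_store s   ON f.store_key = s.store_key
    GROUP BY d.month, p.product_name, s.region
"""):
    print(row)
```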
• 74. Types of fact tables There are basically three fundamental measurement events, which characterize all fact tables. • Transactional A transactional fact table is the most basic and fundamental. The grain associated with a transactional fact table is usually specified as "one row per line in a transaction", e.g., every line on a receipt. Typically a transactional fact table holds data of the most detailed level, causing it to have a great number of dimensions associated with it. • Periodic snapshots The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any defined period of time, e.g. a performance summary of a salesman over the previous month. A periodic snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional fact table in order to deliver the chosen performance output. • Accumulating snapshots This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., the processing of an order. An order moves through specific steps until it is fully processed. As steps towards fulfilling the order are completed, the associated row in the fact table is updated. An accumulating snapshot table often has multiple date columns, each representing a milestone in the process. Therefore, it's important to have an entry in the associated date dimension that represents an unknown date, as many of the milestone dates are unknown at the time of the creation of the row.
  • 75. Dimension table In data warehousing, a dimension table is one of the set of companion tables to a fact table. The fact table contains business facts (or measures), and foreign keys which refer to candidate keys (normally primary keys) in the dimension tables. Contrary to fact tables, dimension tables contain descriptive attributes (or fields) that are typically textual fields (or discrete numbers that behave like text). These attributes are designed to serve two critical purposes: query constraining and/or filtering, and query result set labeling. Dimension attributes should be: • Verbose (labels consisting of full words) • Descriptive • Complete (having no missing values) • Discretely valued (having only one value per dimension table row) • Quality assured (having no misspellings or impossible values)
  • 76. • Dimension table rows are uniquely identified by a single key field. It is recommended that the key field be a simple integer because a key value is meaningless, used only for joining fields between the fact and dimension tables. The use of surrogate dimension keys brings several advantages, including • Performance. Join processing is made much more efficient by using a single field (the surrogate key) • Buffering from operational key management practices. This prevents situations where removed data rows might reappear when their natural keys get reused or reassigned after a long period of dormancy • Mapping to integrate disparate sources • Handling unknown or not-applicable connections • Tracking changes in dimension attribute values
  • 77. The goal of a dimension table is to create standardized, conformed dimensions that can be shared across the enterprise's data warehouse environment, and enable joining to multiple fact tables representing various business processes. Conformed dimensions are important to the enterprise nature of DW/BI systems because they promote: • Consistency- Every fact table is filtered consistently, so that query answers are labeled consistently. • Integration-Queries can drill into different process fact tables separately for each individual fact table, then join the results on common dimension attributes. • Reduced development time to market. The common dimensions are available without recreating them
• 78. Introduction to Cubes • Cubes are the main objects in online analytic processing (OLAP), a technology that provides fast access to data in a data warehouse. A cube is a set of data that is usually constructed from a subset of a data warehouse and is organized and summarized into a multidimensional structure defined by a set of dimensions and measures. • A cube provides an easy-to-use mechanism for querying data with quick and uniform response times. End users use client applications to connect to an Analysis server and query the cubes on the server. In most client applications, end users issue a query on a cube by manipulating the user interface controls, which determine the contents of the query. This spares end users from writing language-based queries. Precalculated summary data called aggregations provides the mechanism for rapid and uniform response times to queries. Aggregations are created for a cube before end users access it. The results of a query are retrieved from the aggregations, the cube's source data in the data warehouse, a copy of this data on the Analysis server, the client cache, or a combination of these sources. An Analysis server can support many different cubes, such as a cube for sales, a cube for inventory, a cube for customers, and so on.
  • 79. Introduction to Cubes • Every cube has a schema, which is the set of joined tables in the data warehouse from which the cube draws its source data. The central table in the schema is the fact table, the source of the cube's measures. The other tables are dimension tables, the sources of the cube's dimensions. For more information about schemas, see Cube Structure. • A cube is defined by the measures and dimensions that it contains. For example, a cube for sales analysis includes the measures Item_Sale_Price and Item_Cost and the dimensions Store_Location, Product_Line, and Fiscal_Year. This cube enables end users to separate Item_Sale_Price and Item_Cost into various categories by Store_Location, Product_Line, and Fiscal_Year. • Each cube dimension can contain a hierarchy of levels to specify the categorical breakdown available to end users. For example, the Store_Location dimension includes the level hierarchy: Continent, Country, State_Province, City, Store_Number. Each level in a dimension is of finer granularity than its parent. For example, continents contain countries or regions, and states or provinces contain cities. Similarly, the hierarchy of the Fiscal_Year dimension includes the levels Year, Quarter, Month, and Day.
  • 80. OLAP cube • An OLAP cube is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications. • An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions. OLAP cubes are often pre-summarized across dimensions to drastically improve query time over relational databases. • Although it stores data like a traditional database does, an OLAP cube is structured very differently. Databases, historically, are designed according to the requirements of the IT systems that use them. OLAP cubes, however, are used by business users for advanced analytics. Thus, OLAP cubes are designed using business logic and understanding. They are optimized for analytical purposes, so that they can report on millions of records at a time. Business users can query OLAP cubes using plain English.
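To illustrate pre-summarization, the following plain-Python sketch computes an aggregate for every combination of a small set of dimensions, so that later "queries" become dictionary lookups instead of scans of the detail rows. The dimension names and figures are made up for the example.

```python
# Sketch of cube-style pre-aggregation: summarize a measure over every combination
# of dimensions so later queries read the summary, not the detail rows.
from collections import defaultdict
from itertools import combinations

# Detail rows: (product, region, year, sales)
detail = [
    ("Beach Ball", "East", 2023, 100.0),
    ("Beach Ball", "West", 2023, 80.0),
    ("Umbrella",   "East", 2024, 60.0),
]

dims = ("product", "region", "year")
aggregations = defaultdict(float)

for product, region, year, sales in detail:
    values = {"product": product, "region": region, "year": year}
    # Aggregate at every "group-by subset" of the dimensions (the cube cells).
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            key = (subset, tuple(values[d] for d in subset))
            aggregations[key] += sales

# "Total sales of Beach Ball across all regions and years" is now a single lookup.
print(aggregations[(("product",), ("Beach Ball",))])   # 180.0
```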
• 81. Dimension Hierarchies • A hierarchy defines relationships among a set of attributes that are grouped by levels in the dimension of a cube model. These relationships between levels are usually defined with a functional dependency to improve optimization. Multiple hierarchies can be defined for a dimension of a cube model. • Hierarchies document the relationships between levels in a dimension. When you define a hierarchy, DB2 Data Warehouse Edition creates a functional dependency by default between consecutive levels, such that the level key attributes from one level are functionally dependent on the level key attributes in the lower level. For example, in a Region hierarchy that is defined with the following levels: Region, State, and City, DB2 Data Warehouse Edition creates two functional dependencies: one functional dependency to show that City determines State, and a second functional dependency to show that State determines Region.
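A small Python sketch of a roll-up along such a Region hierarchy is shown below. The city-to-state and state-to-region mappings and the sales figures are invented; the point is simply that the functional dependencies (each city maps to exactly one state, each state to one region) make the roll-up unambiguous.

```python
# Roll-up along a City -> State -> Region hierarchy.
from collections import defaultdict

city_to_state = {"San Francisco": "California", "Los Angeles": "California", "Austin": "Texas"}
state_to_region = {"California": "West", "Texas": "South"}

sales_by_city = {"San Francisco": 120.0, "Los Angeles": 200.0, "Austin": 90.0}

sales_by_state = defaultdict(float)
for city, amount in sales_by_city.items():
    sales_by_state[city_to_state[city]] += amount        # roll up City -> State

sales_by_region = defaultdict(float)
for state, amount in sales_by_state.items():
    sales_by_region[state_to_region[state]] += amount    # roll up State -> Region

print(dict(sales_by_state))    # {'California': 320.0, 'Texas': 90.0}
print(dict(sales_by_region))   # {'West': 320.0, 'South': 90.0}
```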
• 82. You might not want to define an All level for the hierarchies in the following types of dimensions: Time dimensions - Aggregating your fact data across all of time is rarely useful, unless you know that the fact table contains data for a specific period of time, such as the year 2005. Scenario dimensions - A scenario dimension describes likely ways of analyzing your data. You might have two members, such as Actual and Budget, in your scenario dimension. In this case, it does not make sense to combine the data values for budget and actual into a combined aggregation. The hierarchy type describes the relationship among the levels within the hierarchy. The following four hierarchy types are supported:
• 83. Balanced • A hierarchy with meaningful levels and branches that have a consistent depth. Each level's logical parent is in the level directly above it. A balanced hierarchy can represent time, where the meaning and depth of each level, such as Year, Quarter, and Month, is consistent. They are consistent because each level represents the same type of information, and each level is logically equivalent. Figure 1 shows an example of a balanced time hierarchy. • Figure 1. Balanced hierarchy.
• 84. Unbalanced • A hierarchy with levels that have a consistent parent-child relationship but logically inconsistent levels. The hierarchy branches also can have inconsistent depths. An unbalanced hierarchy can represent an organization chart. For example, Figure 2 shows a chief executive officer (CEO) on the top level of the hierarchy and at least two of the people that might branch off below, including the chief operating officer and the executive secretary. The chief operating officer also has more people branching off, but the executive secretary does not. The parent-child relationships on both branches of the hierarchy are consistent. However, the levels of both branches are not logical equivalents. For example, an executive secretary is not the logical equivalent of a chief operating officer. Figure 2. Unbalanced hierarchy.
• 85. Ragged A hierarchy in which each level has a consistent meaning, but the branches have inconsistent depths because at least one member attribute in a branch level is unpopulated. A ragged hierarchy can represent a geographic hierarchy in which the meaning of each level such as city or country is used consistently, but the depth of the hierarchy varies. Figure 3 shows a geographic hierarchy that has Continent, Country, Province/State, and City levels defined. One branch has North America as the Continent, United States as the Country, California as the Province or State, and San Francisco as the City. However, the hierarchy becomes ragged when one member does not have an entry at all of the levels. For example, another branch has Europe as the Continent, Greece as the Country, and Athens as the City, but has no entry for the Province or State level because this level is not applicable to Greece for the business model in this example. In this example, the Greece and United States branches descend to different depths, creating a ragged hierarchy. Figure 3. Ragged hierarchy.
• 86. Network A hierarchy in which the order of levels is not specified, but in which levels do have semantic meaning. For example, Figure 4 shows a network hierarchy that describes product attributes such as Color, Size, and PackageType. Because the levels do not have an inherent parent-child relationship, the order of the levels is not important. A widget company might have member entries like white for Color, small for Size, and shrink wrap for PackageType. A second member entry might be red for Color, large for Size, and box for PackageType. Figure 4. Network hierarchy.
• 87. What advantages are there from developing a Data Warehouse? The three major advantages are: 1. Integrating data from multiple sources; 2. Performing new types of analyses; and 3. Reducing the cost to access historical data. Other benefits may include: 1. Standardizing data across the organization, a "single version of the truth"; 2. Improving turnaround time for analysis and reporting; 3. Sharing data and allowing others to easily access data; 4. Supporting ad hoc reporting and inquiry; 5. Reducing the development burden on IS/IT; and 6. Removing informational processing load from transaction-oriented databases.
• 88. Advantages of data marts. It is easy to access frequently needed data from the database when required by the client. We can give a group of users access to view the data mart when it is required, and performance will be good. A data mart is easy to create and maintain, and it is related to a specific business area. It is also lower in cost to create a data mart than to create a data warehouse with a huge amount of space.
• 89. Difference between Data Warehouse and Data Mart • A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales, Finance or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse, or external data. How Is It Different from a Data Warehouse? • A data warehouse, in contrast, deals with multiple subject areas and is typically implemented and controlled by a central organizational unit such as the Corporate Information Technology (IT) group. Often, it is called a central or enterprise data warehouse. Typically, a data warehouse assembles data from multiple source systems. • Nothing in these basic definitions limits the size of a data mart or the complexity of the decision-support data that it contains. Nevertheless, data marts are typically smaller and less complex than data warehouses; hence, they are typically easier to build and maintain. The following table summarizes the basic differences between a data warehouse and a data mart:
• 90. Difference between Data Warehouse and Data Mart
Scope: Data Warehouse - corporate; Data Mart - Line-of-Business (LoB)
Subjects: Data Warehouse - multiple; Data Mart - single subject
Data sources: Data Warehouse - many; Data Mart - few
Size (typical): Data Warehouse - 100 GB to TB+; Data Mart - less than 100 GB
Implementation time: Data Warehouse - months to years; Data Mart - months
• 91. DATA WAREHOUSE REPORTING TOOLS REPORTING TOOLS • connecting to one or multiple databases in a single report • defining queries on the data warehouse database • loading millions of records for aggregation and analysis (big data) • creating simple and complex filters and sorts • creating a pixel-perfect report including a variety of formatting options • conditional formatting • automatic summarization of the measures: average, median, count, sum and so forth • displaying pivot tables, simple lists or summarized tables • creating sections guided by a dimension • presenting data in many types of graphs
• 92. DATA WAREHOUSE REPORTING TOOLS ANALYSIS TOOLS • clicking on data items to drill down into the details • one-click graphs • slicing and dicing – switching dimensions easily • nesting dimensions • sorting and filtering • conditional formatting • extreme performance, getting answers instantly • automatic summarization of the measures
• 93. Query and Reporting Tools The data warehouse is accessed using an end-user query and reporting tool from Business Objects. Business Objects provides several tools to securely access the data warehouse or personal data files with a point-and-click interface, including the following: • BusinessObjects (Reporter and Explorer) - a Microsoft Windows-based query and reporting tool. • InfoView - a web-based tool that allows reports to be refreshed on demand (but cannot create new reports). • InfoBurst - a web-based server tool that allows reports to be refreshed, scheduled and distributed. It can be used to distribute reports and data to users or servers in various formats (e.g. text, Excel, PDF, HTML, etc.) • Data Warehouse List Upload - a web-based tool that allows lists of data to be uploaded into the data warehouse for use as input to queries.