The document discusses online analytical processing (OLAP) and the need for OLAP capabilities beyond basic data analysis. It describes how OLAP uses multidimensional data models and pre-computed aggregates to provide fast and interactive analysis of data across multiple dimensions. Different approaches for implementing OLAP like ROLAP, MOLAP, and hybrid systems are covered.
2. OBJECTIVES
• What is OLAP
• Need for OLAP
• Features & functions of OLAP
• Different OLAP models
• OLAP implementations
3. DEMAND FOR OLAP
• There are three approaches to developing data marts (DM)
• In all approaches, data marts rest on the dimensional model
• Data marts are sufficient for basic data analysis
• Users need to go beyond such basic analysis
4. DEMAND FOR OLAP
• Need for multidimensional analysis
• Fast access & powerful calculations
• Limitations of other analysis methods, such as:
  o SQL
  o Spreadsheets
  o Report writers
5. DEMAND FOR OLAP
• Traditional tools (report writers, query products, spreadsheets, and language interfaces) do not meet user expectations for multidimensional analysis with complex calculations.
• Tools used with OLTP and basic DW environments are not up to the task.
6. OLAP IS THE ANSWER!
OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
7. Why is OLAP useful?
• Facilitates multidimensional data analysis by pre-computing aggregates across many sets of dimensions
• Provides for:
  o Greater speed and responsiveness
  o Improved user interactivity
8. DATA WAREHOUSES
• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube
• A data cube allows data to be modeled and viewed in multiple dimensions
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
9. LATTICE OF CUBOIDS
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
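The lattice above can be generated mechanically: each cuboid is one subset of the dimension list. The following is a minimal sketch in Python; the function name `lattice` is hypothetical and not part of the original slides.

```python
from itertools import combinations

dimensions = ["time", "item", "location", "supplier"]

def lattice(dims):
    """Map k to the list of k-dimensional cuboids, for k = 0..len(dims)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

cuboids = lattice(dimensions)
for k, level in cuboids.items():
    print(f"{k}-D cuboids ({len(level)}): {level}")

# Total number of cuboids in the lattice (2**4 for four dimensions).
total = sum(len(level) for level in cuboids.values())
print("total cuboids:", total)
```

The empty tuple at level 0 plays the role of the apex cuboid ("all"), and the single tuple at level 4 is the base cuboid.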
11. AGGREGATES
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Result: 81
12. AGGREGATES
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

ans:
date  sum
1     81
2     48
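Both queries can be tried against the sample sale table. The sketch below assumes an in-memory SQLite database through Python's standard `sqlite3` module; the choice of SQLite and the added `ORDER BY` (to make the row order deterministic) are assumptions, not part of the slides.

```python
import sqlite3

rows = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", rows)

# Add up amounts for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
```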
13. Aggregates
• Operators: sum, count, max, min, median, avg
• "Having" clause
• Using dimension hierarchy:
  o average by region (within store)
  o maximum by month (within date)
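As an illustration of the operators and the "Having" clause, the sketch below runs them on the sample sale table from the earlier slides. It assumes SQLite via Python's `sqlite3` module; the threshold of 50 is an arbitrary value chosen for the example, and `median` is omitted because SQLite has no built-in median aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# HAVING filters whole groups after aggregation (WHERE filters rows before it).
big_days = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date HAVING sum(amt) > 50"
).fetchall()

# Several aggregate operators evaluated per day.
stats = conn.execute(
    "SELECT date, count(*), min(amt), max(amt), avg(amt) "
    "FROM sale GROUP BY date ORDER BY date"
).fetchall()

print(big_days)  # [(1, 81)]
print(stats)     # [(1, 4, 8, 50, 20.25), (2, 2, 4, 44, 24.0)]
```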
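17. AGGREGATION USING HIERARCHIES
Sales by product and customer, one table per day ("-" marks an empty cell):

day 1   c1   c2   c3
p1      12   -    50
p2      11   8    -

day 2   c1   c2   c3
p1      44   4    -
p2      -    -    -

Hierarchy: customer -> region -> country
(customer c1 in Region A; customers c2, c3 in Region B)

Rolled up from customer to region (both days combined):

        region A   region B
p1      56         54
p2      11         8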
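The roll-up from customer to region can be reproduced with a small mapping table. This is a minimal sketch in Python; the names `region_of` and `roll_up_to_region` are hypothetical, and the data is the sample sale table used throughout the slides.

```python
from collections import defaultdict

# (product, customer, day, amount)
sales = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

# One level of the customer hierarchy: customer -> region.
region_of = {"c1": "A", "c2": "B", "c3": "B"}

def roll_up_to_region(rows, mapping):
    """Sum amounts per (product, region), aggregating over customers and days."""
    totals = defaultdict(int)
    for prod, cust, _day, amt in rows:
        totals[(prod, mapping[cust])] += amt
    return dict(totals)

by_region = roll_up_to_region(sales, region_of)
print(by_region)
```

The result matches the region table on the slide: p1 sells 56 in region A and 54 in region B, p2 sells 11 and 8.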
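19. CUBE AGGREGATES LATTICE
Lattice of group-bys:
all
city; product; date
(city, product); (city, date); (product, date)
(city, product, date)

Base cuboid (city, product, date), "-" marks an empty cell:

day 1   c1   c2   c3
p1      12   -    50
p2      11   8    -

day 2   c1   c2   c3
p1      44   4    -
p2      -    -    -

Cuboid (city, product), summed over date:

        c1   c2   c3
p1      56   4    50
p2      11   8    -

Cuboid (city), summed over product and date:

c1   c2   c3
67   12   50

Apex cuboid (all): 129

Use a greedy algorithm to decide what to materialize.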
21. DIMENSION HIERARCHIES
Lattice of group-bys once the city -> state hierarchy is included (not all arcs shown):
all
city; state; product; date
(city, product); (city, date); (product, date); (state, product); (state, date)
(city, product, date); (state, product, date)
22. INTERESTING HIERARCHY
Levels of the time dimension: all, years, quarters, months, weeks, days

Conceptual dimension table (time):

day  week  month  quarter  year
1    1     1      1        2000
2    1     1      1        2000
3    1     1      1        2000
4    1     1      1        2000
5    1     1      1        2000
6    1     1      1        2000
7    1     1      1        2000
8    2     1      1        2000
23. SAMPLE CUBE
[Figure: 3-D sales cube with a "sum" member on each axis]
Axes:
• Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
• Product: TV, PC, VCR, sum
• Country: U.S.A, Canada, Mexico, sum
Cells labelled in the figure:
• Total annual sales of TV in U.S.A.
• Total annual sales of PC in U.S.A.
• Total annual sales
• Total Q1 sales of VCR in U.S.A.
• Total Q1 sales in U.S.A; in Canada; in Mexico
• Total Q1 sales in all countries; total Q2 sales in all countries
• Total sales in U.S.A; in Canada; in Mexico
• TOTAL SALES
30. OTHER OLAP OPERATIONS
o Drill-Across: queries involving more than one fact table
o Drill-Through: uses SQL to drill through the bottom level of a data cube down to its back-end relational tables
o Pivot (rotate): a visualization operation that rotates the data axes in view to provide an alternative presentation of the data. Examples include rotating the axes in a 3-D cube or transforming a 3-D cube into a series of 2-D planes.
31. OTHER OLAP OPERATIONS
o Moving Averages
o Growth Rates
o Depreciation
o Currency Conversion
o Statistical Functions
o Top N or Bottom N queries
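Three of the calculations listed above are simple enough to sketch directly. The Python functions below are illustrative only: the function names and the sample figures are hypothetical, and real OLAP tools expose these as built-in functions with their own syntax.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def growth_rates(values):
    """Relative change from each period to the next."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]

def top_n(totals, n):
    """The n members with the largest totals, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

quarterly_sales = [100, 120, 90, 150]            # hypothetical figures
product_totals = {"TV": 300, "PC": 450, "VCR": 120}  # hypothetical figures

print(moving_average(quarterly_sales, 2))  # [110.0, 105.0, 120.0]
print(growth_rates(quarterly_sales))
print(top_n(product_totals, 2))            # [('PC', 450), ('TV', 300)]
```

A bottom-N query is the same sort with `reverse=False`.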
32. Conceptual vs. Actual
• The "cube" is a logical way of visualizing the data in an OLAP setting
• It is not how the data is actually represented on disk
• Two ways of storing data:
  o ROLAP: Relational OLAP
  o MOLAP: Multidimensional OLAP
33. OLAP & CUBE
• Construction of the data cube is key to the operation of OLAP
• The computation process creates a set of aggregates on the various dimensions of the data
• The CUBE operator
35. The CUBE Operator
• Proposed by Gray et al.*
• Effectively involves a series of GROUP-BY operations to aggregate data
• Computes a group-by for every subset (the power set) of the attributes, given:
  o A measure
  o An aggregator function
*J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.
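The idea of one group-by per subset of the attributes can be sketched in a few lines. This is a naive illustration in Python, not the operator as defined by Gray et al.: the function name `cube` is hypothetical, the aggregator is fixed to sum, and every group-by rescans the input, which is exactly the cost that optimized cubing algorithms try to avoid.

```python
from collections import defaultdict
from itertools import combinations

def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (one group-by per subset)."""
    result = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result

# Sample sale table from the earlier slides.
rows = [
    {"product": "p1", "city": "c1", "date": 1, "amt": 12},
    {"product": "p2", "city": "c1", "date": 1, "amt": 11},
    {"product": "p1", "city": "c3", "date": 1, "amt": 50},
    {"product": "p2", "city": "c2", "date": 1, "amt": 8},
    {"product": "p1", "city": "c1", "date": 2, "amt": 44},
    {"product": "p1", "city": "c2", "date": 2, "amt": 4},
]

result = cube(rows, ["city", "product", "date"], "amt")
print(len(result))          # 8 group-bys for 3 dimensions
print(result[()])           # {(): 129}
print(result[("city",)])    # {('c1',): 67, ('c2',): 12, ('c3',): 50}
```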
36. CUBING Problem
• Problem: this generates a lot of data and work ($2^n$ group-by sets in total, where $n$ is the number of dimensions)
• Solution: optimized algorithms that run faster, consume less memory, and perform fewer I/Os
37. Efficient Computation of Data Cubes
o ROLAP-based cubing algorithms (Agarwal et al. '96)
o Array-based cubing algorithm (Zhao et al. '97)
S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In VLDB'96.
Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
38. Efficient Computation of Data Cubes
o How many cuboids are there in a cube with 3 dimensions?
o Answer: $2^3 = 8$, as many as there are group-by operations
o This assumes no hierarchies are involved
o With hierarchies: total cuboids = $\prod_{i=1}^{n} (L_i + 1)$, where $L_i$ is the number of levels associated with dimension $i$
o Example: 10 dimensions with 4 levels for each dimension
o Total cuboids = $5^{10}$ = 9,765,625
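The cuboid count can be checked numerically. This is a minimal sketch in Python; the function name `total_cuboids` is hypothetical, and a dimension without a hierarchy is assumed to have exactly one level, so that the formula reduces to 2 to the power of the number of dimensions.

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """Product of (levels + 1) over all dimensions; the +1 is the 'all' level."""
    return prod(levels + 1 for levels in levels_per_dimension)

no_hierarchy = total_cuboids([1, 1, 1])   # 3 dimensions, one level each
with_levels = total_cuboids([4] * 10)     # 10 dimensions, 4 levels each

print(no_hierarchy)  # 8
print(with_levels)   # 9765625
```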
39. Approaches to OLAP Servers
• It is all about which DBMS you choose to store your data warehouse data
• RDBMS - ROLAP
• MDDB - MOLAP
• BOTH - HOLAP
40. Approaches to OLAP Servers
Three possibilities for OLAP servers:
(1) Relational OLAP (ROLAP)
  • Relational and specialized relational DBMS to store and manage warehouse data
  • OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
  • Array-based storage structures
  • Direct access to array data structures
(3) Hybrid OLAP (HOLAP)
  • Storing detailed data in RDBMS
  • Storing aggregated data in MDBMS
  • User access via MOLAP tools
41. ROLAP
• Special schema design: star, snowflake
• Special indexes: bitmap, multi-table join
• Proven technology (relational model, DBMS) that tends to outperform specialized MDDBs, especially on large data sets
• Products:
  o IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
42. ROLAP
• Defines complex, multidimensional data with a simple model
• Reduces the number of joins a query has to process
• Allows the data warehouse to evolve with relatively low maintenance
• Can contain both detailed and summarized data
• Is based on familiar, proven, and already selected technologies
BUT
• It relies on SQL for multidimensional manipulation and calculations
43. MOLAP
• MDDB: a special-purpose data model
• Facts stored in multidimensional arrays
• Dimensions used to index the array
• Sometimes built on top of a relational DB
• Products:
  o Pilot, Arbor Essbase, Gentia
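The array storage described on this slide can be pictured as follows. This is a toy sketch in Python using nested lists; the variable and function names are hypothetical, and real MDDB engines use compressed, chunked array formats rather than a dense in-memory list.

```python
products = ["p1", "p2"]
cities = ["c1", "c2", "c3"]
days = [1, 2]

# Each dimension member maps to a position along its axis.
p_idx = {p: i for i, p in enumerate(products)}
c_idx = {c: i for i, c in enumerate(cities)}
d_idx = {d: i for i, d in enumerate(days)}

# Dense 3-D array of facts; None marks an empty (sparse) cell.
cells = [[[None for _ in days] for _ in cities] for _ in products]

facts = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]
for prod, city, day, amt in facts:
    cells[p_idx[prod]][c_idx[city]][d_idx[day]] = amt

def lookup(prod, city, day):
    """Direct array access: no join and no scan, just three index lookups."""
    return cells[p_idx[prod]][c_idx[city]][d_idx[day]]

# Count the cells that hold no fact.
empty = sum(cell is None for plane in cells for row in plane for cell in row)

print(lookup("p1", "c3", 1))  # 50
print(empty)                  # 6
```

Half of the twelve cells are empty even in this tiny example, which hints at why sparsity matters for array-based storage.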
44. MOLAP
• Pre-calculating or pre-consolidating transactional data improves speed.
BUT
• When fully pre-consolidating incoming data, MDDs require an enormous amount of overhead in both processing time and storage. An input file of 200MB can easily expand to 5GB.
• MDDBs are great candidates for departmental data marts under 100GB.
• With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.
45. OLAP Needs
• User Needs
  o Multidimensional View
  o Excellent Performance
  o Analytical Flexibility
  o Real-Time Data Access
  o High Data Capacity
• MIS Needs
  o Leverages Data Warehouse
  o Easy Development
  o Low Structure Maintenance
  o Low Aggregate Maintenance
46. OLAP Needs: User Needs
Multidimensional View
• All true OLAP tools, whether they work with an MDDB or an RDBMS, provide a multidimensional view of data.
• For example, decision makers may view sales by office, quarter, representative, product, etc. This perspective on data, which mirrors the way business professionals think, allows for more intuitive and more powerful analysis.
47. OLAP Needs: User Needs
Excellent Performance
• The performance of your decision support tool depends directly on the way it manages aggregates.
• With an RDBMS, there are two options:
  o Calculate aggregates on the fly (response time suffers)
  o Have the DBA create summary tables to store aggregates (consumes an enormous amount of disk space)
48. OLAP Needs: User Needs
Excellent Performance
• For example, suppose you have a Sales indicator with six dimensions: Representatives, Products, Customers, Regions, Months, and Years.
• MOLAP tools will store a given aggregate, such as the November 1997 government sales of product A504 by representative 1040 in New York, in 1 cell of the MDDB.
• In contrast, ROLAP tools consume 600% more space, because they require a record of seven values (six foreign keys and the actual aggregate) in a relational summary table.
50. OLAP Needs: User Needs
Excellent Performance
RDBMSs must use several summary tables to store the aggregates that a MOLAP tool could store in just one cube. For example, consider a Sales indicator with three dimensions: Months, Regions, and Products. The indicator cube will contain seven sets of aggregates:
• Sales by month
• Sales by product
• Sales by region
• Sales by month and product
• Sales by month and region
• Sales by product and region
• Sales by product, month, and region
To store these aggregates in an RDBMS, you'd have to create seven summary tables, one for each aggregate set.
HOW MANY SUMMARY TABLES FOR 6 DIMENSIONS?
(Separate fact table and shrunken dimension table approach for storing aggregates)
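The number of summary tables can be checked by enumerating the non-empty combinations of dimensions. This is a minimal sketch in Python; the function name `summary_tables` is hypothetical, and it follows the slide's convention of counting one table per aggregate set while leaving out the grand total.

```python
from itertools import combinations

def summary_tables(dims):
    """Every non-empty combination of dimensions, one summary table each."""
    return [
        combo
        for k in range(1, len(dims) + 1)
        for combo in combinations(dims, k)
    ]

three = summary_tables(["Months", "Regions", "Products"])
six = summary_tables(
    ["Representatives", "Products", "Customers", "Regions", "Months", "Years"]
)

print(len(three))  # 7
print(len(six))    # 63
```

In general the count is 2 to the power of the number of dimensions, minus one, so it roughly doubles with each added dimension.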
51. OLAP Needs: User Needs
Excellent Performance
• Huge amounts of extra storage space are required (even if there is no sparsity failure)
• Maintenance costs are high
• A lot of statistical analysis needs to be done to decide which aggregates are to be precomputed
• The DBA must keep the cost/performance ratio in check
52. OLAP Needs: User Needs
Excellent Performance
• In contrast, we've seen that multidimensional databases store aggregates in a very compact structure that consumes very little disk space and requires very little maintenance
• All levels of consolidation can therefore be precomputed and stored in the MDDB
• As a result, fast response time is not limited to the most frequently accessed queries; all aggregates can be accessed with lightning speed.
53. OLAP Needs: User Needs
Analytical Flexibility
• Both ROLAP & MOLAP tools offer comparable performance for:
  o Comparative analysis
  o Roll-up and drill-down
  o Slicing & dicing
• Only MOLAP tools offer "what-if" analysis
54. OLAP Needs: User Needs
Real-Time Data Access
• MOLAP tools load data into the multidimensional cubes. Consequently, the data being accessed is only as recent as the last load.
• Some applications require real-time data access
• The process of continually refreshing the data attaches higher costs to operating a MOLAP system
• Some MOLAP tools offer reach-through functionality to access volatile data stored outside the MDDB
• Unfortunately, users must be aware of the underlying database structure
• Relational data access is too complex for the typical user
55. OLAP Needs: User Needs
Real-Time Data Access
• ROLAP tools maintain a constant link to the operational RDBMS, which provides users with up-to-the-minute, accurate data (Real-Time Data Warehousing)
• Industries & organizations with highly volatile data particularly benefit from this access to live, operational data.
56. OLAP Needs: User Needs
High Capacity Data
• MOLAP products are limited by the size of the cube defined by the multidimensional view. When dimension elements are predefined, the scope of available data is limited at the outset.
• ROLAP tools circumvent this barrier. Dynamic dimensions are not stored in the predefined multidimensional model, but fetched at run time from the RDBMS.
57. OLAP Needs: User Needs
High Capacity Data
o In MOLAP, only aggregates are stored in the cube. Atomic, operational data are forced out of the user's analytical realm.
o ROLAP systems can access extremely detailed operational data, as well as aggregated data stored in summary tables.
58. OLAP Needs
MIS Needs
Administrators should be able to leverage their existing relational databases without devoting large amounts of time and effort to intricate development, fine tuning, or intensive maintenance.
59. OLAP Needs: MIS Needs
Leveraging Data Warehouse
• Both the finance and the MIS departments of your organization will appreciate a decision support tool that leverages existing investments in data warehousing.
• MIS staff who opt for a MOLAP tool must duplicate data in the tool's own proprietary MDDB.
• MIS staff who choose a ROLAP tool will be able to access the data warehouse directly.
60. OLAP Needs: MIS Needs
Easy Development
• MOLAP development is straightforward: it requires no fine tuning and creates its own aggregates.
• ROLAP tools, on the other hand, require a specific schema for the relational database.
• Skilled DBAs must provide the appropriate schema (star or snowflake), tune the database, and create the appropriate summary tables.
• However, many ROLAP tools are metadata-driven, which means the multidimensional view is generated and maintained more easily.
61. OLAP Needs: MIS Needs
Low Structure Maintenance
• The structure of a MOLAP tool's underlying MDDB greatly depends on each of its dimensions. When one dimension changes, the entire MDDB must be restructured.
• Multi-matrix MDDBs reduce the maintenance burden
• ROLAP systems do not store data in a proprietary structure.
• They build and maintain a constant link between the multidimensional view and the underlying RDBMS using the metadata.
• No database restructuring is required.
62. OLAP Needs: MIS Needs
Low Aggregate Maintenance
• MOLAP tools automatically create high-level aggregates based on your lower-level MDDB data and aggregate definitions.
• When data is updated, the aggregates are automatically updated and stored in the MDDB.
• With ROLAP tools, MIS staff must continually monitor the use of summary tables to keep their cost/performance ratio in check.
• DBAs inevitably use sophisticated statistics to isolate only the most frequently accessed aggregates, and store them in summary tables.
• These tables leave ROLAP administrators with a heavy maintenance burden.
64. ROLAP vs. MOLAP
1) Performance:
• How fast will the system appear to the end user?
• MDD server vendors believe this is a key point in their favor.
2) Data volume and scalability:
• While MDD servers can handle up to 100GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.
65. Hybrid OLAP - HOLAP
o Best of both worlds
o Storing detailed data in RDBMS
o Storing aggregated data in MDBMS
o User access via MOLAP tools
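66. HOLAP
[Figure: HOLAP architecture]
• Components: Client (multidimensional viewer, relational viewer), MDBMS server, RDBMS server
• Data labelled in the figure: multidimensional data, user data, metadata, derived data
• Connections labelled in the figure: multidimensional access, SQL-Read, SQL reach-through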
67. ROLAP, MOLAP, or HOLAP
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You're developing a general-purpose application for inventory movement or assets management
THEN
Consider an MDD/MOLAP solution for your data mart.
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN
Consider an RDBMS/ROLAP solution for your data mart.
IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN
Consider a HOLAP solution for your data mart.
68. Conclusions
• ROLAP: RDBMS -> star/snowflake schema
• MOLAP: MDDB -> cube structures
• ROLAP or MOLAP: the data models used play a major role in performance differences
• MOLAP: for summarized and relatively smaller volumes of data (under 100GB)
• ROLAP: for detailed and larger volumes of data
• Both storage methods have strengths and weaknesses
• The choice is requirement specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.
• HOLAP is emerging as the OLAP server of choice