Data Warehousing
Lecture-8
Online Analytical Processing (OLAP)
1
Mamuna Fatima
DWH & OLAP
•Relationship between DWH &
OLAP
•Data Warehouse & OLAP go
together.
•Analysis supported by OLAP
2
Supporting the human thought process
How many such query sequences can be programmed in advance? 3
THOUGHT PROCESSTHOUGHT PROCESS QUERY SEQUENCEQUERY SEQUENCE
An enterprise wide fall in profit
Profit down by a large
percentage consistently during
last quarter only. Rest is OK
What is special about last
quarter ?
Products alone doing OK, but
North region is most
problematic.
What was the quarterly sales
during last year ??
What was the quarterly sales at
regional level during last year ??
What was the monthly sale for
last quarter group by products
What was the monthly sale of
products in north at store level
group by products purchased
OK. So the problem is the high
cost of products purchased
in north.
What was the quarterly sales at
product level during last year?
What was the monthly sale for
last quarter group by region
Analysis of last example
• Analysis is Ad-hoc
• Analysis is interactive (user driven)
• Analysis is iterative
– Answer to one question leads to a dozen more
• Analysis is directional
– Drill Down
– Roll Up
– Pivot
4
More in
subsequent
slides
Challenges…
• Not feasible to write predefined queries.
– Fails to remain user_driven (becomes programmer
driven).
– Fails to remain ad_hoc and hence is not interactive.
• Enable ad-hoc query support
– Business user can not build his/her own queries
(does not know SQL, should not know it).
– On_the_go SQL generation and execution too slow.
5
Challenges
• Contradiction
– Want to compute answers in advance, but don't
know the questions
• Solution
– Compute answers to “all” possible “queries”. But
how?
– NOTE: Queries are multidimensional aggregates at
some level
6
“All” possible queries (level aggregates)
7
Province Frontier Punjab...
Division MultanLahorePeshawarMardan ......
Lahore ... Gugranwala
City
Zone GulbergDefense ...
District LahorePeshawar
ALL ALLALL ALL
Where Does OLAP Fit In?
• It is a classification of applications, NOT a
database design technique.
• Analytical processing uses multi-level
aggregates, instead of record level access.
• Objective is to support very
I. fast
II. iterative and
III. ad-hoc decision-making.
8
Where does OLAP fit in?
9
Transaction
Data

Presentation
Tools
Reports
OLAP
Data Cube
(MOLAP)
Data
Loading

?
Decision
Maker
OLTP vs. OLAP
Feature OLTP OLAP
Level of data Detailed Aggregated
Amount of data per
transaction
Small Large
Views Pre-defined User-defined
Typical write
operation
Update, insert, delete Bulk insert
“age” of data Current (60-90 days) Historical 5-10 years and
also current
Number of users High Low-Med
Tables Flat tables Multi-Dimensional tables
Database size Med (109
B – 1012
B) High (1012
B – 1015
B)
Query Optimizing Requires experience Already “optimized”
Data availability High Low-Med
10
OLAP FASMI Test
Fast: Delivers information to the user at a fairly
constant rate. Most queries answered in under five
seconds.
Analysis: Performs basic numerical and statistical
analysis of the data, pre-defined by an application
developer or defined ad-hocly by the user.
Shared: Implements the security requirements
necessary for sharing potentially confidential data
across a large user population.
Multi-dimensional: The essential characteristic of
OLAP.
Information: Accesses all the data and information
necessary and relevant for the application, wherever
it may reside and not limited by volume.
...from the OLAP Report by Pendse and Creeth.
11

Dwh lecture 10-olap

  • 1.
    Data Warehousing Lecture-8 Online AnalyticalProcessing (OLAP) 1 Mamuna Fatima
  • 2.
    DWH & OLAP •Relationshipbetween DWH & OLAP •Data Warehouse & OLAP go together. •Analysis supported by OLAP 2
  • 3.
    Supporting the humanthought process How many such query sequences can be programmed in advance? 3 THOUGHT PROCESSTHOUGHT PROCESS QUERY SEQUENCEQUERY SEQUENCE An enterprise wide fall in profit Profit down by a large percentage consistently during last quarter only. Rest is OK What is special about last quarter ? Products alone doing OK, but North region is most problematic. What was the quarterly sales during last year ?? What was the quarterly sales at regional level during last year ?? What was the monthly sale for last quarter group by products What was the monthly sale of products in north at store level group by products purchased OK. So the problem is the high cost of products purchased in north. What was the quarterly sales at product level during last year? What was the monthly sale for last quarter group by region
  • 4.
    Analysis of lastexample • Analysis is Ad-hoc • Analysis is interactive (user driven) • Analysis is iterative – Answer to one question leads to a dozen more • Analysis is directional – Drill Down – Roll Up – Pivot 4 More in subsequent slides
  • 5.
    Challenges… • Not feasibleto write predefined queries. – Fails to remain user_driven (becomes programmer driven). – Fails to remain ad_hoc and hence is not interactive. • Enable ad-hoc query support – Business user can not build his/her own queries (does not know SQL, should not know it). – On_the_go SQL generation and execution too slow. 5
  • 6.
    Challenges • Contradiction – Wantto compute answers in advance, but don't know the questions • Solution – Compute answers to “all” possible “queries”. But how? – NOTE: Queries are multidimensional aggregates at some level 6
  • 7.
    “All” possible queries(level aggregates) 7 Province Frontier Punjab... Division MultanLahorePeshawarMardan ...... Lahore ... Gugranwala City Zone GulbergDefense ... District LahorePeshawar ALL ALLALL ALL
  • 8.
    Where Does OLAPFit In? • It is a classification of applications, NOT a database design technique. • Analytical processing uses multi-level aggregates, instead of record level access. • Objective is to support very I. fast II. iterative and III. ad-hoc decision-making. 8
  • 9.
    Where does OLAPfit in? 9 Transaction Data  Presentation Tools Reports OLAP Data Cube (MOLAP) Data Loading  ? Decision Maker
  • 10.
    OLTP vs. OLAP FeatureOLTP OLAP Level of data Detailed Aggregated Amount of data per transaction Small Large Views Pre-defined User-defined Typical write operation Update, insert, delete Bulk insert “age” of data Current (60-90 days) Historical 5-10 years and also current Number of users High Low-Med Tables Flat tables Multi-Dimensional tables Database size Med (109 B – 1012 B) High (1012 B – 1015 B) Query Optimizing Requires experience Already “optimized” Data availability High Low-Med 10
  • 11.
    OLAP FASMI Test Fast:Delivers information to the user at a fairly constant rate. Most queries answered in under five seconds. Analysis: Performs basic numerical and statistical analysis of the data, pre-defined by an application developer or defined ad-hocly by the user. Shared: Implements the security requirements necessary for sharing potentially confidential data across a large user population. Multi-dimensional: The essential characteristic of OLAP. Information: Accesses all the data and information necessary and relevant for the application, wherever it may reside and not limited by volume. ...from the OLAP Report by Pendse and Creeth. 11

Editor's Notes

  • #3 In olap nothing is online, it is online that it has requirement that ans should come under 5 sec, so an impression of online goes to user. Oltp sys are fully normalized and olap are highly denormalized. We will discuss the ways to make the olap. The concept of dwh without olap is impossible. When a dwh is developed, olap is used for analysis. Olap is an enviornment or tool which must need a dwh. Olap is not a physical implementation it is a framework, will discuss in detail.
  • #4 First see how the decision makers think and how could we fill it with the help of queries. The decision maker need the ans of questions so that he could take a decision and we have to mimic it with sql queries. A profit fall or drop occurs and decision maker wants to know why. So the corresponding query will be “what was the quarterly sales during last year” means he wants to know what was the sale in 4 quarters of last year. Then he wants to anlyse w.r.t region and further he see it from another angle that What was the quarterly sales at product level during last year? Now he see the data from a different angles, because it shouldn’t be the case that some informative area of data should not remain untouch. The ans of these queries will be displayed before him in the form of charts. So when he saw chart he found that rest is ok but the fall of profit occurred only in the last quarter. But then he wants to explore the data of last quarter and wants to explore it, so he say what is special about last quarter. So he say that what was the quarterly sales at product level during last year? And then he saw monthly sales w.r.t products. And then he saw products w.r.t region. So he identified the region which is problematic. Then he saw w.r.t another angle he say What was the monthly sale of products in north at store level group by products purchased So it revealed that the products purchased at north region was high but to meet the challenge of compititors it may not be feasible to sale the products on those prices, so the fall of profit occurs. Because when you take less profit automatically your gross profit reduces. You will remain in profit but its percentage will fall as compare to the previous years profit. So this is the thought process having support from the anallysis tool known as the olap. But here a question come that How many such query sequences can be programmed in advance? So can you write the sequence of queries, ans is no. because the questions coming in the mind of decision makers cant be told in advance, so he reveals by exploring from different angles. So don’t you need the speed, and also if you are a decision maker and wants to know the ans from IT people, and they say we will answer them after 2 months. So you will be frustrated. So now we have to design the syst by how your business runs.
  • #5 So last slide reveals. That query seq is adhoc, no one knows in adv which and how many queries will come first. So a major diff b/w olap and oltp sys. The tool is used by decision maker interactively i.e using mouse on screen. So an interactive process. It should be interactive otherwise thought process breaks. Another thing is the process is iterative, so you know the ans of one question then other then next. It has direction i.e. you aggregate or see the elements i.e. from week to month, or months to days. Piovting is the way to change the things in which respect you were seeing the data.
  • #6 You cannot program the queries in adv because you don’t know which queries will come, and you also don’t know how many queries are there. So these are challenges, so we can say, when a decision maker moves the mouse we track the mouse and generate the sql on the go. But suppose it is true, still you have to address the vldbs with billions of rows so how can you bring the answer within 5 secs.
  • #7 When we discuss analysis, we do not retrieve data at record level, decision maker wants to see the aggregates, then he needs to explore. So the all possible questions are actually all possible aggregates, and you give an enviorment to deciosn maker where he can explore them.
  • #8 Aggregates have hierarchy which could be drilled down.
  • #9 Olap is a framework, having different flavours. It has a few chractristics i.e the support of adhoc queires, which should be fast, and iterative, and a point missing that is interactive. So you have to see these chractristics in an application to consider it as the olap.
  • #10 A diagram from past. Here data from oltp systems loads in dwh and data is loaded in an implementation of olap here the implementation is multidimensional olap, it is a multidimensional data structure and we displayed 3 dimesions because cannot display four dimensions in an instant. Then generate graphical reports. The data cube contains the information based on the transaction data.
  • #12 Which tool or sys could considered as olap. They should have 5 chractristics. Shared: user must be able to see their specific data only. Multidimensional: databases or data structures should be multidimensional Info: it should not say I can show this info but not this. It should be able to show all info.