Case study:
Quasi real-time OLAP cubes
by Ziemowit Jankowski
Database Architect
OLAP Cubes - what are they?
●Used to quickly analyze and retrieve data from different perspectives
●Numeric data
●Structured data:
  o can be represented as numeric values (or sets thereof) accessed by a composite key
  o each part of the composite key belongs to a well-defined set of values
●Facts = the numeric values
●Dimensions = the parts of the composite key
●Source = usually a star or snowflake schema in a relational DB (other sources possible)
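The definition above, numeric facts accessed by a composite key whose parts come from well-defined sets, can be sketched in a few lines of Python (all names and figures are illustrative, not from the case study):

```python
# Minimal illustrative model: an OLAP cube as a mapping from a
# composite key (one value per dimension) to a dict of measures.
cube = {
    # (product, date, district) -> facts
    ("widget", "2010-01-04", "north"): {"outcome": 120.0},
    ("widget", "2010-01-04", "south"): {"outcome": 95.0},
    ("gadget", "2010-01-05", "north"): {"outcome": 40.0},
}

# Each part of the composite key belongs to a well-defined set of values.
dimensions = {
    "product": {"widget", "gadget"},
    "date": {"2010-01-04", "2010-01-05"},
    "district": {"north", "south"},
}

def total_outcome(cube):
    """Aggregate the 'outcome' measure over all cells."""
    return sum(facts["outcome"] for facts in cube.values())

print(total_outcome(cube))  # 255.0
```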
OLAP Cubes - data sources
[Diagrams: star and snowflake schemas, each centered on a "Production outcome" fact table with Product, Date, and District dimensions; the snowflake variant normalizes dimensions into further tables such as Sub-district, Year, Month, and Day of week.]
OLAP Facts and dimensions
●Every "cell" in an OLAP cube contains numeric data, a.k.a. "measures".
●Every "cell" may contain more than one measure, e.g. forecast and outcome.
●Every "cell" has a unique combination of dimension values.
[Diagram: a cube with District and Product as two of its axes.]
OLAP Cubes - operations
●Slice = choose values corresponding to
ONE value on one or more dimensions
●Dice = choose values corresponding to
one slice or a number of consecutive
slices on more than 2 dimensions of
the cube
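Using the same kind of toy cube keyed by (product, date, district), the two operations can be sketched as filters over the composite key (a hypothetical illustration, not the MS OLAP API):

```python
# Hypothetical cube: (product, date, district) -> outcome measure.
cube = {
    ("widget", "2010-01-04", "north"): 120.0,
    ("widget", "2010-01-05", "north"): 110.0,
    ("gadget", "2010-01-04", "south"): 95.0,
}

def slice_cube(cube, dim_index, value):
    """Slice: keep cells matching ONE fixed value on one dimension."""
    return {k: v for k, v in cube.items() if k[dim_index] == value}

def dice_cube(cube, dim_index, values):
    """Dice: keep cells whose value on a dimension falls in a chosen
    set of consecutive values (a range of slices)."""
    return {k: v for k, v in cube.items() if k[dim_index] in values}

print(len(slice_cube(cube, 0, "widget")))       # 2 cells
print(len(dice_cube(cube, 1, {"2010-01-04"})))  # 2 cells
```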
OLAP Cubes - operations (cont'd)
●Drill down/up = choose lower/higher
level details. Used in context of
hierarchical dimensions.
●Pivot = rotate the orientation of the
data for reporting purposes
●Roll-up = summarize data along a dimension hierarchy, e.g. daily values rolled up into monthly totals (the counterpart of drill-down)
OLAP Cubes - refresh methods
●Incremental:
  o possible when cubes grow "outwards", i.e. no "scattered" changes in data
  o only delta data need to be read
  o refresh may be fast if the delta is small
●Full:
  o possible for all cubes, even when changes are "scattered" all over the data
  o all data need to be re-read with every refresh
  o refresh may take a long time (hours)
[Diagrams: incremental refresh appends new data at the end of the cube's time axis; full refresh re-reads cube data with updates scattered throughout.]
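The two refresh styles can be sketched over a hypothetical time-keyed store (all names and data are illustrative):

```python
# source: time key -> row in the relational DB; cube: already-loaded data.
source = {1: "a", 2: "b", 3: "c", 4: "d"}
cube = {1: "a", 2: "b"}

def incremental_refresh(cube, source, last_loaded):
    """Read only the delta: rows with a time key beyond the watermark.
    Correct only if old rows never change ("outward" growth)."""
    delta = {t: row for t, row in source.items() if t > last_loaded}
    cube.update(delta)
    return max(source)  # new watermark

def full_refresh(source):
    """Re-read everything; correct even for scattered changes, but slow."""
    return dict(source)

watermark = incremental_refresh(cube, source, last_loaded=2)
print(watermark, cube == source)  # 4 True
```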
The situation at hand
●Business operating on 24*6 basis (Sun-Fri)
●Events from production systems are aggregated into flows
and production units
●Production figures may be adjusted manually long after
production date
●Daily production figures are the basis for daily forecasts, with the simplified formula:
forecast(yearX) = production(yearX-1) * trend(yearX) + manualFcastAdjustm
●Adjustments in production figures will alter forecast figures
●Outcome and forecast should be stored in MS OLAP cubes as
per software architecture demands
●The system should simplify comparisons between forecast
and outcome figures
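The simplified formula from the slide can be illustrated with made-up figures (the production, trend, and adjustment values below are hypothetical):

```python
def forecast(production_prev_year, trend, manual_adjustment):
    """forecast(yearX) = production(yearX-1) * trend(yearX) + manualFcastAdjustm"""
    return production_prev_year * trend + manual_adjustment

# If last year's production was 1000 units, the trend factor is 1.05
# (+5%), and an analyst added a manual adjustment of 20 units:
print(forecast(1000, 1.05, 20))  # 1070.0
```

Note that because the forecast is derived from last year's production, any manual adjustment of a past production figure silently changes the corresponding forecast, which is exactly why scattered historical updates matter for cube refreshes.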
Software
●Source of data:
  o relational database
  o Oracle 10g database
  o extensive use of PL/SQL in the database
●Destination of data:
  o OLAP cubes - MS SQL Server Analysis Services (versions 2005 and 2008)
●Other software:
  o MS SQL Server database
QUESTION
Can we get almost real-time reports from MS OLAP cubes?
ANSWER
YES! The answer lies in "cube partitioning".
Cube partitioning - the basics
●Cube partitions may be updated independently
●Cube partitions must not overlap (otherwise duplicate values may occur)
●Time is a good dimension to partition on
[Diagram: a cube partitioned along the Time dimension.]
MS OLAP cube partitioning - details
●Every cube partition has its own query to define the data
set fetched from the data source
●The SQL statements define the non-overlapping data sets
[Diagram: four SQL queries (a-d), each feeding one partition of the cube from the relational data source.]
MS OLAP cube partitioning - details
[Diagram: the same partitioned cube; the dimension tables hold a small amount of data, while the fact table holds a large amount.]
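Since each partition's SQL statement must select a disjoint data set, one way to guarantee non-overlap is to derive every WHERE clause from a single ordered list of boundary dates; a hypothetical sketch (table, column, and boundary values are invented, not from the case study):

```python
# Boundary dates are illustrative. Each lower bound is inclusive and the
# upper bound exclusive, so consecutive partitions cannot overlap.
boundaries = ["2008-01-01", "2010-01-01", "2010-06-01", "2010-06-26"]

def partition_queries(boundaries):
    """One SELECT per partition, covering [lo, hi) date ranges."""
    queries = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        queries.append(
            "SELECT * FROM fact_production "
            f"WHERE prod_date >= DATE '{lo}' AND prod_date < DATE '{hi}'"
        )
    return queries

for q in partition_queries(boundaries):
    print(q)
```

Because adjacent queries share a boundary (one as exclusive upper bound, the next as inclusive lower bound), every fact row lands in exactly one partition.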
How to partition? - theory
●Partitions with different lengths and different update frequencies:
  o current data = very small partition, very short update times, updated often
  o "not very current" data = a bit larger partition, longer update times, updated less often
  o historical data = large partition, long update times, updated seldom
●24x6 operation delivers the "seldom" window
How to partition? - theory cont'd
●One cube for both forecast and outcome
[Diagram: a single cube holding both the forecast and outcome measures, with time partitions for History, Last year, Last month, Now, and One year into the future.]
Solution - approach one
Decisions:
●Cubes partitioned on date boundaries
●MOLAP cubes (for better query performance)
●Use SSIS to populate cubes
  o dimensions populated by incremental processing
  o facts populated by full processing
  o jobs for historical data must be run after midnight to compensate for the date change
Actions:
●Cubes built
●SSIS packages deployed inside SQL Server (not on the filesystem)
●SSIS packages set up as scheduled database jobs
Did it work?
No!
Malfunctions:
●Simultaneous updates of cube partitions could lead to
deadlocks
●Deadlocks left cube partitions in an unprocessed state
Amendment:
●Cube partitions must not be updated simultaneously
Solution - approach two
Decisions:
●Cube processing must be ONE partition at a time
●Scheduling done by an SSIS "super package":
  o a SQL Server table contains approximate frequencies and package names
  o the "super package" executes SSIS packages as indicated by the table
Actions:
●Scheduling table created
●"Super package" created to be self-modifying
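The table-driven "super package" idea, running one partition update at a time, might look like this sketch (the table columns and package names are assumptions for illustration, not the actual implementation):

```python
import datetime as dt

# Scheduling table: package name, approximate frequency, last run time.
schedule = [
    {"package": "process_now_partition",     "every_min": 3,        "last_run": None},
    {"package": "process_month_partition",   "every_min": 60,       "last_run": None},
    {"package": "process_history_partition", "every_min": 7*24*60,  "last_run": None},
]

def next_package(schedule, now):
    """Pick ONE due package (never two at once, to avoid deadlocks).
    Among due packages, prefer the most frequently scheduled one."""
    due = [
        job for job in schedule
        if job["last_run"] is None
        or (now - job["last_run"]).total_seconds() / 60 >= job["every_min"]
    ]
    return min(due, key=lambda job: job["every_min"]) if due else None

now = dt.datetime(2010, 6, 26, 1, 0)
job = next_package(schedule, now)
print(job["package"])  # process_now_partition
```

The key design point is serialization: whatever the priority rule, the super package launches exactly one SSIS package and waits for it to finish before consulting the table again.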
Did it work?
Not really!
Malfunctions:
●Historical data had to be updated after midnight, and real-time updates for the "Now" partition were postponed meanwhile. This was done to avoid "gaps" in outcome data and "overlaps" in forecast data.
●Real-time updates ended soon after midnight and were
resumed a few hours later. (That was NOT acceptable.)
Amendment:
●Re-think!
Solution - approach three
Decisions:
●Take advantage of 6*24 cycle (as opposed to 7*24)
●Switch dates on Saturdays only
  o the "Now" partition had to stretch from Saturday to Saturday
  o all other partitions had to stretch from one Saturday to another
●Re-process all time-consuming partitions on Saturday, after the switch of date
Solution - approach three cont'd
Actions:
●Create logic in Oracle database to do date calculations
"modulo week", i.e. based on Saturday. Logic implemented
as function.
●Rewrite SQL statements for cube partitions so that they
employ the Oracle function (as above) instead of current
date +/- given number of days.
●Reschedule the time-consuming updates so they run every 7th day.
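The "modulo week" date logic was implemented as an Oracle function; an equivalent sketch in Python (function names are illustrative) shows the idea of snapping partition boundaries to Saturdays instead of using current date +/- N days:

```python
import datetime as dt

def last_saturday(d):
    """Most recent Saturday on or before d (the 'modulo week' boundary).
    Python's weekday(): Monday=0 ... Saturday=5."""
    return d - dt.timedelta(days=(d.weekday() - 5) % 7)

def partition_bounds(d, weeks_back, weeks_len):
    """A partition stretching from one Saturday to another, relative to d."""
    hi = last_saturday(d) - dt.timedelta(weeks=weeks_back)
    lo = hi - dt.timedelta(weeks=weeks_len)
    return lo, hi

# A Wednesday maps to the preceding Saturday:
print(last_saturday(dt.date(2010, 6, 23)))  # 2010-06-19
```

Because every partition boundary comes from the same Saturday anchor, the boundaries only move once a week, so the expensive partitions stay valid for seven days at a time.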
Did it work?
Yes!
Malfunctions:
●None, really.
Lessons learned
●It is possible to build real-time OLAP cubes in MS
technology
●It is possible to make the partitions self-maintaining in
terms of partition boundaries
●The concept needs careful engineering, as there are pitfalls along the way.
Omitted details
Some details have been omitted:
●the quasi real-time updates are scheduled to occur every
2nd or 3rd minute
●scheduling is not exact, as the Super-job keeps track of
what is to be run and when and executes SSIS packages
based on "scheduled-to-run" state, their priority and a few
other criteria
●the source of data is not a proper star schema; rather, it is an emulation of facts and dimensions by means of data tables and views in Oracle.

More Related Content

What's hot

Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series DatabasePramit Choudhary
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
 
Scalable real-time processing techniques
Scalable real-time processing techniquesScalable real-time processing techniques
Scalable real-time processing techniquesLars Albertsson
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightDatabricks
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...ScyllaDB
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupWojciech Biela
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceDatabricks
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentationargonauts007
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueDatabricks
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...Spark Summit
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Writing Applications for Scylla
Writing Applications for ScyllaWriting Applications for Scylla
Writing Applications for ScyllaScyllaDB
 
Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Streaming Analytics @ Uber
Streaming Analytics @ UberStreaming Analytics @ Uber
Streaming Analytics @ UberXiang Fu
 

What's hot (20)

Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Scalable real-time processing techniques
Scalable real-time processing techniquesScalable real-time processing techniques
Scalable real-time processing techniques
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til Piffl
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentation
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and Fugue
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Writing Applications for Scylla
Writing Applications for ScyllaWriting Applications for Scylla
Writing Applications for Scylla
 
Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Streaming Analytics @ Uber
Streaming Analytics @ UberStreaming Analytics @ Uber
Streaming Analytics @ Uber
 

Viewers also liked

Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Issac Buenrostro
 
Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubesmister_zed
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastoreKishore Gopalakrishna
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016Carl Steinbach
 
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013Cosmin Lehene
 
OLAP OnLine Analytical Processing
OLAP OnLine Analytical ProcessingOLAP OnLine Analytical Processing
OLAP OnLine Analytical ProcessingWalid Elbadawy
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecturehasanshan
 
ML and Data Science at Uber - GITPro talk 2017
ML and Data Science at Uber - GITPro talk 2017ML and Data Science at Uber - GITPro talk 2017
ML and Data Science at Uber - GITPro talk 2017Sudhir Tonse
 
Stream Computing & Analytics at Uber
Stream Computing & Analytics at UberStream Computing & Analytics at Uber
Stream Computing & Analytics at UberSudhir Tonse
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data AnalyticsAnkur Bansal
 

Viewers also liked (16)

Intro to Pinot (2016-01-04)
Intro to Pinot (2016-01-04)Intro to Pinot (2016-01-04)
Intro to Pinot (2016-01-04)
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
 
Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubes
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
 
Pro_Tools_Tier_2
Pro_Tools_Tier_2Pro_Tools_Tier_2
Pro_Tools_Tier_2
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
 
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
 
OLAP OnLine Analytical Processing
OLAP OnLine Analytical ProcessingOLAP OnLine Analytical Processing
OLAP OnLine Analytical Processing
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
 
OLAP
OLAPOLAP
OLAP
 
ML and Data Science at Uber - GITPro talk 2017
ML and Data Science at Uber - GITPro talk 2017ML and Data Science at Uber - GITPro talk 2017
ML and Data Science at Uber - GITPro talk 2017
 
Stream Computing & Analytics at Uber
Stream Computing & Analytics at UberStream Computing & Analytics at Uber
Stream Computing & Analytics at Uber
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
Uber's Business Model
Uber's Business ModelUber's Business Model
Uber's Business Model
 

Similar to Quasi Real-Time OLAP Cubes Using Partitioning

HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010Cloudera, Inc.
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBencht_ivanov
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems researchVasia Kalavri
 
Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]Hiral Patel
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerationsAseem Bansal
 
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...Neo4j
 
Gobblin @ NerdWallet (Nov 2015)
Gobblin @ NerdWallet (Nov 2015)Gobblin @ NerdWallet (Nov 2015)
Gobblin @ NerdWallet (Nov 2015)NerdWalletHQ
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedShubham Tagra
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Lucas Jellema
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedShubham Tagra
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleItai Yaffe
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksGrega Kespret
 
Data Centric HPC for Numerical Weather Forecasting
Data Centric HPC for Numerical Weather ForecastingData Centric HPC for Numerical Weather Forecasting
Data Centric HPC for Numerical Weather ForecastingJames Arnold Faeldon
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the fieldJoAnna Cheshire
 
Performance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and UnderscoresPerformance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and UnderscoresJitendra Singh
 

Similar to Quasi Real-Time OLAP Cubes Using Partitioning (20)

HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems research
 
Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerations
 
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
The Protein Regulatory Networks of COVID-19 - A Knowledge Graph Created by El...
 
Gobblin @ NerdWallet (Nov 2015)
Gobblin @ NerdWallet (Nov 2015)Gobblin @ NerdWallet (Nov 2015)
Gobblin @ NerdWallet (Nov 2015)
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speed
 
Oow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BIOow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BI
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speed
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scale
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
Data Centric HPC for Numerical Weather Forecasting
Data Centric HPC for Numerical Weather ForecastingData Centric HPC for Numerical Weather Forecasting
Data Centric HPC for Numerical Weather Forecasting
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the field
 
Performance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and UnderscoresPerformance Stability, Tips and Tricks and Underscores
Performance Stability, Tips and Tricks and Underscores
 
OOW13 Exadata and ODI with Parallel
OOW13 Exadata and ODI with ParallelOOW13 Exadata and ODI with Parallel
OOW13 Exadata and ODI with Parallel
 

Quasi Real-Time OLAP Cubes Using Partitioning

  • 1. Case study: Quasi real-time OLAP cubes by Ziemowit Jankowski Database Architect
  • 2. OLAP Cubes - what is it? ●Used to quickly analyze and retrieve data from different perspectives ●Numeric data ●Structured data: ocan be represented as numeric values (or sets thereof) accessed by a composite key oeach of the parts of the composite key belongs to a well-defined set of values ●Facts = numeric values ●Dimensions = parts of the composite key ●Source = usually a start or snowflake schema in a relational DB (other sources possible)
  • 3. OLAP Cubes - data sources Star schema Snowflake schema Production outcome Product Date District Sub- district Production outcome Product Date District Year Month Day of week
  • 4. OLAP Facts and dimensions ●Every "cell" in an OLAP cube contains numeric data a.k.a "measures". ●Every "cell" may contain more than one measure, e.g. forecast and outcome. ●Every "cell" has a unique combination of dimension values. District Product
  • 5. OLAP Cubes - operations ●Slice = choose values corresponding to ONE value on one or more dimensions ●Dice = choose values corresponding to one slice or a number of consecutive slices on more than 2 dimensions of the cube
  • 6. OLAP Cubes - operations (cont'd) ●Drill down/up = choose lower/higher level details. Used in context of hierarchical dimensions. ●Pivot = rotate the orientation of the data for reporting purposes ●Roll-up
  • 7. OLAP Cubes - refresh methods ●Incremental: opossible when cubes grow "outwards", i.e. no "scattered" changes in data oonly delta data need to be read orefresh may be fast if delta is small ●Full: opossible for all cubes, even when changes are "scattered" all over thedata oall data need to be re-read with every orefresh may take long time (hours) Time Cube data Updates Time Cube data New data
  • 8. The situation on hand ●Business operating on 24*6 basis (Sun-Fri) ●Events from production systems are aggregated into flows and production units ●Production figures may be adjusted manually long after production date ●Daily production figures are basis for daily forecasts with the simplified formula: forecast(yearX) = production(yearX-1) * trend(yearX) + manualFcastAdjustm ●Adjustments in production figures will alter forecast figures ●Outcome and forecast should be stored in MS OLAP cubes as per software architecture demands ●The system should simplify comparisons between forecast and outcome figures
  • 9. Software
    ● Source of data:
      o relational database
      o Oracle 10g database
      o extensive use of PL/SQL in the database
    ● Destination of data:
      o OLAP cubes - MS SQL Server Analysis Services (versions 2005 and 2008)
    ● Other software:
      o MS SQL Server database
  • 10. QUESTION: Can we get almost real-time reports from MS OLAP cubes?
    ANSWER: YES! The answer lies in "cube partitioning".
  • 11. Cube partitioning - the basics
    ● Cube partitions may be updated independently
    ● Cube partitions must not overlap (otherwise duplicate values may occur)
    ● Time is a good dimension to partition on
  • 12. MS OLAP cube partitioning - details
    ● Every cube partition has its own query to define the data set fetched from the data source
    ● The SQL statements define the non-overlapping data sets
    (Diagram: a relational DB feeding a partitioned cube through four SQL queries, one per partition.)
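Non-overlapping data sets can be enforced by generating half-open date ranges, one per partition. A sketch (the table and column names are hypothetical; the real deployment used hand-written SQL per MS OLAP partition):

```python
def partition_queries(boundaries, table="FACT_PRODUCTION", date_col="PROD_DATE"):
    """Build one partition query per pair of adjacent, sorted date boundaries.
    Half-open intervals [lo, hi) guarantee the partitions never overlap."""
    queries = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {date_col} >= DATE '{lo}' AND {date_col} < DATE '{hi}'"
        )
    return queries

# Three partitions: history, last year, current.
qs = partition_queries(["2008-01-01", "2009-01-01", "2010-01-01", "2010-02-01"])
print(len(qs))  # 3
```

Every fact row falls into exactly one interval, so no value is counted twice when the partitions are merged by the cube.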
  • 13. MS OLAP cube partitioning - details
    (Diagram: the same relational DB and partitioned cube; the dimension queries fetch a small amount of data, while the fact query fetches a large amount of data.)
  • 14. How to partition? - theory
    ● Partitions with different lengths and different update frequencies:
      o current data = very small partition, very short update times, updated often
      o "not very current" data = a bit larger partition, longer update times, updated less often
      o historical data = large partition, long update times, updated seldom
    ● Operating 24x6 delivers the "seldom" window
  • 15. How to partition? - theory cont'd
    ● One cube for both the forecast measure and the outcome measure
    (Diagram: a timeline partitioned into History, Last year, Last month, Now and One year into the future.)
  • 16. Solution - approach one
    Decisions:
    ● Cubes partitioned on date boundaries
    ● MOLAP cubes (for better query performance)
    ● Use SSIS to populate the cubes
      o dimensions populated by incremental processing
      o facts populated by full processing
      o jobs for historical data must be run after midnight to compensate for the date change
    Actions:
    ● Cubes built
    ● SSIS deployed inside SQL Server (not the filesystem)
    ● SSIS set up as scheduled database jobs
  • 17. Did it work? No!
    Malfunctions:
    ● Simultaneous updates of cube partitions could lead to deadlocks
    ● Deadlocks left cube partitions in an unprocessed state
    Amendment:
    ● Cube partitions must not be updated simultaneously
  • 18. Solution - approach two
    Decisions:
    ● Cube processing must be ONE partition at a time
    ● Scheduling done by an SSIS "super package":
      o a SQL Server table contains approx. frequencies and package names
      o the "super package" executes SSIS packages as indicated by the table
    Actions:
    ● Scheduling table created
    ● "Super package" created to be self-modifying
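The "super package" idea can be sketched as a loop over a scheduling table that runs at most one package per pass, which serializes cube processing and avoids the deadlocks of approach one. The table layout and package names below are hypothetical; the real implementation is an SSIS package reading a SQL Server table:

```python
# Hypothetical scheduling table: package name, approx. frequency
# (minutes) and when the package last ran (minutes since start).
schedule = [
    {"package": "process_now_partition", "freq_min": 3,     "last_run": 0},
    {"package": "process_last_month",    "freq_min": 60,    "last_run": 0},
    {"package": "process_history",       "freq_min": 10080, "last_run": 0},
]

def pick_next(schedule, now_min):
    """Return the most overdue package that is due, or None.
    Running exactly ONE package per pass keeps partition processing
    strictly serialized."""
    due = [r for r in schedule if now_min - r["last_run"] >= r["freq_min"]]
    if not due:
        return None
    row = max(due, key=lambda r: now_min - r["last_run"] - r["freq_min"])
    row["last_run"] = now_min
    return row["package"]

print(pick_next(schedule, now_min=5))  # process_now_partition
```

A driver would call `pick_next` every minute or so and execute the returned SSIS package, so two partitions are never processed at the same time.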
  • 19. Did it work? Not really!
    Malfunctions:
    ● Historical data had to be updated after midnight, and real-time updates for the "Now" partition were postponed. This was done to avoid "gaps" in outcome data and "overlaps" in forecast data.
    ● Real-time updates ended soon after midnight and were resumed a few hours later. (That was NOT acceptable.)
    Amendment:
    ● Re-think!
  • 20. Solution - approach three
    Decisions:
    ● Take advantage of the 6*24 cycle (as opposed to 7*24)
    ● Switch dates on Saturdays only
      o the "Now" partition had to stretch from Saturday to Saturday
      o all other partitions had to stretch from one Saturday to another Saturday
    ● Re-process all time-consuming partitions on Saturday, after the switch of date
  • 21. Solution - approach three cont'd
    Actions:
    ● Create logic in the Oracle database to do date calculations "modulo week", i.e. based on Saturday. Logic implemented as a function.
    ● Rewrite the SQL statements for cube partitions so that they employ the Oracle function (as above) instead of the current date +/- a given number of days.
    ● Reschedule the time-consuming updates so they run every 7th day.
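The "modulo week" calculation can be sketched in Python (the original is a PL/SQL function; this sketch assumes partition boundaries snap to the most recent Saturday):

```python
import datetime

def last_saturday(d):
    """Most recent Saturday on or before d - the 'modulo week' boundary.
    Partition SQL uses this instead of current date +/- N days, so the
    boundaries only move once a week (on Saturdays)."""
    # datetime.weekday(): Monday=0 ... Saturday=5, Sunday=6.
    days_back = (d.weekday() - 5) % 7
    return d - datetime.timedelta(days=days_back)

# All dates within one Sun-Fri business week share the same boundary:
print(last_saturday(datetime.date(2010, 3, 17)))  # Wednesday -> 2010-03-13
print(last_saturday(datetime.date(2010, 3, 13)))  # Saturday  -> 2010-03-13
```

With every partition boundary defined this way, the "Now" partition automatically stretches from Saturday to Saturday and the partitions stay self-maintaining without editing the SQL each week.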
  • 23. Lessons learned
    ● It is possible to build quasi real-time OLAP cubes in MS technology
    ● It is possible to make the partitions self-maintaining in terms of partition boundaries
    ● The concept needs careful engineering, as there are pitfalls along the way.
  • 24. Omitted details
    Some details have been omitted:
    ● the quasi real-time updates are scheduled to occur every 2nd or 3rd minute
    ● scheduling is not exact: the super package keeps track of what is to be run and when, and executes SSIS packages based on their "scheduled-to-run" state, their priority and a few other criteria
    ● the source of data is not a proper star schema; it is rather an emulation of facts and dimensions by means of data tables and views in Oracle.