On-Line Analytical Processing (OLAP)
OLAP is a query-based methodology that supports data analysis in a multidimensional
environment. OLAP is a valuable tool for verifying or refuting human-generated
hypotheses and for performing manual data mining.
An OLAP engine logically structures multidimensional data in the form of a cube like the
one shown in Figure 1. The cube displays three dimensions: purchase category, time in
months, and region.
Because an OLAP cube is designed for a specific purpose, it is not unusual to have several
cubes structured from the data in a single data warehouse. The design of a data cube
includes decisions about which attributes to include in the cube as well as the granularity
of each attribute. A well-designed cube is configured so as to avoid situations where data
cells do not contain useful information. For example, a cube with two time dimensions,
one for month and a second for fiscal quarter (Q1, Q2, Q3, Q4), is a poor choice because cell
combinations such as (January, Q4) or (December, Q1) will always be empty.
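The empty-cell problem can be checked with a short Python sketch. It assumes a fiscal year aligned with the calendar (Q1 = January through March), which the example implies but does not state, and counts how many (month, quarter) combinations can ever hold data.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

def quarter_of(month_index):
    """Map a 0-based month index to its quarter (assumes Q1 starts in January)."""
    return QUARTERS[month_index // 3]

# Every cell that a cube with both a month and a quarter dimension would allocate.
all_cells = [(month, quarter) for month in MONTHS for quarter in QUARTERS]

# Only cells whose quarter actually contains the month can ever receive data.
usable_cells = [(month, quarter_of(i)) for i, month in enumerate(MONTHS)]

print(len(all_cells), "cells allocated")          # 48
print(len(usable_cells), "cells can hold data")   # 12
print(("January", "Q4") in usable_cells)          # False
```

Three out of every four cells stay empty, which is why a single time dimension with a month-to-quarter hierarchy is the better design.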
A useful OLAP system interface allows the user to display the data from different
perspectives, perform statistical computations and tests, query the data into successively
higher and/or lower levels of detail, cross-tabulate the data, and view the data with graphs
Figure 1. A multidimensional cube for credit card purchases.
(Highlighted cell: Category = Retails, Month = January, Region = Four, Amount = 67090, Count = 120)
In Figure 1, the cube contains 12 x 4 x 4 = 192 cells. Stored within each cell is the total
amount spent within a given category by all credit card customers for a specific month
and region. If an average purchase amount is to be computed, the cube will also contain a
count representing the total number of purchases for each month, category and region.
The arrow in Figure 1 points to a cell holding the total amount and the total number of
retail purchases in region four for the month of January.
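A minimal sketch of how such cells might be stored and used follows. The purchase records and the dictionary layout are assumptions made for illustration; only the final amount and count (67090 and 120) come from the cell highlighted in Figure 1.

```python
from collections import defaultdict

# Hypothetical purchase records: (category, month, region, amount).
purchases = [
    ("Travel", "January", "One", 250.0),
    ("Travel", "January", "One", 410.0),
    ("Retail", "January", "Four", 75.5),
    ("Retail", "February", "Four", 120.0),
]

# Each cell stores a running [total amount, purchase count].
cube = defaultdict(lambda: [0.0, 0])
for category, month, region, amount in purchases:
    cell = cube[(category, month, region)]
    cell[0] += amount
    cell[1] += 1

def average_purchase(total_amount, count):
    """Average purchase amount for one cell; None for an empty cell."""
    return total_amount / count if count else None

total, count = cube[("Travel", "January", "One")]
print(average_purchase(total, count))            # 330.0

# The cell called out in Figure 1: Amount = 67090, Count = 120.
print(round(average_purchase(67090, 120), 2))    # 559.08
```

Storing the count next to the total is what makes the average recoverable after cells are combined, since totals and counts can be added but averages cannot.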
Each attribute of an OLAP cube may have one or more associated concept hierarchies. A
concept hierarchy defines a mapping that allows the attribute to be viewed from varying
levels of detail. Figure 1.2 displays a concept hierarchy for the attribute location. As you
can see, region holds the highest level of generality within the hierarchy. The second
level of the hierarchy tells us that one or more states make up a region. The third and fourth
levels show us that one or more cities are contained in a state and one or more addresses
are found within a city. Let’s create a scenario where our OLAP cube, together with the
concept hierarchy of Figure 1.2, will be of assistance in a decision-making process.
Figure 1.2 A concept hierarchy for location
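The location hierarchy can be sketched as a chain of child-to-parent links. The place names below are invented for illustration; only the four levels (region, state, city, address) come from the text.

```python
# Levels of the location hierarchy, from most general to most specific.
LEVELS = ["region", "state", "city", "address"]

# Hypothetical child -> parent links; every place name here is made up.
PARENT = {
    "12 Elm St": "Springfield",
    "9 Oak Ave": "Springfield",
    "4 Main St": "Riverton",
    "Springfield": "State A",
    "Riverton": "State B",
    "State A": "Region One",
    "State B": "Region One",
}

def generalize(value, from_level, to_level):
    """Climb the hierarchy from from_level up to the more general to_level."""
    steps = LEVELS.index(from_level) - LEVELS.index(to_level)
    if steps < 0:
        raise ValueError("to_level must be at or above from_level")
    for _ in range(steps):
        value = PARENT[value]
    return value

print(generalize("12 Elm St", "address", "city"))     # Springfield
print(generalize("12 Elm St", "address", "region"))   # Region One
print(generalize("Riverton", "city", "state"))        # State B
```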
Suppose we wish to determine the best situation for offering a luggage and handbag
promotion for travel. Our goal is to determine when and where the promotional offering
will have its greatest impact on customer response. We can do this by finding those times
and locations where relatively large amounts have previously been spent on travel. Once
these are determined, we designate the best regions and times for the promotional offering so
as to take advantage of the likelihood of ensuing travel purchases.
Here is the list of common OLAP operations together with a few examples for our travel
1. The SLICE operation selects data on a single dimension of an OLAP cube. For the cube
in Figure 1, a slice operation leaves two of the three dimensions intact, while a
selection on the remaining dimension creates a subcube from the original cube. Two
queries for the slice operator are:
a. Provide a spreadsheet of month and region information for cells pertaining to
b. Select all cells where purchase category = retails or supermarket
2. The DICE operation extracts a sub cube from the original cube by performing a
select operation on two or more dimensions. Here are three queries requiring one or
more dice operations:
a. Identify the month of peak travel expenditure for each region.
b. Is there a significant variation in total dollars spent for travel and entertainment
by customers in each region?
c. Which month shows the greatest amount of total dollars spent on travel and
entertainment for all regions?
3. ROLL-UP, or aggregation, is the combining of cells for one of the dimensions defined
within a cube. One form of roll-up uses the concept hierarchy associated with a
dimension to achieve a higher level of generalization. For our example, this is
illustrated in Figure 1.3, where the roll-up is on the time dimension. The cell pointed
to in the figure contains region one supermarket data for the months of October,
November and December. A second type of roll-up operator actually eliminates an
entire dimension. For our example, suppose we choose to eliminate the location
dimension. The end result is a spreadsheet of total purchases delineated by month and
category.
(Figure 1.3 highlighted cell: Category = Supermarket, Month = Jan, Feb, March, Region = One)
4. DRILL-DOWN is the reverse of a roll-up and involves examining data at some greater
level of detail. Drilling down on region in Figure 1 results in a new cube where
each cell highlights a specific category, month and state.
5. ROTATION, or pivoting, allows us to view the data from a new perspective. For our
example, we may find it easier to view the OLAP cube in Figure 1 by having months
displayed on the horizontal axis and purchase category on the vertical axis.
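The slice, dice and roll-up operations above can be sketched on a small in-memory cube. The cell values and the helper names `select` and `roll_up` are assumptions made for illustration; an OLAP engine would expose these operations through its own query interface.

```python
from collections import defaultdict

# Hypothetical cube: (category, month, region) -> total amount spent.
cube = {
    ("Travel", "January", "One"): 900.0,
    ("Travel", "February", "One"): 400.0,
    ("Travel", "January", "Two"): 300.0,
    ("Travel", "February", "Two"): 700.0,
    ("Supermarket", "January", "One"): 500.0,
    ("Supermarket", "February", "Two"): 650.0,
}
DIMS = ("category", "month", "region")

def select(cells, **allowed):
    """Keep cells whose coordinates fall in the allowed values per dimension.

    Constraining one dimension is a slice; constraining two or more is a dice.
    """
    def keep(key):
        return all(key[DIMS.index(dim)] in values for dim, values in allowed.items())
    return {key: amount for key, amount in cells.items() if keep(key)}

def roll_up(cells, drop):
    """Eliminate one dimension by summing over it."""
    i = DIMS.index(drop)
    totals = defaultdict(float)
    for key, amount in cells.items():
        totals[key[:i] + key[i + 1:]] += amount
    return dict(totals)

# Slice: all month/region cells for the travel category.
travel = select(cube, category={"Travel"})

# Dice: travel spending in region One only.
travel_one = select(cube, category={"Travel"}, region={"One"})

# Roll-up that eliminates the location dimension: totals by category and month.
by_category_month = roll_up(cube, "region")
print(by_category_month[("Travel", "January")])    # 1200.0

# Month of peak travel expenditure for each region, read off the travel slice.
peak = {}
for (_, month, region), amount in travel.items():
    if region not in peak or amount > peak[region][1]:
        peak[region] = (month, amount)
print(peak)   # {'One': ('January', 900.0), 'Two': ('February', 700.0)}
```

Answering the peak-month query takes a slice followed by a comparison across months, which illustrates why practical analyses usually chain several operations.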
a. Most useful strategies for analyzing a cube require a sequence of two or more
b. MS Excel provides an interface that allows us to view OLAP cubes created from data
stored in a relational database.
c. The information contained in the cube can be displayed and manipulated in Excel as a
Excel Pivot Table For Data Analysis.
Creating a Simple Pivot Table
We start with a simple example using the credit card promotion database to show how pivot
tables summarize data for the attribute income range.
1. To begin, load the CreditCardPromotion.xls file into an Excel spreadsheet.
2. Delete the second and first rows of the spreadsheet data as they are not relevant to
3. Make sure the cursor is positioned in one of the cells containing data. Proceed to
the Data dropdown menu and select PivotTable and PivotChart Report.
4. Select the Microsoft Excel list or database radio button. This indicates that the
data to be analyzed is housed within an Excel spreadsheet. Select the PivotTable
option and click Next to continue.
5. In step 2 we are asked for the data range parameters to be used for creating the
pivot table. As we initially placed the cursor in a cell containing data, the data
range should be correct. Click Next to continue to step 3.
6. In step 3 we specify the location of the pivot table. Select the New worksheet radio
button and click Finish to continue.
Figure 2 : A Pivot Table Template
Let us use the toolbar together with the data drop area to generate a summary report for
the attribute income range.
1. Use your mouse to drag income range from the toolbar into the area specified by
Drop Field Here. Next, return to toolbar and drag income range into the area
specified by Drop Data Item here.
Figure 3: A summary report for income range
The report tells us, among other things, that the majority of credit card customers
have an income ranging between $30,000 and $40,000.
Now let’s change the output format for the total column (currently a count) to a
percent:
1. Single click on count of income range
2. Single click on the field settings square located in the top right portion of the
pivot table toolbar. A Pivot Table Field box will appear.
3. Single click on option >> and examine the options in the Show data as: dropdown
4. Select % of column and click OK.
The data in the total column will now appear as a percent.
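The same summary can be sketched outside Excel. The income-range values below are hypothetical stand-ins for the column in CreditCardPromotion.xls; the count corresponds to Count of income range and the percentage to the % of column setting.

```python
from collections import Counter

# Hypothetical income-range column; the real values live in CreditCardPromotion.xls.
income_range = [
    "20-30K", "30-40K", "30-40K", "40-50K", "30-40K",
    "50-60K", "20-30K", "30-40K", "40-50K", "30-40K",
]

# Equivalent of "Count of income range" in the pivot table.
counts = Counter(income_range)

# Equivalent of "Show data as: % of column".
total = sum(counts.values())
percents = {label: 100.0 * n / total for label, n in counts.items()}

for label in sorted(counts):
    print(f"{label}: count={counts[label]}, percent={percents[label]:.1f}%")
```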
Finally let’s make a pie chart to complement the table output:
1. Begin by highlighting the percentage scores for the four income range values.
2. Next, single click on the Chart Wizard located in the top left portion of the pivot
table toolbar. A bar chart representing the four income range values will appear.
However, we wish to have a pie chart showing the values. To accomplish this,
single click on the Chart Wizard a second time.
3. Choose one of the pie chart types and click on Finish.
Figure 4: A pie chart for income range
Next we use the pivot table drill down feature to display the records of those individuals
in a particular salary range:
1. Click on Sheet4 in the bottom tray to display the pivot table.
2. Double click on the cell containing the percent for the desired salary range
(20-30K). All instances within the chosen salary range will appear in a new
3. To return to the pivot table, click on Sheet4.
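Drilling down amounts to retrieving the rows that a summary cell was computed from. A minimal sketch, using hypothetical customer records and a hypothetical helper named `drill_down`:

```python
# Hypothetical customer records; field names mirror the attributes in the text.
records = [
    {"income range": "20-30K", "sex": "Female", "age": 27},
    {"income range": "30-40K", "sex": "Male", "age": 45},
    {"income range": "20-30K", "sex": "Male", "age": 35},
    {"income range": "40-50K", "sex": "Female", "age": 40},
]

def drill_down(rows, attribute, value):
    """Return the underlying rows summarized by one pivot-table cell."""
    return [row for row in rows if row[attribute] == value]

detail = drill_down(records, "income range", "20-30K")
for row in detail:
    print(row)
print(len(detail))   # 2
```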
This completes our first example.
Pivot Table for Hypothesis Testing
The ACME Credit Card Company has decided to solicit by telephone select cardholders
who received their credit cards within the last year and who did not purchase credit card
insurance with their initial mail-in application. Their data analyst believes that there is a
relationship between a cardholder’s age and whether the cardholder has credit card
insurance. Specifically, the analyst wishes to test the hypothesis that younger cardholders
purchase credit card insurance whereas more senior cardholders do not. If the hypothesis
is true, only those cardholders under a certain age will be selected for the telephone
solicitation.
To test the hypothesis, we will use a pivot table and our imagination and assume that the
credit card promotion database contains a much larger sampling of cardholders. The
following steps test the hypothesis claiming a relationship between age and credit card
insurance:
1. To begin, make sure the cursor is positioned in one of the cells of Sheet1 that
contains data. Proceed to the Data dropdown menu and select PivotTable and
PivotChart Report and then select Finish.
2. Move age to the area labeled Drop Row Fields Here. Move credit card insurance
to the area labeled Drop Column Fields Here.
3. Move credit card insurance to the area labeled Drop Data Items Here. The
resultant pivot table is given in Figure 5.
Figure 5: A pivot table showing age and credit card insurance choice
The pivot table is informative in that it tells us that very few individuals currently have
credit card insurance. However, the distribution of ages is such that it is difficult to draw
any conclusions about a relationship between age and credit card insurance. We can use
the group function to develop a clearer picture of any possible relationship between
the two attributes. The method is as follows:
1. Single click on the age attribute within the pivot table.
2. Single click on the Data dropdown menu.
3. Mouse to Group and Outline and then to Group. Single click on Group. A
grouping box that allows you to select a Starting at, Ending at, and By value will
appear.
4. Click OK to select the default values.
Figure 6: Grouping the credit card promotion data by age
The new pivot table is displayed in Figure 6. Although our data set is too small to draw
valid conclusions, grouping the data by age allows us to obtain a clearer picture of the
relationship between the two attributes.
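Grouping can be sketched as binning each age before cross-tabulating it against the insurance attribute. The cardholder data is hypothetical, and the starting value of 20 and bin width of 10 are assumptions rather than the defaults Excel would propose for the real data.

```python
from collections import Counter

# Hypothetical (age, credit card insurance) pairs.
cardholders = [
    (27, "No"), (29, "Yes"), (35, "No"), (38, "No"),
    (42, "No"), (43, "Yes"), (45, "No"), (55, "No"),
]

def age_group(age, start=20, width=10):
    """Label the bin containing age, e.g. 27 -> '20-29' (start and width are assumed)."""
    low = start + ((age - start) // width) * width
    return f"{low}-{low + width - 1}"

# Count cardholders for every (age group, insurance) combination.
table = Counter((age_group(age), insured) for age, insured in cardholders)

groups = sorted({group for group, _ in table})
print("age group  No  Yes")
for group in groups:
    print(f"{group:9}  {table[(group, 'No')]:2}  {table[(group, 'Yes')]:3}")
```

With one row per age group instead of one row per distinct age, any tendency for insurance purchases to cluster in particular groups becomes easier to see.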
A second method for determining whether a relationship between age and credit card
insurance exists computes the average ages for those individuals with and without
credit card insurance. Instead of starting with the original credit card promotion
database, we’ll modify the current pivot table by invoking the PivotTable Wizard from
the toolbar as follows.
1. Locate the PivotTable Wizard in the top row of the toolbar.
2. Single click on the wizard. This action invokes the step 3 display of the PivotTable
Wizard.
3. Locate and left click on Layout. The current pivot table layout is displayed
within the PivotTable Wizard. Figure 7 shows the current layout.
4. Use your mouse to remove the attribute age from the Row area and drag it to the age
button located on the far right of the layout display window. Next, drag credit
card insurance from the Column area to the Row area.
5. Remove Count of Credit Card Insurance from the data area and place age in the
data area.
6. Double click on Sum of Age within the data area. A PivotTable Field box will
appear.
7. Single click on Average within the Summarize by: box. Click on OK. This returns
you to the PivotTable Layout Wizard.
8. Click on OK from within the PivotTable Layout Wizard. Finally, click on Finish
within the step 3 display of the PivotTable Wizard.
Figure 7: PivotTable Layout Wizard
The resultant pivot table shows the average age for credit card insurance = no is
approximately 41.42, whereas the average age for credit card insurance = yes is
Figure 8: Age Summary
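The average-age comparison can be sketched as a group-by followed by a mean. The ages below are hypothetical and are not the values behind Figure 8.

```python
from collections import defaultdict

# Hypothetical (age, credit card insurance) pairs; not the data behind Figure 8.
cardholders = [
    (27, "No"), (29, "Yes"), (35, "No"), (38, "No"),
    (42, "No"), (43, "Yes"), (45, "No"), (55, "No"),
]

# Collect the ages for each insurance choice (the Row area of the pivot table).
ages_by_choice = defaultdict(list)
for age, insured in cardholders:
    ages_by_choice[insured].append(age)

# Equivalent of switching the data field from Sum of Age to Average of Age.
average_age = {
    choice: sum(ages) / len(ages) for choice, ages in ages_by_choice.items()
}

for choice in sorted(average_age):
    print(f"credit card insurance = {choice}: average age {average_age[choice]:.2f}")
```

A difference between the two averages is only suggestive; with a sample this small it says nothing about whether the relationship holds in general.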
Creating a Multidimensional Pivot Table
For this example, we will use a pivot table to investigate relationships between the
magazine, watch and life insurance promotions relative to customer gender and income
range. We will do this by creating a three-dimensional cube like the one shown in Figure
9. Each cell of the cube contains a count of the number of customers who either did or
did not take part in the promotional offerings.
Figure 9: A credit card promotion cube
(Highlighted cell: Watch Promo = No, Life Insurance Promo = Yes, Magazine Promo = Yes)
The arrow in Figure 9 points to the cell holding the total number of customers who took
advantage of the life insurance promotion and the magazine promotion, but who did not take
advantage of the watch promotion. We include sex and income range in our analysis by
designating these attributes as page variables. Here’s the procedure.
1. To begin, make sure the cursor is positioned in one of the cells of Sheet1 that
contains data. Proceed to the Data dropdown menu and select PivotTable and
PivotChart Report and then Finish.
2. Use the mouse to drag watch promotion and life insurance promotion to the area
labeled Drop Row Fields Here. Drag magazine promotion to the area labeled Drop
Column Fields Here.
3. Drag life insurance promotion, watch promotion and magazine promotion to the
area labeled Drop Data Items Here.
4. Finally, drag sex and income range to the area labeled Drop Page Fields Here.
The resultant pivot table appears in Figure 10. The 24 highlighted cells correspond to the
cells of the cube shown in Figure 9. In addition to the 24 cells representing the cube, the
pivot table also shows total yes and no counts for each promotion together with summary
totals. Let’s use the pivot table to help us determine relationships among the three
promotions.
Figure 10: A pivot table with page variables for credit card promotions
First we’ll use the table to find the customer count for the cell designated in Figure 10:
1. Find the area to the far left within the pivot table that shows life insurance
promotion = yes. This is given in Figure 10 by rows 15 through 20.
2. Within this same area, locate the subregion that has watch promotion = no.
3. Finally, follow this subregion to the right until you reach the column for
magazine promotion = yes.
The contents of the cell show a 2 for all three promotions. This tells us that a total of two
customers took advantage of the life insurance and magazine promotions but did not
purchase the watch promotion.
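Looking up a cell of the promotion cube can be sketched as counting customers by their three yes/no values. The customer tuples are hypothetical and were chosen so that the highlighted cell also holds 2.

```python
from collections import Counter

# Hypothetical customers: (life insurance promo, watch promo, magazine promo).
customers = [
    ("Yes", "No", "Yes"),
    ("Yes", "No", "Yes"),
    ("Yes", "Yes", "Yes"),
    ("No", "No", "No"),
    ("No", "Yes", "Yes"),
]

# One count per cell of the 2 x 2 x 2 promotion cube.
promo_cube = Counter(customers)

# The cell in Figure 9: life insurance = Yes, watch = No, magazine = Yes.
print(promo_cube[("Yes", "No", "Yes")])   # 2

# Total "Yes" count per promotion, like the summary totals in the pivot table.
names = ("life insurance", "watch", "magazine")
yes_totals = {
    name: sum(1 for customer in customers if customer[i] == "Yes")
    for i, name in enumerate(names)
}
print(yes_totals)   # {'life insurance': 3, 'watch': 2, 'magazine': 4}
```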
We can drill down to examine the individual records represented by the cell. Simply
double click on any of the cells containing the value 2. By default, the records will be
displayed in Sheet5.
Next, let’s look at the paging feature. In the upper left corner of the pivot table, you will
see the paging variables sex and income range specified with the table definition. We can
use the page feature to answer questions about the relationship between the attributes
given as page variables and the promotional offerings. For example, let’s say we wish to
examine the relationship between income range and promotional offerings for female
customers. The procedure is as follows:
1. Single-click on the dropdown menu for sex, highlight female, and click OK.
2. Single-click on the dropdown menu for income range, highlight $20-$30000 and click
OK.
The pivot table displays the promotional summary data for females making between
$20,000 and $30,000. The table shows two female customers within the specified
income range. Neither female took advantage of the watch or magazine promotions, but
one female did purchase the life insurance promotional offering. By examining the
remaining income range data, you will see that females with an annual salary between
$30,000 and $40,000 have traditionally been the best candidates for promotional
offerings. The paging feature thus adds another dimension to the analysis
capabilities of Excel pivot tables.
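Paging behaves like a filter applied before the table is summarized. The sketch below uses hypothetical records, arranged so that the female, 20-30K page reproduces the counts described above; the helper names `page` and `yes_counts` are also assumptions.

```python
# Hypothetical records; arranged to mirror the female, 20-30K page described above.
records = [
    {"sex": "Female", "income range": "20-30K", "life": "Yes", "watch": "No", "magazine": "No"},
    {"sex": "Female", "income range": "20-30K", "life": "No", "watch": "No", "magazine": "No"},
    {"sex": "Female", "income range": "30-40K", "life": "Yes", "watch": "Yes", "magazine": "Yes"},
    {"sex": "Male", "income range": "20-30K", "life": "No", "watch": "Yes", "magazine": "Yes"},
]

PROMOTIONS = ("life", "watch", "magazine")

def page(rows, filters):
    """Keep only rows matching every page-variable setting in filters."""
    return [row for row in rows if all(row[k] == v for k, v in filters.items())]

def yes_counts(rows):
    """Count the customers who accepted each promotion."""
    return {p: sum(1 for row in rows if row[p] == "Yes") for p in PROMOTIONS}

selected = page(records, {"sex": "Female", "income range": "20-30K"})
print(len(selected))          # 2
print(yes_counts(selected))   # {'life': 1, 'watch': 0, 'magazine': 0}
```

Changing the `filters` dictionary plays the role of picking a different value from a page-field dropdown, so the same summary can be recomputed for each sex and income range in turn.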