Multi-Dimensional Databases
Outline
• Multidimensional Databases
• Contrast MDD and Relational Databases
• When is MDD (In)appropriate?
• MDD Features
• Pros/Cons of MDD
MDDB: Why?
• There is no single "best" data structure for all applications within an enterprise.
• Organizations have abandoned the search for the holy grail of a globally accepted database, instead selecting the most appropriate data structure on a case-by-case basis from a palette of standard database structures.
• Multidimensional Databases for OLAP?
• The multidimensional database has matured into the database engine of choice for data analysis applications.
• It has an inherent ability to integrate and analyze large volumes of enterprise data.
• It offers a good conceptual fit with the way end users visualize business data.
• Most business people already think about their businesses in multidimensional terms.
• Managers tend to ask questions about product sales in different markets over specific time periods.
What is a Multi-Dimensional Database?
A multidimensional database (MDDB) is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are (1) intimately related and (2) stored, viewed, and analyzed from different perspectives. These perspectives are called dimensions.
A Motivating Example
An automobile manufacturer wants to increase sales volumes by examining sales data collected throughout the organization. The evaluation would require viewing historical sales volume figures from multiple dimensions, such as:
• Sales volume by model
• Sales volume by color
• Sales volume by dealer
• Sales volume over time
Contrasting Relational and Multi-Dimensional Models
SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL          COLOR   SALES VOLUME
MINI VAN       BLUE    6
MINI VAN       RED     5
MINI VAN       WHITE   4
SPORTS COUPE   BLUE    3
SPORTS COUPE   RED     5
SPORTS COUPE   WHITE   5
SEDAN          BLUE    4
SEDAN          RED     3
SEDAN          WHITE   2
The Relational Structure
Note: knowledge of the schema is needed to work with this structure.
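Multidimensional Structure
Sales Volumes (the measurement) by MODEL and COLOR (the dimensions):
MODEL \ COLOR   BLUE   RED   WHITE
MINI VAN           6     5       4
COUPE              3     5       5
SEDAN              4     3       2
Mini Van, Coupe, and Sedan are the positions along the MODEL dimension; Blue, Red, and White are the positions along the COLOR dimension.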
Differences between MDDB and Relational Databases
Normalized Relational | MDDB
Data reorganized based on query. Perspectives are placed in the fields, which tells us nothing about the contents. | Perspectives embedded directly in the structure.
Browsing and data manipulation are not intuitive to the user. | Data retrieval and manipulation are easy.
Slows down for large datasets due to the multiple JOIN operations needed. | Fast retrieval for large datasets due to the predefined structure.
Flexible. Anything an MDDB can do can be done this way. | Relatively inflexible. Changes in perspectives necessitate reprogramming of the structure.
Contrasting Relational Model and MDD - Example 2
SALES VOLUMES FOR ALL DEALERSHIPS
MODEL          COLOR   DEALERSHIP   VOLUME
MINI VAN       BLUE    CLYDE        6
MINI VAN       BLUE    GLEASON      6
MINI VAN       BLUE    CARR         2
MINI VAN       RED     CLYDE        3
MINI VAN       RED     GLEASON      5
MINI VAN       RED     CARR         5
MINI VAN       WHITE   CLYDE        2
MINI VAN       WHITE   GLEASON      4
MINI VAN       WHITE   CARR         3
SPORTS COUPE   BLUE    CLYDE        2
SPORTS COUPE   BLUE    GLEASON      3
SPORTS COUPE   BLUE    CARR         2
SPORTS COUPE   RED     CLYDE        7
SPORTS COUPE   RED     GLEASON      5
SPORTS COUPE   RED     CARR         2
SPORTS COUPE   WHITE   CLYDE        4
SPORTS COUPE   WHITE   GLEASON      5
SPORTS COUPE   WHITE   CARR         1
SEDAN          BLUE    CLYDE        6
SEDAN          BLUE    GLEASON      4
SEDAN          BLUE    CARR         2
SEDAN          RED     CLYDE        1
SEDAN          RED     GLEASON      3
SEDAN          RED     CARR         4
SEDAN          WHITE   CLYDE        2
SEDAN          WHITE   GLEASON      2
SEDAN          WHITE   CARR         3
Multidimensional Representation
[Figure: Sales Volumes cube with three dimensions: MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Clyde, Gleason, Carr).]
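As a minimal sketch (Python, standard library only), the 27 relational records of Example 2 can be placed into a 3 × 3 × 3 cube in which each dimension is a list of positions and a volume is reached by its position along each dimension. The names `MODELS`, `COLORS`, `DEALERS`, `ROWS`, and `build_cube` are hypothetical, chosen for illustration; a real MDDB engine stores and addresses its cells in its own way.

```python
# Positions along each dimension, in the order used by the slides.
MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["BLUE", "RED", "WHITE"]
DEALERS = ["CLYDE", "GLEASON", "CARR"]

# Relational form: one record (model, color, dealership, volume) per combination.
ROWS = [
    ("MINI VAN", "BLUE", "CLYDE", 6), ("MINI VAN", "BLUE", "GLEASON", 6), ("MINI VAN", "BLUE", "CARR", 2),
    ("MINI VAN", "RED", "CLYDE", 3), ("MINI VAN", "RED", "GLEASON", 5), ("MINI VAN", "RED", "CARR", 5),
    ("MINI VAN", "WHITE", "CLYDE", 2), ("MINI VAN", "WHITE", "GLEASON", 4), ("MINI VAN", "WHITE", "CARR", 3),
    ("SPORTS COUPE", "BLUE", "CLYDE", 2), ("SPORTS COUPE", "BLUE", "GLEASON", 3), ("SPORTS COUPE", "BLUE", "CARR", 2),
    ("SPORTS COUPE", "RED", "CLYDE", 7), ("SPORTS COUPE", "RED", "GLEASON", 5), ("SPORTS COUPE", "RED", "CARR", 2),
    ("SPORTS COUPE", "WHITE", "CLYDE", 4), ("SPORTS COUPE", "WHITE", "GLEASON", 5), ("SPORTS COUPE", "WHITE", "CARR", 1),
    ("SEDAN", "BLUE", "CLYDE", 6), ("SEDAN", "BLUE", "GLEASON", 4), ("SEDAN", "BLUE", "CARR", 2),
    ("SEDAN", "RED", "CLYDE", 1), ("SEDAN", "RED", "GLEASON", 3), ("SEDAN", "RED", "CARR", 4),
    ("SEDAN", "WHITE", "CLYDE", 2), ("SEDAN", "WHITE", "GLEASON", 2), ("SEDAN", "WHITE", "CARR", 3),
]


def build_cube(rows):
    """Place each record's volume at its (model, color, dealership) position."""
    cube = [[[None] * len(DEALERS) for _ in COLORS] for _ in MODELS]
    for model, color, dealer, volume in rows:
        cube[MODELS.index(model)][COLORS.index(color)][DEALERS.index(dealer)] = volume
    return cube


cube = build_cube(ROWS)

# Look up one volume by its position along each of the three dimensions.
print(cube[MODELS.index("SEDAN")][COLORS.index("BLUE")][DEALERS.index("GLEASON")])  # 4
```

Because every model/color/dealership combination has a record, every cell of the cube ends up filled.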
Viewing Data - An Example
[Figure: Sales Volumes cube with dimensions MODEL, COLOR, and DEALERSHIP.]
• Assume that each dimension has 10 positions, as shown in the cube above.
• How many records would there be in a relational table?
• What are the implications for viewing data from an end-user standpoint?
Performance Advantages
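What is the volume figure when model = SEDAN, color = BLUE, and dealer = GLEASON?
• RDBMS: all 1000 records might need to be searched to find the right record.
• MDB: the database has more "knowledge" about where the data lies, so at most 30 position searches are needed (up to 10 positions along each of the 3 dimensions).
• Average case: 15 position searches (MDB) vs. 500 record searches (RDBMS), roughly half of each worst case.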
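The search counts above can be checked with a small simulation. This sketch assumes, as the slide does, an unindexed relational table scanned row by row and an MDB that searches each dimension's 10 positions linearly; real RDBMSs typically use indexes and real MDB engines typically compute cell offsets directly, so the numbers illustrate the slide's argument rather than benchmark any product. All names are hypothetical.

```python
import itertools

N = 10  # positions per dimension, as in the slide's example

# Hypothetical position labels 0..9 for each of the three dimensions.
models = list(range(N))
colors = list(range(N))
dealers = list(range(N))

# Relational form: one record per combination, 10 * 10 * 10 = 1000 records.
records = list(itertools.product(models, colors, dealers))


def relational_comparisons(target):
    """Rows examined by an unindexed, row-by-row scan until target is found."""
    for count, row in enumerate(records, start=1):
        if row == target:
            return count
    raise KeyError(target)


def mdb_comparisons(target):
    """Positions examined when each dimension is searched linearly on its own."""
    total = 0
    for positions, wanted in zip((models, colors, dealers), target):
        total += positions.index(wanted) + 1
    return total


worst = (N - 1, N - 1, N - 1)
print(len(records))                   # 1000
print(relational_comparisons(worst))  # 1000
print(mdb_comparisons(worst))         # 30

# Average over every possible target.
avg_rel = sum(relational_comparisons(t) for t in records) / len(records)
avg_mdb = sum(mdb_comparisons(t) for t in records) / len(records)
print(avg_rel, avg_mdb)               # 500.5 16.5
```

Averaged over every possible target, the scan examines 500.5 records and the MDB 16.5 positions, about half of each worst case; the slide rounds these to 500 and 15.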
What are the total sales across all colors and dealers when model = SEDAN?
• RDBMS: all 1000 records must be searched to get the answer.
• MDB: sum the contents of one 10×10 "slice" (100 cells).
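Summing a slice is a loop over one plane of the cube. The sketch below uses the actual 3 × 3 × 3 Example 2 data rather than the hypothetical 10 × 10 × 10 cube, so the SEDAN slice has 3 × 3 = 9 cells instead of 100; `CUBE`, `MODELS`, `model_total`, and `cells_in_slice` are illustrative names.

```python
MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]

# Example 2 volumes indexed as CUBE[model][color][dealership], with colors
# ordered BLUE, RED, WHITE and dealerships ordered CLYDE, GLEASON, CARR.
CUBE = [
    [[6, 6, 2], [3, 5, 5], [2, 4, 3]],  # MINI VAN
    [[2, 3, 2], [7, 5, 2], [4, 5, 1]],  # SPORTS COUPE
    [[6, 4, 2], [1, 3, 4], [2, 2, 3]],  # SEDAN
]


def model_total(cube, model):
    """Sum one MODEL slice: every color/dealership cell for that model."""
    plane = cube[MODELS.index(model)]
    return sum(sum(row) for row in plane)


def cells_in_slice(cube, model):
    """Number of cells read to compute model_total for one model."""
    plane = cube[MODELS.index(model)]
    return sum(len(row) for row in plane)


print(model_total(CUBE, "SEDAN"))     # 27
print(cells_in_slice(CUBE, "SEDAN"))  # 9
```

For SEDAN the sketch reads 9 cells, whereas an unindexed relational scan of the same data would examine all 27 records.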
• Data manipulation that requires a minute in an RDBMS may require only a few seconds in an MDB.
• MDBs can be an order of magnitude faster than RDBMSs for queries like these.
• The performance advantages offered by multidimensional technology facilitate the development of interactive decision support applications, such as OLAP, that can be impractical in a relational environment.
Real World Benefits
• Ease of data presentation and navigation
• Ease of maintenance
• Performance
Adding Dimensions - An Example
[Figure: Three Sales Volumes cubes (MODEL × COLOR × DEALERSHIP), one each for JANUARY, FEBRUARY, and MARCH, adding a time dimension.]
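Adding a time dimension multiplies the number of cells by the number of time positions. The slides do not give the monthly figures, so this sketch only builds an empty four-dimensional structure (MONTH as the outermost dimension, every cell `None`) and counts its cells; `empty_cube` and the other names are hypothetical.

```python
from math import prod

MONTHS = ["JANUARY", "FEBRUARY", "MARCH"]
MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["BLUE", "RED", "WHITE"]
DEALERS = ["CLYDE", "GLEASON", "CARR"]


def empty_cube(shape):
    """Nested lists of the given shape with every cell unset (None)."""
    if not shape:
        return None
    return [empty_cube(shape[1:]) for _ in range(shape[0])]


shape_3d = [len(MODELS), len(COLORS), len(DEALERS)]
shape_4d = [len(MONTHS)] + shape_3d  # MONTH added as the outermost dimension

print(prod(shape_3d))  # 27
print(prod(shape_4d))  # 81

cube_4d = empty_cube(shape_4d)
january = cube_4d[MONTHS.index("JANUARY")]  # one full MODEL x COLOR x DEALERSHIP cube
print(len(january), len(january[0]), len(january[0][0]))  # 3 3 3
```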
When is MDD (In)appropriate?
PERSONNEL
LAST NAME   EMPLOYEE #   EMPLOYEE AGE
SMITH       01           21
REGAN       12           19
FOX         31           63
WELD        14           31
KELLY       54           27
LINK        03           56
KRANZ       41           45
LUCUS       33           41
WEISS       23           19
First, consider situation 1
When is MDD (In)appropriate?
Now consider situation 2
SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL          COLOR   VOLUME
MINI VAN       BLUE    6
MINI VAN       RED     5
MINI VAN       WHITE   4
SPORTS COUPE   BLUE    3
SPORTS COUPE   RED     5
SPORTS COUPE   WHITE   5
SEDAN          BLUE    4
SEDAN          RED     3
SEDAN          WHITE   2
1. Set up an MDD structure for situation 1, with LAST NAME and EMPLOYEE # as dimensions and AGE as the measurement.
2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions and SALES VOLUME as the measurement.
When is MDD (In)appropriate?
[Figure: MDD Structures for the Situations. Situation 2: the 3 × 3 MODEL × COLOR grid of Sales Volumes, with every cell filled. Situation 1: a 9 × 9 LAST NAME × EMPLOYEE # grid of Employee Age, in which only 9 of the 81 cells hold a value.]
Note the sparseness in the second MDD representation.
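The difference between the two situations can be quantified as the fraction of cells that actually hold a value once the first two fields become dimensions. This sketch uses the data from both tables; the `density` helper and the variable names are hypothetical.

```python
# Situation 1: PERSONNEL records (last name, employee #, age).
PERSONNEL = [
    ("SMITH", "01", 21), ("REGAN", "12", 19), ("FOX", "31", 63),
    ("WELD", "14", 31), ("KELLY", "54", 27), ("LINK", "03", 56),
    ("KRANZ", "41", 45), ("LUCUS", "33", 41), ("WEISS", "23", 19),
]

# Situation 2: Gleason sales records (model, color, volume).
GLEASON_SALES = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]


def density(records):
    """Return (filled cells, total cells) when the first two fields of each
    record are the two dimensions and the third field is the measurement."""
    rows = {r[0] for r in records}
    cols = {r[1] for r in records}
    filled = {(r[0], r[1]) for r in records}
    return len(filled), len(rows) * len(cols)


for name, records in (("PERSONNEL", PERSONNEL), ("GLEASON SALES", GLEASON_SALES)):
    filled, total = density(records)
    print(f"{name}: {filled}/{total} cells filled")
```

It prints 9/81 for PERSONNEL and 9/9 for the Gleason sales data: only one cell in nine is used in the personnel grid, while the sales grid is completely filled.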
When is MDD (In)appropriate?
• Our sales volume dataset has a great number of meaningful interrelationships.
• The interrelationships are more meaningful than the individual data elements themselves.
• The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
• Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis.
When is MDD (In)appropriate?
• No last name matches more than one employee #, and no employee # matches more than one last name.
• In contrast, there is a sales figure associated with every combination of model and color, resulting in a completely filled 3 × 3 matrix.
• For sparse data such as the personnel table, MDB performance suffers compared with an RDB.
When is MDD Appropriate?
The greater the number of inherent interrelationships between the
elements of a dataset, the more likely it is that a study of those
interrelationships will yield business information of value to the
company.
Most companies have limited time and resources to devote to
analyzing data.
It therefore becomes critical that these highly interrelated dataset
types be placed in a multidimensional data structure for greatest ease
of access and analysis.
When is MDD Appropriate?
Examples of applications that are suited for multidimensional technology:
• Financial Analysis and Reporting
• Budgeting
• Promotion Tracking
• Quality Assurance and Quality Control
• Product Profitability
• Survey Analysis
MDD Features - Rotation
Sales Volumes
View #1: MODEL × COLOR
MODEL \ COLOR   BLUE   RED   WHITE
MINI VAN           6     5       4
COUPE              3     5       5
SEDAN              4     3       2
(ROTATE 90°)
View #2: COLOR × MODEL
COLOR \ MODEL   MINI VAN   COUPE   SEDAN
BLUE                   6       3       4
RED                    5       5       3
WHITE                  4       5       2
• Also referred to as "data slicing."
• Each rotation yields a different slice, or two-dimensional table of data: a different face of the cube.
MDD Features - Rotation
[Figure: Views #1 to #6 of the Sales Volumes cube (MODEL, COLOR, DEALERSHIP), each reached from the previous one by a 90° rotation. Each view shows a different pair of dimensions on its visible face, for example MODEL × COLOR, DEALERSHIP × COLOR, and DEALERSHIP × MODEL.]
MDD Features - Rotation
• All six views can be obtained by simple rotation.
• In an MDB, rotations are simple because no rearrangement of the data is required.
• Rotation is also referred to as "data slicing."
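Rotation changes how the cells are indexed, not where they are stored. In this sketch, `rotate` reproduces View #2 from View #1 by swapping rows and columns (strictly, a transpose), and `cell` reads any of the six axis orderings of the three-dimensional cube by remapping indices over the unchanged Example 2 data. Counting the six views as the 3! orderings of MODEL, COLOR, and DEALERSHIP is one way to see why there are six; all names here are hypothetical.

```python
from itertools import permutations

DIMS = ("MODEL", "COLOR", "DEALERSHIP")

# View #1 (Gleason): rows MINI VAN, COUPE, SEDAN; columns BLUE, RED, WHITE.
VIEW_1 = [[6, 5, 4], [3, 5, 5], [4, 3, 2]]

# Example 2 volumes as CUBE[model][color][dealership] (CLYDE, GLEASON, CARR).
CUBE = [
    [[6, 6, 2], [3, 5, 5], [2, 4, 3]],  # MINI VAN
    [[2, 3, 2], [7, 5, 2], [4, 5, 1]],  # SPORTS COUPE
    [[6, 4, 2], [1, 3, 4], [2, 2, 3]],  # SEDAN
]


def rotate(table):
    """Swap rows and columns (a transpose), turning View #1 into View #2."""
    return [list(column) for column in zip(*table)]


def cell(cube, order, idx):
    """Read one cell of the view whose axes are the cube's axes taken in `order`.

    Only the index mapping changes; the stored cube is never rearranged.
    """
    original = [0, 0, 0]
    for view_axis, cube_axis in enumerate(order):
        original[cube_axis] = idx[view_axis]
    return cube[original[0]][original[1]][original[2]]


print(rotate(VIEW_1))  # [[6, 3, 4], [5, 5, 3], [4, 5, 2]]

views = list(permutations(range(3)))
for order in views:
    print(" x ".join(DIMS[axis] for axis in order))

# View with axes (DEALERSHIP, MODEL, COLOR): GLEASON, SEDAN, BLUE.
print(cell(CUBE, (2, 0, 1), (1, 2, 0)))  # 4
```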
MDD Features - Ranging
• How does the sales volume of models painted with the new metallic blue compare with the sales of models painted normal blue?
• The user knows that only the Sports Coupe and Mini Van models have received the new paint treatment.
• The user also knows that only 2 dealers, namely Carr and Clyde, have an unconstrained supply of these models.
MDD Features - Ranging
[Figure: Ranged subset of the Sales Volumes cube: MODEL limited to Mini Van and Coupe, COLOR limited to Normal Blue and Metal Blue, and DEALERSHIP limited to Clyde and Carr.]
• The end user selects the desired positions along each dimension.
• Also referred to as "data dicing."
• The data is scoped down to a subset grouping
MDD Features - Ranging
• The reduced array can now be rotated and used in computations in the same way as the parent array.
• This is referred to as "data dicing," as the data is scoped down to a subset grouping.
• In an RDB, a complex SQL query is required.
• Performance is better in an MDB, as less resource-consuming searches are required.
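Dicing amounts to keeping only the selected positions along each dimension. The Example 2 data has no metallic-blue figures, so this sketch dices on positions that do exist (MINI VAN and SPORTS COUPE, BLUE, CLYDE and CARR); the result is a smaller cube of the same kind, which could in turn be rotated or summed. The names are hypothetical.

```python
MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["BLUE", "RED", "WHITE"]
DEALERS = ["CLYDE", "GLEASON", "CARR"]

# Example 2 volumes as CUBE[model][color][dealership].
CUBE = [
    [[6, 6, 2], [3, 5, 5], [2, 4, 3]],  # MINI VAN
    [[2, 3, 2], [7, 5, 2], [4, 5, 1]],  # SPORTS COUPE
    [[6, 4, 2], [1, 3, 4], [2, 2, 3]],  # SEDAN
]


def dice(cube, models, colors, dealers):
    """Return the sub-cube holding only the selected positions on each dimension."""
    m_idx = [MODELS.index(m) for m in models]
    c_idx = [COLORS.index(c) for c in colors]
    d_idx = [DEALERS.index(d) for d in dealers]
    return [[[cube[m][c][d] for d in d_idx] for c in c_idx] for m in m_idx]


sub = dice(CUBE, ["MINI VAN", "SPORTS COUPE"], ["BLUE"], ["CLYDE", "CARR"])
print(sub)  # [[[6, 2]], [[2, 2]]]
```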
MDD Features - Roll-Ups & Drill Downs
• Users want different views of the same data.
• For example, sales volume by model vs. sales volume by dealership.
• Often, views are similar.
• For example, sales volume by dealership vs. sales volume by district.
• There is a natural relationship between sales volumes at the DEALERSHIP level and sales volumes at the DISTRICT level.
• The sales volumes for all the dealerships in a district sum to the sales volume for that district.
MDD Features - Roll-Ups & Drill Downs
• Multidimensional database technology is specifically designed to facilitate the handling of natural relationships.
• Define two related aggregates on the same dimension.
• One aggregation is dealership and the other is district.
• District is at a higher level of aggregation than dealership.
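MDD Features - Roll-Ups & Drill Downs
[Figure: ORGANIZATION DIMENSION hierarchy with three levels: REGION, DISTRICT, and DEALERSHIP. Labels shown include Midwest (REGION), Chicago and St. Louis (DISTRICT), Gary, and the dealerships Clyde, Gleason, Carr, Levi, Lucas, and Bolton.]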
• The figure presents a definition of a hierarchy within the organization dimension.
• Aggregations are perceived as being part of the same dimension.
• Moving up and down levels in a hierarchy is referred to as "roll-up" and "drill-down."
MDD Features - Roll-Ups & Drill Downs
[Figure: ROLL UP and DRILL DOWN between levels of the organization hierarchy.]
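A roll-up sums child members into their parent, and a drill-down returns the children of a parent. The slides do not say which district each dealership belongs to, so the district and region assignments below (`DISTRICT A`, `DISTRICT B`, `REGION 1`) are hypothetical, chosen only to exercise the code; the dealership totals are computed from the Example 2 data.

```python
from collections import defaultdict

DEALERS = ["CLYDE", "GLEASON", "CARR"]

# Example 2 volumes as CUBE[model][color][dealership].
CUBE = [
    [[6, 6, 2], [3, 5, 5], [2, 4, 3]],  # MINI VAN
    [[2, 3, 2], [7, 5, 2], [4, 5, 1]],  # SPORTS COUPE
    [[6, 4, 2], [1, 3, 4], [2, 2, 3]],  # SEDAN
]

# Hypothetical hierarchy for illustration only.
DISTRICT_OF = {"CLYDE": "DISTRICT A", "GLEASON": "DISTRICT A", "CARR": "DISTRICT B"}
REGION_OF = {"DISTRICT A": "REGION 1", "DISTRICT B": "REGION 1"}


def dealership_totals(cube):
    """Total volume per dealership across all models and colors."""
    totals = {d: 0 for d in DEALERS}
    for plane in cube:
        for row in plane:
            for dealer, volume in zip(DEALERS, row):
                totals[dealer] += volume
    return totals


def roll_up(totals, parent_of):
    """Sum each child's total into its parent at the next level up."""
    out = defaultdict(int)
    for child, volume in totals.items():
        out[parent_of[child]] += volume
    return dict(out)


def drill_down(totals, parent_of, parent):
    """Return the totals of the children that belong to one parent."""
    return {child: totals[child] for child, p in parent_of.items() if p == parent}


dealers = dealership_totals(CUBE)           # {'CLYDE': 33, 'GLEASON': 37, 'CARR': 24}
districts = roll_up(dealers, DISTRICT_OF)   # {'DISTRICT A': 70, 'DISTRICT B': 24}
regions = roll_up(districts, REGION_OF)     # {'REGION 1': 94}
print(drill_down(dealers, DISTRICT_OF, "DISTRICT A"))  # {'CLYDE': 33, 'GLEASON': 37}
```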
Pros/Cons of MDD
Pros:
• Cognitive advantages for the user
• Ease of data presentation and navigation; time dimension
• Performance
Cons:
• Less flexible
• Requires greater initial effort
