ON MULTI-DIMENSIONAL CUBES OF CENSUS
DATA: DESIGNING AND QUERYING
Jaspreet Singh (1), Dr. Neeraj Sharma (2), Dr. Kawaljeet Singh (3)
(1) Research Scholar, Department of Computer Science, Punjabi University, Patiala, Punjab, INDIA
(2) Professor, Department of Computer Science, Punjabi University, Patiala, Punjab, INDIA
(3) Director, University Computer Centre, Punjabi University, Patiala, Punjab, INDIA
jaspreetissaj@gmail.com (1), sharma_neeraj@hotmail.com (2), singhkawaljeet@rediffmail.com (3)
Abstract
The relational database model is perhaps the most commonly used model in database
technology. For all its strengths, it does not perform well on complex queries and on the
analysis of summarized data. The concept of Online Analytical Processing (OLAP) was
developed to meet this challenge. The data cube, the main component of OLAP, is a
multidimensional database model developed to speed up the analysis and processing of
large data sets. Business Intelligence (BI), one of the most flourishing concepts in the
computer industry, also depends on OLAP cubes to a major extent. The primary focus of
this paper is to design a data warehouse that specifically targets the OLAP storage,
analysis and querying requirements of multidimensional cubes of census data, so that
queries are answered in an efficient and timely manner.
Keywords: - Business Intelligence, OLAP, Data Warehouse, Multidimensional Cubes, Census.
--------------------------------------------------------***--------------------------------------------------------
1. INTRODUCTION
During the last few decades, advancements
and maturation in database technology have
given rise to the term Business Intelligence
(BI), which describes a set of technologies
that cater mainly to two objectives: (i) to
enable end users to retrieve valuable
information from the data that business
organizations collect in their routine
operations, and (ii) to support planned
decision-making.
The advent of business intelligence has
changed what end users expect from a
database management system. Business
Intelligence means different things to
different people. For some it is only the data
warehouse, whereas for others it is
dashboards that deliver instant information
for their business strategies on their desktop
screens [1].
Business Intelligence includes the following:
• Data warehousing
• OLAP - Online Analytical Processing
• Predictive analytics and data mining
• Business reporting, including dashboards
All of the above technologies improve an
organization’s ability to create, analyze, and
report precise information about the
business, which is required for
forward-facing activities.
Data warehousing and Online Analytical
Processing (OLAP) are the most important
components of contemporary Decision
Support Systems (DSS). Together, they
allow organizations to make effective
decisions with insight into both their current
and future state. Information is a vital
resource for any planned operation, and
organizations are constantly accumulating
and storing huge amounts of data related to
their day-to-day operations. As today’s
markets are much more competitive than
those of the past, this huge amount of
organizational data should be processed and
made available when required, as accurately
and as quickly as possible. The success or
failure of an organization depends heavily
on its ability to analyze and synthesize data,
so it becomes more important to process and
manage this data. Moreover, decision
making these days depends on the
availability of high-quality, integrated,
time-variant information. Therefore, an
organization should be provided with this
kind of information.
2. OBJECTIVES OF THIS WORK
• Identify relevant data and present it
as facts (Key Performance
Indicators).
• Design and implement a data
warehouse and a data cube for
the Census.
• Present the data to the end user in
the form of a multidimensional
model.
• Retrieve any detail from a single
dimension or from a combination
of two or more dimensions.
• Analyze and evaluate the results
and the query performance.
3. LITERATURE REVIEW
In the data mining literature,
multidimensional cubes, along with their
corresponding star and snowflake schemas,
are used to design business-oriented
databases. Dimensional modeling is
regarded as the most appropriate approach to
designing a data warehouse for the purpose
of predictive data mining [2]. According to
Kortink and Moody, the major objectives of
dimensional modeling are: (i) to design
database structures that are easy for end
users to understand and write queries
against, and (ii) to maximize the efficiency
of queries [3]. One way to look at the
multidimensional data model is to view it as
a cube. The limitation here is that, as the
number of dimensions increases, the number
of cells in the cube increases exponentially
[4]. The majority of multidimensional
queries deal with merged, high-level data
expressed through dimensions, levels,
hierarchies and attributes. Therefore, the
way to build an efficient multidimensional
database is to associate all logically related
attributes. This association is especially
valuable because typical dimensions are
hierarchical in nature. We found this
approach applicable and useful for our work.
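The growth in cube size noted above can be
illustrated with a short calculation. This is a
minimal sketch, not part of the original work:
the dimension sizes are hypothetical, and the
helper names cell_count and cuboid_count are
introduced here only for illustration. The
number of base cells is the product of the
dimension sizes, and n dimensions without
hierarchies give 2^n possible group-by
combinations (cuboids).

```python
from math import prod


def cell_count(cardinalities):
    """Number of base cells in a cube: the product of the dimension sizes."""
    return prod(cardinalities)


def cuboid_count(n_dimensions):
    """Number of group-by combinations (cuboids) for n dimensions without hierarchies."""
    return 2 ** n_dimensions


# Hypothetical dimension sizes: 35 study areas, 3 sex categories, 10 age groups.
sizes = [35, 3, 10]
for n in range(1, len(sizes) + 1):
    print(n, "dimension(s):", cell_count(sizes[:n]), "cells,", cuboid_count(n), "cuboids")
```

Each added dimension multiplies the cell count
by its size and doubles the number of cuboids,
which is why only the dimensions that queries
really need should be modeled.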
4. DIMENSIONAL MODELING
Dimensional modeling offers different views
of the data in a data warehouse from
different dimensions. These views may be
used in a Decision Support System as part of
business intelligence, in association with
data mining tasks. Decision support
applications often require information
obtained along many dimensions. For
example, suppose we want to list the male
literacy rate of the above-7 age group for a
particular state. This query requires the
literacy rate and the above-7 male
population of each state as dimensions. Each
dimension is a collection of logically related
levels, and the levels in turn have
hierarchies and attributes. A dimension can
be viewed as an axis for modeling the data,
and its levels partition the data further.
Within each dimension, the entities form
levels, on which various questions may be
asked. The specific data stored is known as
facts, which are mainly numeric. Facts
consist of context and measure data. The
measures are the numeric attributes of the
facts that are queried in the census data
cube. Decision Support System queries may
access the facts from many different
dimensions, levels and hierarchies.
Figure-1: Multidimensional cube with
different levels of each dimension.
The levels in each dimension enable the
retrieval of facts from different dimensions.
The data warehouse should hold a
summarized collection of data attributes that
makes information retrieval more efficient.
In data warehousing, facts are also known as
key performance indicators. The fully
aggregated fact information can be viewed
at the 0-D level, which is the topmost
hierarchical level. The same information can
be viewed at the 1-D level by Demography,
Study area, or Sex distribution and location.
Further, a combination of two dimensions
can be analyzed at the 2-D level, and a
combination of three dimensions at the 3-D
level. Any combination of dimensions is
possible when slicing and dicing along
various axes while browsing the OLAP
cube, which supports base-level analysis of
the fact data for efficient decision making.
The cubes developed here are mainly for
data analysis. Data analysis is the process of
evaluating information from large databases
to find out about the present status and to
predict future trends in the concerned sector
of the economy. The data can be analyzed
using various cube operations such as slice,
dice, roll-up (drill-up) and drill-down. The
problem of building the multidimensional
cubes is represented in figure-2.
Figure-2: Various types of dimensional
views in the cube
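The 0-D to 3-D views described in this section
can be enumerated mechanically. The following
is a minimal sketch under stated assumptions:
the fact rows and population figures are made
up, and the dimension names study_area, sex and
age_group are placeholders rather than the
actual cube definitions of this work. Every
subset of the dimensions yields one view.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (study_area, sex, age_group, population).
FACTS = [
    ("Punjab", "Male", "0-6", 120),
    ("Punjab", "Female", "0-6", 100),
    ("Punjab", "Male", "7+", 900),
    ("Haryana", "Male", "7+", 800),
    ("Haryana", "Female", "7+", 700),
]
DIMENSIONS = ("study_area", "sex", "age_group")


def aggregate(group_by):
    """Sum the population measure over the dimensions named in group_by."""
    positions = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(int)
    for row in FACTS:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


def all_views():
    """Return every k-D view (k = 0..3), keyed by the tuple of dimensions used."""
    views = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, k):
            views[dims] = aggregate(dims)
    return views


views = all_views()
print(views[()])               # 0-D view: the grand total
print(views[("study_area",)])  # one of the three 1-D views
```

With three dimensions the sketch produces eight
views: one 0-D, three 1-D, three 2-D and one
3-D view.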
The multidimensional structure of the cube
stores the pre-calculated aggregations. The
roll-up and drill-down operations of the cube
are designed around the data hierarchies.
A hierarchy describes the logical structure
and the logical parent-child relationships
within the data. For example, Districts are at
the lowest level in the Study_Area hierarchy,
and the State is the next level above, as
shown in figure-3.
Figure-3: Hierarchical view of Study_Area
Dimension
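The idea of storing pre-calculated aggregations
along a hierarchy can be sketched as follows.
This is only an illustration, not how Oracle
OLAP stores its data: the district totals are
made-up numbers, and the names precompute and
lookup are hypothetical. Totals are computed
once per hierarchy level, so that a later query
becomes a lookup.

```python
from collections import defaultdict

# Hypothetical Study_Area hierarchy (district -> state) and made-up district totals.
DISTRICT_TO_STATE = {"Patiala": "Punjab", "Ludhiana": "Punjab", "Ambala": "Haryana"}
DISTRICT_POPULATION = {"Patiala": 100, "Ludhiana": 150, "Ambala": 80}


def precompute(district_population, district_to_state):
    """Build one lookup table per hierarchy level: district, state and all."""
    state_totals = defaultdict(int)
    for district, value in district_population.items():
        state_totals[district_to_state[district]] += value
    return {
        "district": dict(district_population),
        "state": dict(state_totals),
        "all": {"All": sum(district_population.values())},
    }


AGGREGATES = precompute(DISTRICT_POPULATION, DISTRICT_TO_STATE)


def lookup(level, member):
    """Answer a query from the stored aggregates without rescanning district rows."""
    return AGGREGATES[level][member]


print(lookup("state", "Punjab"))  # state-level total built from its districts
print(lookup("all", "All"))       # topmost level of the hierarchy
```

The trade-off is extra storage and load time in
exchange for faster queries at the higher
levels of the hierarchy.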
5. MULTIDIMENSIONAL
SCHEMA FOR CENSUS DATA
WAREHOUSE
Dimensional models represent data with a
“cube” structure [5], which makes the
logical data representation more compatible
with OLAP data analysis. The data can be
queried directly by issuing complex
database queries that use various
combinations of dimensions.
Multidimensional models take advantage of
inherent relationships in the data to populate
multidimensional matrices called data
cubes. For analytical queries, the
multidimensional cube generally performs
much better than the relational data model.
The response time of a multidimensional
query depends largely on how many levels
are added to each dimension [6]. Two types
of schema are followed when designing
multidimensional cubes. If all the
dimensions are joined directly to the fact
table, the schema is said to be a star schema.
If some dimensions are joined to the fact
table not directly but through another
dimension, it is said to be a snowflake
schema. A snowflake schema is mainly used
when the fact table holds fewer records than
the dimension tables. This situation does not
occur in our work, so we design all the data
marts as star schemas.
Figure-4: Star Schema
The steps for designing a multidimensional
model include:
• Identify the fact table, which
references all of the dimensions on
which the fact data is based.
• Identify the dimensions and the
granularity of each fact table.
• Define the measures that support
analysis and reporting for each fact
table.
• Gather all the attributes, levels and
hierarchies for each dimension of
the fact table.
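The design steps above can be sketched as a
small star schema. This is a minimal sketch
under stated assumptions, not the schema built
in this work: SQLite (through Python's sqlite3
module) stands in for Oracle so that the
example is self-contained, and the table names
dim_study_area, dim_sex and fact_census, their
columns, and all figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, with the hierarchy levels as columns.
CREATE TABLE dim_study_area (area_id INTEGER PRIMARY KEY, state TEXT, district TEXT);
CREATE TABLE dim_sex (sex_id INTEGER PRIMARY KEY, sex TEXT);

-- Fact table: one row per (district, sex) grain, holding the numeric measures.
CREATE TABLE fact_census (
    area_id    INTEGER REFERENCES dim_study_area(area_id),
    sex_id     INTEGER REFERENCES dim_sex(sex_id),
    population INTEGER,
    literates  INTEGER,
    PRIMARY KEY (area_id, sex_id)
);

INSERT INTO dim_study_area VALUES
    (1, 'Punjab', 'Patiala'), (2, 'Punjab', 'Ludhiana'), (3, 'Haryana', 'Ambala');
INSERT INTO dim_sex VALUES (1, 'Male'), (2, 'Female');
INSERT INTO fact_census VALUES
    (1, 1, 100, 80), (1, 2, 90, 60),
    (2, 1, 150, 120), (2, 2, 140, 100),
    (3, 1, 80, 60), (3, 2, 70, 45);
""")

# Every dimension joins directly to the fact table, which is what makes this a star.
rows = conn.execute("""
    SELECT a.state, s.sex, SUM(f.population), SUM(f.literates)
    FROM fact_census f
    JOIN dim_study_area a ON a.area_id = f.area_id
    JOIN dim_sex s ON s.sex_id = f.sex_id
    GROUP BY a.state, s.sex
    ORDER BY a.state, s.sex
""").fetchall()

for row in rows:
    print(row)
```

Keeping state and district in one dimension
table, rather than splitting them into two
joined tables, is the choice that separates a
star schema from a snowflake schema.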
6. OLAP FOR CENSUS DATA
WAREHOUSE
The database used for the implementation in
this paper is Oracle Database 11gR2
Enterprise Edition. The OLAP option is
installed automatically as part of a basic
installation of Oracle Database. A GUI tool
is then required for creating, developing,
and managing multidimensional data in an
Oracle data warehouse. We used Analytic
Workspace Manager 11.2.0.4.0B (AWM),
an easy-to-use GUI tool with which you can
create the container for OLAP data, an
analytic workspace (AW), and then add
OLAP dimensions and cubes. In Oracle
OLAP, a cube provides a convenient way of
collecting stored and calculated measures
with similar characteristics, including
dimensionality and aggregation rules. More
than one cube can be defined, each
describing a different dimensional shape,
and multiple cubes in the same AW can
share one or more dimensions. Thus, a cube
is simply a logical object that helps an
administrator to build and maintain data in
an AW. After creating cubes, measures, and
dimensions, you map the dimensions and
stored measures to existing star, snowflake,
or normalized relational sources and then
load the data. The OLAP data can then be
queried with SQL Developer.
In this census data warehouse, we
consistently use multidimensional cubes to
store the data. Each cube has its own value,
because together the cubes make roll-up and
drill-down operations possible. Several
types of OLAP cube operation help to
answer more complex queries in an efficient
and timely manner.
Slice: The slice operation fixes a single value
on one particular dimension of a given cube
and provides a new sub-cube. The SQL
statement may be given as:
SELECT ALL (Total_Male_Female) FROM
ALL_DEMOGRAPHY WHERE
Study_Area='Punjab';
Dice: This can be performed by slicing on one
dimension and then rotating the cube to
select on a second dimension, so that the
sub-cube is restricted on two or more
dimensions. The SQL statement may be
given as:
SELECT Sex_Ratio FROM
ALL_DEMOGRAPHY WHERE
Study_Area='Haryana';
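The difference between slice and dice can be
made concrete with a runnable sketch. The
assumptions are stated plainly: SQLite stands
in for Oracle, the all_demography table below
is a simplified stand-in with hypothetical
columns and made-up figures, and slice_cube and
dice_cube are illustrative helper names. A
slice fixes one value on one dimension, while a
dice restricts two or more dimensions at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_demography "
    "(study_area TEXT, sex TEXT, age_group TEXT, population INTEGER)"
)
# Made-up figures, for illustration only.
conn.executemany(
    "INSERT INTO all_demography VALUES (?, ?, ?, ?)",
    [
        ("Punjab", "Male", "0-6", 120),
        ("Punjab", "Female", "0-6", 100),
        ("Punjab", "Male", "7+", 900),
        ("Punjab", "Female", "7+", 850),
        ("Haryana", "Male", "7+", 800),
        ("Haryana", "Female", "7+", 700),
        ("Bihar", "Male", "7+", 950),
    ],
)


def slice_cube(conn, study_area):
    """Slice: fix one value on the study_area dimension, keep the other dimensions."""
    sql = (
        "SELECT sex, age_group, population FROM all_demography "
        "WHERE study_area = ? ORDER BY sex, age_group"
    )
    return conn.execute(sql, (study_area,)).fetchall()


def dice_cube(conn, study_areas, sexes):
    """Dice: restrict two dimensions at once, each to a set of values."""
    area_marks = ",".join("?" * len(study_areas))
    sex_marks = ",".join("?" * len(sexes))
    sql = (
        "SELECT study_area, sex, age_group, population FROM all_demography "
        f"WHERE study_area IN ({area_marks}) AND sex IN ({sex_marks}) "
        "ORDER BY study_area, sex, age_group"
    )
    return conn.execute(sql, [*study_areas, *sexes]).fetchall()


print(slice_cube(conn, "Punjab"))
print(dice_cube(conn, ["Punjab", "Haryana"], ["Female"]))
```

Bound parameters are used instead of string
concatenation for the values, so the helpers
stay safe when the filter values come from
user input.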
Roll-up: This allows queries that move up an
aggregation hierarchy. Instead of looking at
one fact, we look at all the facts.
SELECT ALL (Demography) FROM
Census_Cube WHERE Study_Area IN
('J.K','Punjab','Haryana','Gujrat',
'Bihar','Himachal Pardesh');
Drill-down: This operation allows the user to
navigate lower in the aggregation hierarchy
and so obtain more specific results, as in
the query:
SELECT Total FROM Sex_Ratio WHERE
Study_Area='Haryana';
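Roll-up and drill-down can be sketched as two
queries at different levels of the same
hierarchy. This is an illustration under stated
assumptions rather than the queries run in this
work: SQLite stands in for Oracle, the
census_cube table is reduced to hypothetical
state, district and population columns with
made-up figures, and roll_up and drill_down are
illustrative helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census_cube (state TEXT, district TEXT, population INTEGER)")
# Made-up district totals, for illustration only.
conn.executemany(
    "INSERT INTO census_cube VALUES (?, ?, ?)",
    [
        ("Punjab", "Patiala", 100),
        ("Punjab", "Ludhiana", 150),
        ("Haryana", "Ambala", 80),
        ("Haryana", "Karnal", 70),
    ],
)


def roll_up(conn, states):
    """Roll-up: aggregate district rows to the state level for the given states."""
    marks = ",".join("?" * len(states))
    sql = (
        "SELECT state, SUM(population) FROM census_cube "
        f"WHERE state IN ({marks}) GROUP BY state ORDER BY state"
    )
    return conn.execute(sql, list(states)).fetchall()


def drill_down(conn, state):
    """Drill-down: move from one state to the district rows beneath it."""
    sql = (
        "SELECT district, population FROM census_cube "
        "WHERE state = ? ORDER BY district"
    )
    return conn.execute(sql, (state,)).fetchall()


print(roll_up(conn, ["Punjab", "Haryana"]))
print(drill_down(conn, "Haryana"))
```

The two functions read the same rows. Only the
grouping level changes, which is what moving up
or down an aggregation hierarchy means.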
7. CONCLUSION
In this paper we have applied concepts of
dimensional modeling to a census data
warehouse. We have identified the major
census attributes and placed these attributes
in the corresponding dimensions of the cube.
Further, we developed a star schema for the
multidimensional data cube and identified
the primary keys that connect each of the
dimensions with the central fact table of the
star schema. The fact table collects
information from each of the dimensions,
and the values of the decision coefficient are
then computed and stored in the fact table,
corresponding to each cell of the cube. The
real strength of the paper lies in the adoption
of predictive data mining techniques for Our
Census Our Future (OCOF). Many data
mining tools and techniques may be applied
to this census database, which would
probably allow it to answer more complex
queries. Similarly, other complex and
specific queries can be answered using data
mining. This paper strongly advocates the
use of predictive data mining techniques to
retrieve more sophisticated information from
the census data warehouse. We would find
our work meaningful if agent-based
software could be developed for the purpose
of warehousing and mining census
attributes.
References
[1] Schrader, M., Vlamis, D., Nader, M.,
Claterbos, C., Collins, D., Campbell, M., &
Conrad, F. Oracle Essbase & Oracle OLAP.
McGraw-Hill, Inc., 2009.
[2] Kimball, R., Reeves, L., Ross, M., &
Thornthwaite, W. The Data Warehouse
Lifecycle Toolkit: Expert Methods for
Designing, Developing, and Deploying Data
Warehouses. John Wiley & Sons, 1998.
[3] Kortink, M. A. R., & Moody, D. L. From
Entities to Stars, Snowflakes, Clusters,
Constellations and Galaxies: A
Methodology for Data Warehouse Design.
18th International Conference on
Conceptual Modeling, Industrial Track
Proceedings, ER'99.
[4] Ponniah, P. Data Warehousing
Fundamentals. John Wiley & Sons, 2005.
[5] Kimball, R. The Data Warehouse Toolkit.
John Wiley & Sons, 1996.
[6] Inmon, W. H. Building the Data
Warehouse. John Wiley & Sons, Inc., 1992.
