On multi dimensional cubes of census data: designing and querying

ON MULTI-DIMENSIONAL CUBES OF CENSUS DATA: DESIGNING AND QUERYING
Jaspreet Singh¹, Dr. Neeraj Sharma², Dr. Kawaljeet Singh³
¹Research Scholar, Department of Computer Science, Punjabi University, Patiala, Punjab, INDIA
²Professor, Department of Computer Science, Punjabi University, Patiala, Punjab, INDIA
³Director, University Computer Centre, Punjabi University, Patiala, Punjab, INDIA
¹jaspreetissaj@gmail.com, ²sharma_neeraj@hotmail.com, ³singhkawaljeet@rediffmail.com
Abstract
The relational database model is perhaps the most commonly used model in database technology. Despite all of its strengths, it does not perform well with complex queries and the analysis of summarized data. The concept of Online Analytical Processing (OLAP) was developed to meet this challenge. The data cube, the main component of OLAP, is a multidimensional database model developed to dramatically speed up the analysis and processing of large data sets through a variety of techniques. Business Intelligence (BI), one of the most flourishing concepts in the computer industry, also depends to a large extent on OLAP cubes. The primary focus of this paper is to design a data warehouse that specifically targets the OLAP storage, analysis and querying requirements of multidimensional cubes of census data, so that these requirements are met efficiently and in a timely manner.
Keywords: Business Intelligence, OLAP, Data Warehouse, Multidimensional Cubes, Census.
--------------------------------------------------------***--------------------------------------------------------
1. INTRODUCTION
Over the last few decades, advances in and the maturation of database technology have given rise to the term Business Intelligence (BI), which describes a set of technologies that mainly serve two objectives: (i) to enable end users to retrieve valuable information from the data that business organizations collect in their routine operations, and (ii) to support planned decision-making. The advent of business intelligence has changed what end users expect from a database management system. Business Intelligence means different things to different people: for some it is only the data warehouse, whereas for others it is the dashboards on their desktop screens that deliver instant information for their business strategies [1].

Business Intelligence includes the following:
• Data warehousing
• OLAP - Online Analytical Processing
• Predictive analytics and data mining
• Business reporting, including dashboards
All of the above technologies improve an organization's ability to create, analyze and report the precise business information required for forward-looking activities.
Data warehousing and Online Analytical Processing (OLAP) are among the most important components of contemporary Decision Support Systems (DSS). Together, they allow organizations to make effective decisions with insight into both their current and their future state, because information is a vital resource for any planned operation. Organizations constantly accumulate and store huge amounts of data about their day-to-day operations. Because today's markets are much more competitive than those of the past, this data must be processed and made available when required, as accurately and as quickly as possible. The success or failure of an organization depends heavily on its ability to analyze and synthesize data, which makes processing and managing that data all the more important. Moreover, decision making today depends on the availability of high-quality, integrated, time-variant information, and an organization should therefore be provided with information of this kind.
2. OBJECTIVES OF THIS WORK
• Identify relevant data and present it as facts (Key Performance Indicators).
• Design and implement a data warehouse and a data cube for the Census.
• Present the data to the end user in the form of a multidimensional model.
• Retrieve every detail from any single dimension or from a combination of two or more dimensions.
• Analyze and evaluate the results and the query performance.
3. LITERATURE REVIEW
In the data mining literature, multidimensional cubes (such as cubes of census data), together with their corresponding star and snowflake schemas, are used to design business-oriented databases. Dimensional modeling is the most appropriate approach to designing a data warehouse for the purpose of predictive data mining [2]. According to Kortink, the major objectives of dimensional modeling are (i) to design database structures that end users can easily understand and write queries against, and (ii) to maximize the efficiency of queries [3]. One way to look at the multidimensional data model is to view it as a cube; the limitation here is that the number of cells in the cube grows exponentially as the number of dimensions increases [4]. Most multidimensional queries deal with aggregated, high-level data organized as dimensions, levels, hierarchies and attributes. The key to building an efficient multidimensional database is therefore to associate all the logical attributes with their dimensions. This association is valuable mainly because typical dimensions are hierarchical in nature. We found this approach applicable and useful for our work.
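The exponential growth cited from [4] can be made concrete with a short calculation. The sketch below is in Python, chosen only for illustration (the paper's implementation uses Oracle OLAP). It assumes a dense cube, in which every combination of members is a cell, so the cell count is the product of the dimension cardinalities; it also counts cuboids (group-by views) with the common textbook formula of (number of levels + 1) per dimension, the extra 1 being the fully aggregated "all" level. The cardinalities are hypothetical.

```python
from math import prod


def dense_cell_count(cardinalities):
    """Base cells in a dense cube: the product of the dimension cardinalities."""
    return prod(cardinalities)


def cuboid_count(levels_per_dimension):
    """Number of cuboids when dimension i has L_i hierarchy levels plus the
    fully aggregated 'all' level: prod(L_i + 1)."""
    return prod(levels + 1 for levels in levels_per_dimension)


# Hypothetical setting: n dimensions, each with 10 members and a single level.
for n in range(1, 6):
    # Columns: dimensions, dense cells, cuboids (e.g. the row for n = 3 is "3 1000 8").
    print(n, dense_cell_count([10] * n), cuboid_count([1] * n))
```

Each added dimension multiplies the cell count by its cardinality, which is why the association of attributes with hierarchical levels, rather than with ever more flat dimensions, matters for efficiency.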
4. DIMENSIONAL MODELING
Dimensional modeling provides different views of the data in a data warehouse, one from each dimension. These views may be used in a Decision Support System as part of business intelligence, in association with data mining tasks. Decision support applications often require information to be obtained along many dimensions. For example, listing the male literacy rate of the above-7 age group in a particular state requires the dimensions Literacy rate and Above-7 population, restricted to males, for each state. Each dimension is a collection of logically related levels, and the levels in turn are organized into hierarchies and have attributes. A dimension can therefore be viewed as an axis for modeling the data, with its levels partitioning that axis further; various questions may be asked at each level. The specific data stored, which is mainly numeric, is known as facts. Facts consist of context data and measure data; the measures are the numeric attributes of the facts that are queried against the census data cube. Decision Support System queries may access the facts across many different dimensions, levels and hierarchies.
Figure-1: Multidimensional cube with different levels of each dimension.
The levels in each dimension make it possible to retrieve facts from different dimensions. The data warehouse should hold summarized collections of data attributes, which make information retrieval more efficient. In data warehousing, facts are also known as key performance indicators. The fully aggregated value of a fact can be viewed at the 0-D level, which is the topmost hierarchical level.
The same information can be viewed at the 1-D level by Demography, by Study Area, or by Sex distribution and location. Combinations of two dimensions can be analyzed at the 2-D level, and combinations of three dimensions at the 3-D level. Any combination of dimensions can be used for slicing and dicing along various axes while browsing the OLAP cube, which supports base-level analysis of the fact data for efficient decision making. The cubes developed here are intended mainly for data analysis: the process of extracting predictive information from large databases in order to assess the present status of the concerned sector of the economy and to predict its future trends. The data can be analyzed using data cube operations such as slice, dice, drill-up (roll-up) and drill-down. The problem of building the multidimensional cubes is represented in figure-2.
Figure-2: Various types of dimensional views in the cube
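The 0-D, 1-D, 2-D and 3-D levels described above form a lattice of group-by views (cuboids). The following Python sketch computes that whole lattice over a handful of fact rows. It is an illustration under stated assumptions, not the paper's Oracle implementation: the dimension Residence, the fact rows and the population numbers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows; the Residence dimension and all numbers are made up.
FACTS = [
    {"Study_Area": "Punjab", "Sex": "Male", "Residence": "Rural", "population": 120},
    {"Study_Area": "Punjab", "Sex": "Female", "Residence": "Rural", "population": 110},
    {"Study_Area": "Punjab", "Sex": "Male", "Residence": "Urban", "population": 80},
    {"Study_Area": "Haryana", "Sex": "Female", "Residence": "Urban", "population": 70},
    {"Study_Area": "Haryana", "Sex": "Male", "Residence": "Rural", "population": 90},
]
DIMENSIONS = ("Study_Area", "Sex", "Residence")


def aggregate(facts, group_by, measure="population"):
    """Sum the measure over all facts, grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in group_by)
        totals[key] += row[measure]
    return dict(totals)


def cuboid_lattice(facts, dimensions):
    """Compute every cuboid, from 0-D (grand total) up to n-D (all dimensions)."""
    lattice = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, k):
            lattice[group_by] = aggregate(facts, group_by)
    return lattice


lattice = cuboid_lattice(FACTS, DIMENSIONS)
print(lattice[()])                      # 0-D view: {(): 470}
print(lattice[("Sex",)])                # 1-D view: totals per sex
print(lattice[("Study_Area", "Sex")])   # 2-D view: totals per state and sex
```

With three dimensions the lattice holds 2^3 = 8 cuboids; an OLAP server typically pre-computes some or all of these so that browsing does not have to re-aggregate the base facts.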

The multidimensional structure of the cube stores pre-calculated aggregations. The roll-up and drill-down operations of the cube are designed around the hierarchies along which the data flows. A hierarchy describes the logical structure of the data and the parent-child relationships within it. For example, District is the lowest level in the Study_Area hierarchy, and State is the next level above it, as shown in figure-3.
Figure-3: Hierarchical view of the Study_Area dimension
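A parent-child hierarchy such as District to State can be represented as a simple mapping from each member to its parent. The Python sketch below shows one step of roll-up along that mapping and the reverse drill-down path. The district-to-state pairs are real place names, but the population figures are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical Study_Area hierarchy: District (lowest level) -> State (next level up).
DISTRICT_PARENT = {
    "Patiala": "Punjab",
    "Ludhiana": "Punjab",
    "Ambala": "Haryana",
    "Karnal": "Haryana",
}
# Made-up measure values at the District level.
DISTRICT_POPULATION = {"Patiala": 19, "Ludhiana": 34, "Ambala": 11, "Karnal": 15}


def roll_up(values, parent_of):
    """Move one level up the hierarchy by summing children into their parent."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


def children(parent, parent_of):
    """Members one level below `parent`, i.e. the drill-down path."""
    return sorted(m for m, p in parent_of.items() if p == parent)


state_population = roll_up(DISTRICT_POPULATION, DISTRICT_PARENT)
print(state_population)                       # {'Punjab': 53, 'Haryana': 26}
print(children("Haryana", DISTRICT_PARENT))   # ['Ambala', 'Karnal']
```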
5. MULTIDIMENSIONAL SCHEMA FOR CENSUS DATA WAREHOUSE
Dimensional models represent data with a “cube” structure [5], which makes the logical data representation more compatible with OLAP data analysis. The data can be queried directly with complex database queries that use various combinations of dimensions. Multidimensional models take advantage of inherent relationships in the data to populate multidimensional matrices called data cubes. For analytical queries, the multidimensional cube generally offers much better query performance than the relational data model. The response time of a multidimensional query depends largely on how many levels are added to each dimension [6]. Two types of schema are used in designing multidimensional cubes. If all the dimensions are joined directly to the fact table, the schema is called a star schema; if some dimensions are joined to the fact table not directly but through other dimensions, it is called a snowflake schema. The snowflake schema is mainly used when the fact table has few records compared to the dimension tables. This situation does not occur in our work, so we designed all the data marts as star schemas.
Figure-4: Star Schema
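A star schema of the kind described above can be sketched with Python's built-in sqlite3 module, standing in here for Oracle. All table and column names (dim_study_area, dim_sex, fact_population, literate) and all values are hypothetical; the point is only the shape: every dimension table is joined directly to the central fact table, and the query computes a literacy rate per state and sex, similar to the example in section 4.

```python
import sqlite3

# Hypothetical star schema: one fact table whose foreign keys point directly
# at each dimension table (no dimension-to-dimension joins).
DDL = """
CREATE TABLE dim_study_area (area_id INTEGER PRIMARY KEY, district TEXT, state TEXT);
CREATE TABLE dim_sex (sex_id INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE fact_population (
    area_id    INTEGER REFERENCES dim_study_area(area_id),
    sex_id     INTEGER REFERENCES dim_sex(sex_id),
    population INTEGER,              -- measure
    literate   INTEGER,              -- measure
    PRIMARY KEY (area_id, sex_id)    -- grain: one row per district and sex
);
"""


def build_star(conn):
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_study_area VALUES (?, ?, ?)",
                     [(1, "Patiala", "Punjab"), (2, "Ambala", "Haryana")])
    conn.executemany("INSERT INTO dim_sex VALUES (?, ?)",
                     [(1, "Male"), (2, "Female")])
    # Made-up measure values, for illustration only.
    conn.executemany("INSERT INTO fact_population VALUES (?, ?, ?, ?)",
                     [(1, 1, 100, 80), (1, 2, 95, 70), (2, 1, 60, 45), (2, 2, 55, 38)])


conn = sqlite3.connect(":memory:")
build_star(conn)
# Each dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT a.state, s.sex, SUM(f.literate) * 100.0 / SUM(f.population) AS literacy_rate
    FROM fact_population f
    JOIN dim_study_area a ON a.area_id = f.area_id
    JOIN dim_sex s ON s.sex_id = f.sex_id
    GROUP BY a.state, s.sex
    ORDER BY a.state, s.sex
""").fetchall()
for row in rows:
    print(row)
```

In this sketch dim_study_area stores district and state in the same denormalized table, which is what keeps it a star; a snowflake variant would move state into its own table referenced from dim_study_area.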
The steps for designing a multidimensional model include:
• Identify the fact table, which contains all the dimensions on which the fact data is based.
• Identify the dimensions and the granularity of each fact table.
• Define the measures to be analyzed and reported for each fact table.
• Gather all the attributes, levels and hierarchies of each dimension of the fact table.
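The design steps above can be captured as a small declarative specification that is checked before any tables are built. The Python sketch below is one hypothetical way to do this: Dimension holds the levels and attributes (step 4), FactTable holds the grain (step 2) and measures (step 3), and validate reports inconsistencies. None of these names come from the paper or from Oracle's tools.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    levels: list[str]  # ordered from the lowest (finest) to the highest level
    attributes: list[str] = field(default_factory=list)


@dataclass
class FactTable:
    name: str
    grain: dict[str, str]  # dimension name -> level at which facts are stored
    measures: list[str]


def validate(fact: FactTable, dimensions: dict[str, Dimension]) -> list[str]:
    """Return a list of design problems (empty if the spec is consistent)."""
    problems = []
    for dim_name, level in fact.grain.items():
        dim = dimensions.get(dim_name)
        if dim is None:
            problems.append(f"unknown dimension {dim_name!r}")
        elif level not in dim.levels:
            problems.append(f"level {level!r} not in dimension {dim_name!r}")
    if not fact.measures:
        problems.append("fact table defines no measures")
    return problems


# Hypothetical census spec: facts stored per District and Sex.
dims = {
    "Study_Area": Dimension("Study_Area", ["District", "State"]),
    "Sex": Dimension("Sex", ["Sex"]),
}
census_fact = FactTable("fact_population", {"Study_Area": "District", "Sex": "Sex"},
                        ["population", "literate"])
print(validate(census_fact, dims))   # []
```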

6. OLAP FOR CENSUS DATA WAREHOUSE
The database used for the implementation described in this paper is Oracle Database 11g Release 2 (11gR2) Enterprise Edition. The OLAP option is installed automatically as part of a basic installation of Oracle Database. A GUI tool is then needed to create, develop and manage multidimensional data in the Oracle data warehouse; we used Analytic Workspace Manager (AWM) 11.2.0.4.0B. With this easy-to-use tool you create the container for OLAP data, an analytic workspace (AW), and then add OLAP dimensions and cubes to it. In Oracle OLAP, a cube provides a convenient way of collecting stored and calculated measures with similar characteristics, including dimensionality and aggregation rules. More than one cube can be defined, each with a different dimensional shape, and multiple cubes in the same AW can share one or more dimensions. A cube is thus simply a logical object that helps an administrator build and maintain data in an AW. After creating the cubes, measures and dimensions, you map the dimensions and stored measures to existing star, snowflake or normalized relational sources and then load the data. The OLAP data can then be queried with SQL Developer.
In this census data warehouse, we use multidimensional cubes to store the data. Each cube has value in its own right, and together the cubes make roll-up and drill-down operations possible across cubes. Several types of OLAP cube operations support answering complex queries efficiently and in a timely manner.
Slice: The slice operation fixes a single member of one particular dimension of a given cube and returns the resulting sub-cube. The SQL statement may be given as:
SELECT Total_Male_Female FROM ALL_DEMOGRAPHY
WHERE Study_Area = 'Punjab';
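The effect of a slice can be seen on a toy in-memory cube. In this Python sketch the cube is a dictionary from member tuples to a measure; the Residence dimension and all numbers are hypothetical. Fixing Study_Area to 'Punjab' drops that axis and leaves a sub-cube over the remaining dimensions, which is what the WHERE clause above does in SQL.

```python
# Hypothetical cube cells: (Study_Area, Sex, Residence) -> population (made-up numbers).
CUBE = {
    ("Punjab", "Male", "Rural"): 120,
    ("Punjab", "Female", "Rural"): 110,
    ("Punjab", "Male", "Urban"): 80,
    ("Haryana", "Female", "Urban"): 70,
    ("Haryana", "Male", "Rural"): 90,
}
DIMENSIONS = ("Study_Area", "Sex", "Residence")


def slice_cube(cube, dimensions, dim, value):
    """Fix one dimension to a single member; return the sub-cube over the others."""
    i = dimensions.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == value}


punjab = slice_cube(CUBE, DIMENSIONS, "Study_Area", "Punjab")
print(punjab)
# {('Male', 'Rural'): 120, ('Female', 'Rural'): 110, ('Male', 'Urban'): 80}
```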
Dice: This can be performed by slicing on one dimension and then rotating the cube to select on a second dimension, so that the result is restricted on more than one dimension. The SQL statement may be given as:
SELECT Sex_Ratio FROM ALL_DEMOGRAPHY
WHERE Study_Area = 'Haryana';
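A dice keeps every dimension but restricts two or more of them to chosen sets of members. The Python sketch below applies this to the same kind of toy cube; the Residence dimension, the numbers and the function name are hypothetical. Selecting Study_Area = Haryana together with Sex = Male narrows the Haryana example to a single cell.

```python
# Hypothetical cube cells: (Study_Area, Sex, Residence) -> population (made-up numbers).
CUBE = {
    ("Punjab", "Male", "Rural"): 120,
    ("Punjab", "Female", "Rural"): 110,
    ("Punjab", "Male", "Urban"): 80,
    ("Haryana", "Female", "Urban"): 70,
    ("Haryana", "Male", "Rural"): 90,
}
DIMENSIONS = ("Study_Area", "Sex", "Residence")


def dice(cube, dimensions, selection):
    """Keep cells whose member on every selected dimension is in the chosen set.
    `selection` maps a dimension name to the set of members to keep."""
    positions = {dimensions.index(dim): members for dim, members in selection.items()}
    return {key: value for key, value in cube.items()
            if all(key[i] in members for i, members in positions.items())}


haryana_male = dice(CUBE, DIMENSIONS, {"Study_Area": {"Haryana"}, "Sex": {"Male"}})
print(haryana_male)   # {('Haryana', 'Male', 'Rural'): 90}
```

Unlike the slice above, the result keeps all three coordinates, so it is still a (smaller) cube with the same dimensionality.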
Roll-up: This operation moves up an aggregation hierarchy; instead of looking at one fact, we look at the aggregate of many facts. For example:
SELECT Demography FROM Census_Cube
WHERE Study_Area IN ('J.K', 'Punjab', 'Haryana', 'Gujarat', 'Bihar', 'Himachal Pradesh');
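In relational terms, a roll-up along the Study_Area hierarchy is usually expressed as an aggregate with GROUP BY at the higher level. The sketch below uses Python's sqlite3 module as a stand-in for the Oracle cube; the census table, its columns and its numbers are hypothetical. Grouping district rows by state moves one level up, and dropping the GROUP BY altogether gives the 0-D grand total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical District-level table; the numbers are made up.
conn.execute("CREATE TABLE census (district TEXT, state TEXT, population INTEGER)")
conn.executemany("INSERT INTO census VALUES (?, ?, ?)", [
    ("Patiala", "Punjab", 19), ("Ludhiana", "Punjab", 34),
    ("Ambala", "Haryana", 11), ("Karnal", "Haryana", 15),
])

# Roll-up: District -> State, one level up the Study_Area hierarchy.
by_state = conn.execute("""
    SELECT state, SUM(population) FROM census
    WHERE state IN ('Punjab', 'Haryana')
    GROUP BY state
    ORDER BY state
""").fetchall()
print(by_state)   # [('Haryana', 26), ('Punjab', 53)]

# Rolling up once more yields the 0-D grand total.
total = conn.execute("SELECT SUM(population) FROM census").fetchone()[0]
print(total)      # 79
```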
Drill-down: This operation allows the user to navigate down the aggregation hierarchy and obtain more detailed results, as in the following query:
SELECT Total FROM Sex_Ratio
WHERE Study_Area = 'Haryana';
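Drill-down is the reverse path: starting from an aggregated member such as a state, the user asks for the members one level below it. The sqlite3 sketch below uses the same hypothetical District-level table as a stand-in for the Oracle cube (table name, columns and values are made up) and checks that the district rows add up to the state total they were drilled from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical District-level table; the numbers are made up.
conn.execute("CREATE TABLE census (district TEXT, state TEXT, population INTEGER)")
conn.executemany("INSERT INTO census VALUES (?, ?, ?)", [
    ("Patiala", "Punjab", 19), ("Ludhiana", "Punjab", 34),
    ("Ambala", "Haryana", 11), ("Karnal", "Haryana", 15),
])

# Start from the aggregated State level ...
state_total = conn.execute(
    "SELECT SUM(population) FROM census WHERE state = ?", ("Haryana",)).fetchone()[0]
# ... then drill down one level to the Districts that make up that total.
districts = conn.execute(
    "SELECT district, population FROM census WHERE state = ? ORDER BY district",
    ("Haryana",)).fetchall()
print(state_total)   # 26
print(districts)     # [('Ambala', 11), ('Karnal', 15)]
```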

7. CONCLUSION
In this paper we have applied the concepts of dimensional modeling to a census data warehouse. We identified the major census attributes and placed them in the corresponding dimensions of the cube. We then developed a star schema for the multidimensional data cube and identified the primary keys that connect each dimension to the central fact table of the star schema. The fact table collects information from each of the dimensions, and the values of the decision coefficient are then computed and stored in the fact table for each cell of the cube. The real strength of the paper lies in its adoption of predictive data mining techniques for Our Census Our Future (OCOF). Many data mining tools and techniques could be applied to this census database to answer more complex queries, and other complex and specific queries can similarly be answered using data mining. This paper strongly advocates the use of predictive data mining techniques to retrieve more sophisticated information from the census data warehouse. We will consider our work meaningful if agent-based software can be developed for warehousing and mining census attributes.
References
[1] Schrader, M., Vlamis, D., Nader, M., Claterbos, C., Collins, D., Campbell, M. and Conrad, F. Oracle Essbase & Oracle OLAP. McGraw-Hill, 2009.
[2] Kimball, R., Reeves, L., Ross, M. and Thornthwaite, W. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. John Wiley & Sons, 1998.
[3] Kortink, M.A.R. and Moody, D.L. From Entities to Stars, Snowflakes, Clusters, Constellations and Galaxies: A Methodology for Data Warehouse Design. 18th International Conference on Conceptual Modeling (ER'99), Industrial Track Proceedings, 1999.
[4] Ponniah, P. Data Warehousing Fundamentals. John Wiley & Sons, 2005.
[5] Kimball, R. The Data Warehouse Toolkit. John Wiley & Sons, 1996.
[6] Inmon, W.H. Building the Data Warehouse. John Wiley & Sons, 1992.