2. Data Mining and Warehousing
Mr. Sagar Pandya
Information Technology Department
sagar.pandya@medicaps.ac.in
Course Code: IT3ED02
Course Name: Data Mining and Warehousing
Hours Per Week (L-T-P): 3-0-0
Total Credits: 3
3. IT3ED02 Data Mining and Warehousing 3-0-0
Unit 1. Introduction
Unit 2. Data Mining
Unit 3. Association and Classification
Unit 4. Clustering
Unit 5. Business Analysis
4. Reference Books
Text Books
Han, Kamber and Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann,
India, 2012.
Mohammed Zaki and Wagner Meira Jr., Data Mining and Analysis:
Fundamental Concepts and Algorithms, Cambridge University Press.
Z. Markov and Daniel T. Larose, Data Mining the Web, John Wiley & Sons, USA.
Reference Books
Sam Anahory and Dennis Murray, Data Warehousing in the Real World,
Pearson Education Asia.
W. H. Inmon, Building the Data Warehouse, 4th Ed., Wiley India.
and many others
5. Unit-5 Business Analysis
Reporting and Query Tools and Applications, Tool Categories,
Need for Applications, SAS, KNIME, ORANGE, ETL,
Data Quality, OLAP,
Dimensional Modelling, Multidimensional Model,
Multidimensional vs Multirelational OLAP, OLAP Tools
6. Reporting and Query Tools and Applications
• A querying and reporting tool helps you run regular reports, create
organized listings, and perform cross-tabular reporting and querying.
Query Tools
One of the primary objectives of data warehousing is to provide
information that businesses can use to make strategic decisions.
Query tools allow users to interact with the data warehouse system.
These tools fall into four different categories:
1. Query and reporting tools
2. Application Development tools
3. Data mining tools
4. OLAP tools
7. Reporting and Query Tools and Applications
1. Query and reporting tools:
Query and reporting tools can be further divided into
• Reporting tools
• Managed query tools
Reporting tools:
Reporting tools can be further divided into production reporting tools
and desktop report writers.
1. Report writers: These tools are designed for end users to carry out
their own analysis.
2. Production reporting: These tools allow organizations to generate
regular operational reports. They also support high-volume batch jobs
such as printing and calculating.
8. Reporting and Query Tools and Applications
Some popular reporting tools are Brio, Business Objects, Oracle,
PowerSoft, SAS Institute.
Managed query tools:
Managed query tools shield end users from the complexities of SQL
and database structure by inserting a meta layer between the user and the
database.
Meta layer: Software that provides subject-oriented views of a
database and supports point-and-click creation of SQL.
2. Application development tools:
These tools were first deployed on mainframe systems.
Sometimes built-in graphical and analytical tools do not satisfy the
analytical needs of an organization.
9. Reporting and Query Tools and Applications
In such cases, custom reports are developed using application
development tools.
3. Data mining tools:
Data mining is the process of discovering meaningful new correlations,
patterns, and trends by mining large amounts of data. Data mining
tools are used to automate this process.
4. OLAP tools:
These tools are based on the concept of a multidimensional database.
They allow users to analyze the data using elaborate and complex
multidimensional views.
They provide an intuitive way to view corporate data.
Users can drill down, across, or up levels.
10. Reporting and Query Tools and Applications
Basic query and reporting: “Tell me what happened.”
Business analysis (OLAP): “Tell me what happened and why.”
Data mining: “Tell me what may happen” or “Tell me something interesting.”
Dashboards and scorecards: “Tell me how I’m doing currently and against my plan.”
11. Reporting and Query Tools and Applications
Need for Applications:
Some tools and applications format the retrieved data into easy-to-read
reports, while others concentrate on on-screen presentation. As the
complexity of the questions grows, these tools may rapidly become inefficient.
Consider the various types of access to the data stored in a data warehouse:
Simple tabular form reporting
Ad hoc user-specified queries
Predefined repeatable queries
Complex queries with multi-table joins, multilevel subqueries, and
sophisticated search criteria
Ranking
Multivariable analysis
12. Reporting and Query Tools and Applications
Time series analysis, complex textual search
Data visualization, graphing, charting and pivoting
AI techniques for testing of hypotheses
Information mapping, statistical analysis
Interactive drill-down reporting and analysis
The first four types of access are covered by the combined category of tools
that we will call query and reporting tools.
There are three types of reporting:
Creation and viewing of standard reports
Definition and creation of ad hoc reports
Data exploration
13. Reporting and Query Tools and Applications
Cognos Impromptu
What is Impromptu?
Impromptu is an interactive database reporting tool.
It allows power users to query data without programming
knowledge.
It can only read the data; it cannot modify it.
Impromptu’s main features include:
Interactive reporting capability
Enterprise-wide scalability
Superior user interface
Fastest time to result
Lowest cost of ownership
14. Reporting and Query Tools and Applications
IBM Cognos Business Intelligence is a web-based reporting and
analytics tool.
It is used to perform data aggregation and create user-friendly
detailed reports.
Reports can contain graphs, multiple pages, different tabs and
interactive prompts.
These reports can be viewed in web browsers or on handheld
devices such as tablets and smartphones.
Cognos also lets you export a report in XML or PDF format, or view
the report in XML format.
You can also schedule a report to run in the background at a specific
time, so a daily report is ready to view without having to run it
manually each time.
15. Reporting and Query Tools and Applications
IBM Cognos provides a wide range of features and can be
considered enterprise software that provides a flexible reporting
environment for large and medium enterprises.
It meets the needs of power users, analysts, business managers and
company executives.
Power users and analysts want to create ad hoc reports and
multiple views of the same data.
Business executives want to see summarized data in dashboard styles,
cross tabs and visualizations.
Cognos BI reporting allows you to bring data from multiple
databases into a single set of reports.
IBM Cognos provides a wide range of features compared to other
BI tools in the market.
16. OLAP
OLAP stands for Online Analytical Processing. It is the technology
behind many Business Intelligence (BI) applications.
OLAP is a powerful technology for data discovery, including
capabilities for limitless report viewing, complex analytical
calculations, and predictive “what if” scenario (budget, forecast)
planning.
It uses database tables (fact and dimension tables) to enable
multidimensional viewing, analysis and querying of large amounts of
data.
For example, OLAP technology could provide management with fast answers
to complex queries on their operational data or enable them to
analyze their company’s historical data for trends and patterns.
17. OLAP
Online Analytical Processing (OLAP) applications and tools are
those designed to ask complex queries of large
multidimensional collections of data.
How is OLAP Technology Used?
OLAP performs multidimensional analysis of business data and provides the
capability for complex calculations, trend analysis, and sophisticated
data modeling.
It is the foundation for many kinds of business applications for
Business Performance Management, Planning, Budgeting,
Forecasting, Financial Reporting, Analysis, Simulation Models,
Knowledge Discovery, and Data Warehouse Reporting.
18. OLAP
Unlike relational databases, OLAP tools do not store individual
transaction records in a two-dimensional, row-by-column format like
a worksheet. Instead, they use multidimensional database structures,
known as cubes in OLAP terminology, to store arrays of
consolidated information.
The data and formulas are stored in an optimized multidimensional
database, while views of the data are created on demand.
OLAP technology implementations depend not only on the type of
software, but also on underlying data sources and the intended
business objective(s).
Each industry or business area is specific and requires some degree
of customized modeling to create multidimensional “cubes” for data
loading and report building, at minimum.
19. OLAP CUBE
At the core of the OLAP concept is the OLAP cube. The OLAP cube
is a data structure optimized for very quick data analysis.
The OLAP cube consists of numeric facts, called measures, which are
categorized by dimensions. An OLAP cube is also called a hypercube.
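As a minimal sketch of "measures categorized by dimensions", a cube can be pictured as a mapping from one member per dimension to a numeric measure. The dimension names, members and sales values below are hypothetical and chosen only for illustration; real OLAP servers use far more elaborate storage.

```python
from typing import Dict, Optional, Tuple

# Assumed dimension order for every cell key (illustrative only).
DIMENSIONS = ("city", "quarter", "item")

Cell = Tuple[str, str, str]

# Each cell maps one member per dimension to a measure (sales amount).
cube: Dict[Cell, float] = {
    ("New Jersey", "Q1", "Mobile"): 120.0,
    ("New Jersey", "Q2", "Mobile"): 90.0,
    ("Los Angeles", "Q1", "Mobile"): 300.0,
    ("Los Angeles", "Q1", "Modem"): 75.0,
}


def measure(city: str, quarter: str, item: str) -> Optional[float]:
    """Return the measure stored in one cell, or None if the cell is empty."""
    return cube.get((city, quarter, item))


print(measure("Los Angeles", "Q1", "Mobile"))
print(measure("Los Angeles", "Q2", "Modem"))
```

The first call returns the stored measure; the second addresses an empty cell and returns None rather than 0, which keeps "no data" distinct from "zero sales".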
20. OLAP CUBE
Usually, data operations and analysis are performed using a simple
spreadsheet, where data values are arranged in row and column
format. This is ideal for two-dimensional data.
However, OLAP contains multidimensional data, usually
obtained from different and unrelated sources, so
a spreadsheet is not an optimal option.
The cube can store and analyze multidimensional data in a logical
and orderly manner.
OLAP cubes have two main purposes. The first is to provide
business users with a data model more intuitive to them than a
tabular model. This model is called a Dimensional Model.
The second purpose is to enable fast query response that is usually
difficult to achieve using tabular models.
21. OLAP CUBE
How does it work?
A data warehouse extracts information from multiple data
sources and formats such as text files, Excel sheets, multimedia files, etc.
The extracted data is cleaned and transformed. The data is then loaded into an
OLAP server (or OLAP cube), where information is pre-calculated
for further analysis.
Fundamentally, OLAP has a very simple concept.
It pre-calculates most of the queries that are typically very hard to
execute over tabular databases, namely aggregation, joining, and
grouping.
These queries are calculated during a process that is usually called
'building' or 'processing' of the OLAP cube.
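The "building" step can be sketched as pre-computing an aggregate for every combination of dimensions, so that later queries become simple lookups. This is only an illustration under stated assumptions: the detail rows are hypothetical (chosen so that the city totals are 440 and 1560), the function name `build_cube` is made up, and the special member "ALL" stands for "aggregated over this dimension".

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "quarter")

# Hypothetical detail rows: (city, quarter, sales).
rows = [
    ("New Jersey", "Q1", 100),
    ("New Jersey", "Q2", 340),
    ("Los Angeles", "Q1", 600),
    ("Los Angeles", "Q2", 960),
]


def build_cube(rows):
    """Pre-compute SUM(sales) for every subset of the dimensions."""
    aggregates = defaultdict(int)
    indices = range(len(DIMENSIONS))
    for *members, sales in rows:
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(indices, size):
                # Dimensions that are not kept collapse into the "ALL" member.
                key = tuple(members[i] if i in kept else "ALL" for i in indices)
                aggregates[key] += sales
    return dict(aggregates)


cube = build_cube(rows)
print(cube[("New Jersey", "ALL")])   # total for one city
print(cube[("ALL", "Q1")])           # total for one quarter
print(cube[("ALL", "ALL")])          # grand total
```

After the build, answering "total sales for New Jersey" no longer requires scanning or grouping the detail rows; the cost has been moved to processing time.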
22. Basic analytical operations of OLAP
Four types of analytical operations in OLAP are:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
1) Roll-up:
Roll-up is also known as "consolidation" or "aggregation."
The roll-up operation can be performed in two ways:
Reducing dimensions
Climbing up the concept hierarchy. A concept hierarchy is a system of
grouping things based on their order or level.
Consider the following example.
24. Basic analytical operations of OLAP
In this example, the cities New Jersey and Los Angeles are rolled up into
the country USA.
The sales figures for New Jersey and Los Angeles are 440 and 1560,
respectively. They become 2000 after roll-up.
In this aggregation process, the location hierarchy moves up from
city to country.
When roll-up is done by reducing dimensions, one or more dimensions
are removed. In this example, the Quarter dimension is removed.
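The roll-up arithmetic in this example can be reproduced with a small sketch. The two sales figures come from the example; the `roll_up` helper and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict

# Sales per city, as given in the example.
sales_by_city = {"New Jersey": 440, "Los Angeles": 1560}

# Location concept hierarchy: city -> country.
country_of = {"New Jersey": "USA", "Los Angeles": "USA"}


def roll_up(sales, parent_of):
    """Aggregate each member's value into its parent in the hierarchy."""
    totals = defaultdict(int)
    for member, value in sales.items():
        totals[parent_of[member]] += value
    return dict(totals)


print(roll_up(sales_by_city, country_of))  # {'USA': 2000}
```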
2) Drill-down
In drill-down, data is fragmented into smaller parts. It is the opposite
of the roll-up process. It can be done by moving down the concept
hierarchy or by adding a dimension.
26. Basic analytical operations of OLAP
Quarter Q1 is drilled down to the months January, February, and March.
The corresponding sales are also recorded.
In this example, the Month dimension is added.
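Drill-down can be sketched as the reverse lookup: starting from a quarter, return the finer-grained monthly values beneath it. The monthly figures and the `drill_down` helper are hypothetical; only the Q1-to-months hierarchy comes from the example.

```python
# Hypothetical monthly detail stored below the quarter level.
sales_by_month = {"January": 150, "February": 200, "March": 250, "April": 180}

# Time concept hierarchy: month -> quarter.
quarter_of = {"January": "Q1", "February": "Q1", "March": "Q1", "April": "Q2"}


def drill_down(quarter):
    """Return the monthly sales that make up one quarter."""
    return {
        month: value
        for month, value in sales_by_month.items()
        if quarter_of[month] == quarter
    }


q1_months = drill_down("Q1")
print(q1_months)                # the three months of Q1
print(sum(q1_months.values()))  # rolling back up gives the Q1 total
```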
3) Slice:
Here, one dimension is selected, and a new sub-cube is created.
The Time dimension is sliced with Q1 as the filter.
A new cube is created altogether.
28. Basic analytical operations of OLAP
Dice:
This operation is similar to a slice. The difference is that in dice you
select two or more dimensions, which results in the creation of a sub-cube.
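Slice and dice can both be sketched as filters over cube cells. In this illustration the cube contents, the `AXES` mapping and the function names are assumptions; a slice fixes a single member of one dimension, while a dice restricts several dimensions at once. For simplicity the sliced dimension is kept in the result keys.

```python
# Hypothetical cube cells keyed by (location, quarter, item).
cube = {
    ("New Jersey", "Q1", "Mobile"): 100,
    ("New Jersey", "Q2", "Mobile"): 340,
    ("Los Angeles", "Q1", "Mobile"): 600,
    ("Los Angeles", "Q1", "Modem"): 200,
    ("Los Angeles", "Q2", "Modem"): 760,
}

# Position of each dimension inside a cell key.
AXES = {"location": 0, "quarter": 1, "item": 2}


def dice(cube, **criteria):
    """Keep cells whose member is in the allowed set for every named dimension."""
    return {
        key: value
        for key, value in cube.items()
        if all(key[AXES[dim]] in allowed for dim, allowed in criteria.items())
    }


def slice_cube(cube, dim, member):
    """A slice is a dice on one dimension with a single member."""
    return dice(cube, **{dim: {member}})


q1 = slice_cube(cube, "quarter", "Q1")
sub_cube = dice(cube, location={"Los Angeles"}, quarter={"Q1", "Q2"}, item={"Modem"})
print(q1)
print(sub_cube)
```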
30. Basic analytical operations of OLAP
4) Pivot
In pivot, you rotate the data axes to provide an alternative presentation
of the data.
In the following example, the pivot is based on item types.
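A pivot on a two-dimensional view amounts to swapping the row and column axes. The sketch below uses a hypothetical view (locations as rows, item types as columns) and an illustrative `pivot` helper; the values themselves do not change, only their arrangement.

```python
# Hypothetical 2-D view: rows are locations, columns are item types.
view = {
    "New Jersey": {"Mobile": 300, "Modem": 140},
    "Los Angeles": {"Mobile": 900, "Modem": 660},
}


def pivot(view):
    """Rotate the view so that the former columns become the rows."""
    rotated = {}
    for row, columns in view.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated


by_item = pivot(view)
print(by_item["Mobile"])  # sales of one item type across locations
```

Pivoting twice returns the original arrangement, which is a quick sanity check that no data is lost by the rotation.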
32. OLAP
The main characteristics of OLAP are as follows:
Multidimensional conceptual view: OLAP systems let business
users have a dimensional and logical view of the data in the data
warehouse. This helps in carrying out slice and dice operations.
Multi-user support: Since OLAP systems are shared, the
OLAP operation should provide normal database operations,
including retrieval, update, concurrency control, integrity, and security.
Accessibility: OLAP acts as a mediator between data warehouses
and the front end. The OLAP operations should sit between data
sources (e.g., data warehouses) and an OLAP front end.
Storing OLAP results: OLAP results are kept separate from data
sources.
33. OLAP
OLAP distinguishes between zero values and missing
values so that aggregates are computed correctly.
An OLAP system should ignore all missing values and compute correct
aggregate values.
OLAP facilitates interactive querying and complex analysis for users.
OLAP allows users to drill down for greater detail or roll up for
aggregations of metrics along a single business dimension or across
multiple dimensions.
OLAP provides the ability to perform intricate calculations and
comparisons.
OLAP presents results in a number of meaningful ways, including
charts and graphs.
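The point about zero versus missing values can be made concrete with an average. In this sketch, None is assumed to mark a missing value and the sales figures are hypothetical; treating the missing value as zero would drag the average down, whereas ignoring it gives the correct aggregate.

```python
def average(values):
    """Average the values that are present; None marks a missing value."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical sales: one real zero and one missing value.
sales = [100, 0, None, 200]

correct = average(sales)                                # ignores the missing value
naive = sum(v or 0 for v in sales) / len(sales)         # wrongly counts it as zero
print(correct, naive)
```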
34. OLAP
Advantages of OLAP
OLAP is a platform for all types of business activities, including planning,
budgeting, reporting, and analysis.
Information and calculations are consistent in an OLAP cube. This is
a crucial benefit.
Quickly create and analyze "what if" scenarios.
Easily search the OLAP database for broad or specific terms.
OLAP provides the building blocks for business modeling tools,
data mining tools, and performance reporting tools.
Allows users to slice and dice cube data by various
dimensions, measures, and filters.
35. OLAP
It is good for analyzing time series.
Finding some clusters and outliers is easy with OLAP.
It is a powerful online analytical processing system for visualization,
and it provides fast response times.
Disadvantages of OLAP
OLAP requires organizing data into a star or snowflake schema.
These schemas are complicated to implement and administer.
You cannot have a large number of dimensions in a single OLAP cube.
Transactional data cannot be accessed with an OLAP system.
Any modification to an OLAP cube requires a full update of the cube,
which is a time-consuming process.
37. ROLAP
ROLAP works with data that exists in a relational database.
Fact and dimension tables are stored as relational tables.
It also allows multidimensional analysis of data and is the
fastest-growing type of OLAP.
This methodology relies on manipulating the data stored in the
relational database to give the appearance of traditional OLAP’s
slicing and dicing functionality.
In essence, each action of slicing and dicing is equivalent to adding a
“WHERE” clause to the SQL statement. The data is stored in relational
tables.
Relational OLAP servers are placed between the relational back-end
server and client front-end tools.
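The idea that slicing and dicing correspond to WHERE clauses can be sketched with Python's built-in sqlite3 module. The table name, columns and rows are hypothetical, and an in-memory SQLite database stands in for the relational back end; a real ROLAP server would generate such SQL on the user's behalf.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (location TEXT, quarter TEXT, item TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("New Jersey", "Q1", "Mobile", 100),
        ("New Jersey", "Q2", "Mobile", 340),
        ("Los Angeles", "Q1", "Mobile", 600),
        ("Los Angeles", "Q1", "Modem", 200),
        ("Los Angeles", "Q2", "Modem", 760),
    ],
)

# Slice: fix one dimension (quarter = 'Q1') with a single WHERE condition.
slice_rows = conn.execute(
    "SELECT location, SUM(amount) FROM sales "
    "WHERE quarter = ? GROUP BY location ORDER BY location",
    ("Q1",),
).fetchall()

# Dice: restrict several dimensions at once with a compound WHERE clause.
dice_rows = conn.execute(
    "SELECT location, quarter, item, amount FROM sales "
    "WHERE location = ? AND item IN (?, ?) ORDER BY quarter, item",
    ("Los Angeles", "Mobile", "Modem"),
).fetchall()

print(slice_rows)
print(dice_rows)
```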
38. ROLAP
To store and manage the warehouse data, the relational OLAP uses
relational or extended-relational DBMS.
ROLAP includes the following −
• Implementation of aggregation navigation logic
• Optimization for each DBMS back-end
• Additional tools and services
Points to Remember
• ROLAP servers are highly scalable.
• ROLAP tools analyze large volumes of data across multiple
dimensions.
• ROLAP tools store and analyze highly volatile and changeable data.
39. ROLAP
Relational OLAP Architecture
ROLAP Architecture includes the following components
• Database server.
• ROLAP server.
• Front-end tool.
Relational OLAP (ROLAP) is the latest and fastest-growing OLAP
technology segment in the market.
This method allows multiple multidimensional views of two-dimensional
relational tables to be created, without structuring
records around the desired view.
Some products in this segment support robust SQL engines
to handle the complexity of multidimensional analysis.
41. ROLAP
Advantages of ROLAP model:
1. High data efficiency: Query performance and the access language are
optimized specifically for multidimensional data analysis.
2. Scalability: This type of OLAP system scales to manage large
volumes of data, even when the data is steadily increasing.
Drawbacks of ROLAP model:
1. Demand for higher resources: ROLAP needs high utilization of manpower,
software, and hardware resources.
2. Aggregate data limitations: ROLAP tools use SQL for all calculations on
aggregate data, and SQL is limited in the computations it can handle.
3. Slow query performance: Query performance in this model is slow
compared with MOLAP.
42. MOLAP
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines to
display multidimensional views of data. Basically, they use an OLAP
cube.
This is the more traditional way of OLAP analysis.
In MOLAP, data is stored in a multidimensional cube.
The storage is not in a relational database but in proprietary
formats; that is, data is stored in array-based structures.
MOLAP is used for limited data volumes, with the data stored in a
multidimensional array.
In MOLAP, a dynamic multidimensional view of the data is created.
43. MOLAP
Points to Remember −
In MOLAP, operations are called processing.
• MOLAP tools process information with a consistent response time
regardless of the level of summarization or calculations selected.
• MOLAP tools need to avoid many of the complexities of creating a
relational database to store data for analysis.
• MOLAP tools need the fastest possible performance.
• A MOLAP server adopts two levels of storage representation to handle
dense and sparse data sets.
• Denser sub-cubes are identified and stored as array structures.
• Sparse sub-cubes employ compression technology.
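The two storage representations can be sketched as follows. This is only an illustration: the arrays are hypothetical, None is assumed to mark an empty cell, and a coordinate dictionary stands in for whatever compression technique a real MOLAP server applies to sparse sub-cubes.

```python
def density(array):
    """Fraction of cells in a 2-D array that actually hold a value."""
    cells = [value for row in array for value in row]
    return sum(value is not None for value in cells) / len(cells)


def compress(array):
    """Keep only the non-empty cells, keyed by their (row, column) position."""
    return {
        (i, j): value
        for i, row in enumerate(array)
        for j, value in enumerate(row)
        if value is not None
    }


# Dense sub-cube: almost every cell is filled, so a plain array is efficient.
dense = [[10, 12, 9], [11, 8, 14]]

# Sparse sub-cube: mostly empty, so only the filled cells are stored.
sparse_array = [[None, None, 5], [None, None, None], [7, None, None]]
sparse = compress(sparse_array)

print(density(dense), density(sparse_array))
print(sparse)
```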
45. MOLAP
MOLAP Architecture
MOLAP Architecture includes the following components
• Database server.
• MOLAP server.
• Front-end tool.
The MOLAP structure primarily reads precompiled data. It
has limited capabilities to dynamically create aggregations
or to evaluate results that have not been pre-calculated and stored.
The main difference between ROLAP and MOLAP is that in
ROLAP, data is fetched from the data warehouse, whereas in
MOLAP, data is fetched from a multidimensional database (MDDB).
What the two have in common is OLAP.
46. MOLAP
Advantages:
Excellent performance: MOLAP cubes are built for fast data
retrieval, and are optimal for slicing and dicing operations.
Can perform complex calculations: All calculations have been pre-
generated when the cube is created. Hence, complex calculations are
not only doable, but they return quickly.
Disadvantages:
Limited in the amount of data it can handle: Because all calculations
are performed when the cube is built, it is not possible to include a
large amount of data in the cube itself.
This is not to say that the data in the cube cannot be derived from a
large amount of data. Indeed, this is possible. But in this case, only
summary-level information will be included in the cube itself.
47. MOLAP
Requires additional investment: Cube technology is often
proprietary and does not already exist in the organization.
Therefore, adopting MOLAP technology is likely to require additional
investments in human and capital resources.
MOLAP is not capable of containing detailed data.
Storage utilization may be low if the data set is sparse.
MOLAP processing may be lengthy, particularly on large data
volumes.
MOLAP products may face issues while updating and querying
models when there are more than ten dimensions.
48. MOLAP
MOLAP Tools
• Essbase - A tool from Oracle that has a multidimensional database.
• Express Server - A web-based environment that runs on the Oracle
database.
• Yellowfin - A business analytics tool for creating reports and
dashboards.
• Clear Analytics - An Excel-based business solution.
• SAP Business Intelligence - Business analytics solutions from SAP.
49. ROLAP vs MOLAP
S. No. | ROLAP | MOLAP
1. | ROLAP stands for Relational Online Analytical Processing. | MOLAP stands for Multidimensional Online Analytical Processing.
2. | ROLAP is used for large data volumes. | MOLAP is used for limited data volumes.
3. | Access in ROLAP is slow. | Access in MOLAP is fast.
4. | In ROLAP, data is stored in relational tables. | In MOLAP, data is stored in a multidimensional array.
50. ROLAP vs MOLAP
S. No. | ROLAP | MOLAP
5. | In ROLAP, data is fetched from the data warehouse. | In MOLAP, data is fetched from a multidimensional database (MDDB).
6. | In ROLAP, complicated SQL queries are used. | In MOLAP, a sparse matrix is used.
7. | In ROLAP, a static multidimensional view of data is created. | In MOLAP, a dynamic multidimensional view of data is created.
8. | ROLAP is best suited for experienced users. | MOLAP is best suited for inexperienced users, since it is very easy to use.
51. HOLAP
Hybrid OLAP is a mixture of both ROLAP and MOLAP.
It offers the fast computation of MOLAP and the higher scalability of
ROLAP. HOLAP uses two databases:
1. Aggregated or computed data is stored in a multidimensional OLAP
cube.
2. Detailed information is stored in a relational database.
HOLAP technologies attempt to combine the advantages of MOLAP
and ROLAP. For summary-type information, HOLAP leverages cube
technology for faster performance.
HOLAP can also drill through from the cube down to the relational
tables for detailed data.
Microsoft SQL Server 2000 provides a hybrid OLAP server.
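The two-store idea behind HOLAP can be sketched with a relational table for the detailed rows and a small in-memory cube for the summaries. Everything here is an assumption made for illustration: the table, the rows, and the `total` and `drill_through` helpers; SQLite (via the standard sqlite3 module) stands in for the relational database.

```python
import sqlite3

# Relational store: holds the detailed rows (hypothetical data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (city TEXT, month TEXT, amount INTEGER)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("New Jersey", "January", 140),
        ("New Jersey", "February", 300),
        ("Los Angeles", "January", 560),
        ("Los Angeles", "March", 1000),
    ],
)

# Multidimensional store: holds only pre-computed summaries per city.
summary_cube = dict(db.execute("SELECT city, SUM(amount) FROM sales GROUP BY city"))


def total(city):
    """Summary query: answered from the cube without touching detail rows."""
    return summary_cube[city]


def drill_through(city):
    """Detail query: drill through from the cube to the relational table."""
    return db.execute(
        "SELECT month, amount FROM sales WHERE city = ? ORDER BY amount",
        (city,),
    ).fetchall()


print(total("New Jersey"))
print(drill_through("Los Angeles"))
```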
52. HOLAP
Benefits of Hybrid OLAP:
HOLAP provides the benefits of both MOLAP and ROLAP.
This kind of OLAP helps to economize disk space, and it
remains compact, which helps to avoid issues related to access speed
and convenience.
HOLAP uses cube technology, which allows faster
performance for all types of data.
ROLAP data is updated instantly, and HOLAP users have access to this
real-time data.
MOLAP brings cleaning and conversion of data, thereby improving
data relevance. This brings the best of both worlds.
53. HOLAP
Drawbacks of Hybrid OLAP:
Greater complexity level: The major drawback of HOLAP systems is
that they must support both ROLAP and MOLAP tools and applications,
which makes them very complicated.
Potential overlaps: There is a higher chance of overlap,
especially in their functionalities.
54. Other types of OLAP
Desktop OLAP (DOLAP):
In Desktop OLAP, a user downloads a part of the data from the database to their
desktop and analyzes it locally. DOLAP is relatively cheap to deploy, as it
offers very few functionalities compared to other OLAP systems.
Web OLAP (WOLAP):
Web OLAP is an OLAP system accessible via a web browser. WOLAP has
a three-tiered architecture consisting of three components: client, middleware,
and a database server.
Mobile OLAP (also abbreviated MOLAP, not to be confused with Multidimensional
OLAP): Mobile OLAP helps users to access and analyze
OLAP data using their mobile devices.
Spatial OLAP (SOLAP): SOLAP is created to facilitate management of both
spatial and non-spatial data in a Geographic Information System (GIS).
55. Dimensional Modelling
Dimensional modeling represents data with a cube operation, making
the logical data representation more suitable for OLAP data management.
The concept of dimensional modeling was developed by Ralph
Kimball and consists of "fact" and "dimension" tables.
In dimensional modeling, the transaction record is divided into either
"facts," which are frequently numerical transaction data, or
"dimensions," which are the reference information that gives context to
the facts.
For example, a sale transaction can be broken down into facts such as the
number of products ordered and the price paid for the products, and
into dimensions such as order date, user name, product number, order
ship-to and bill-to locations, and the salesman responsible for receiving
the order.
56. Dimensional Modelling
Dimensional Modeling (DM) is a data structure technique optimized
for data storage in a data warehouse.
The advantage of this model is that data is stored in such a
way that it is easy to retrieve once it is in the data
warehouse.
The dimensional model is the data model used by many OLAP systems.
Objectives of Dimensional Modeling
The purposes of dimensional modeling are:
1. To produce a database architecture that is easy for end clients to
understand and write queries against.
2. To maximize the efficiency of queries. It achieves these goals by
minimizing the number of tables and relationships between them.
57. Dimensional Modelling
Elements of Dimensional Modeling:
Fact
It is a collection of associated data items, consisting of measures and
context data. It typically represents business items or business
transactions.
Dimensions
It is a collection of data that describes one business dimension.
Dimensions determine the contextual background for the facts, and they
are the framework over which OLAP is performed.
Measure
It is a numeric attribute of a fact, representing the performance or
behavior of the business relative to the dimensions.
59. Dimensional Modelling
Steps to Create a Dimensional Data Model:
Step-1: Identifying the business objective –
The first step is to identify the business objective. Sales, HR,
Marketing, etc. are some examples, depending on the needs of the
organization.
This is the most important step of data modelling. The selection of
the business objective also depends on the quality of data available for
that process.
Step-2: Identifying Granularity –
Granularity is the lowest level of information stored in the table. The
level of detail for the business problem and its solution is described by
the grain.
60. Dimensional Modelling
During this stage, you answer questions like:
Do we need to store all the available products or just a few types of
products? This decision is based on the business processes selected
for the data warehouse.
Do we store the product sale information on a monthly, weekly, daily
or hourly basis? This decision depends on the nature of reports
requested by executives.
How do the above two choices affect the database size?
Example of Grain:
The CEO at an MNC wants to find the sales for specific products in
different locations on a daily basis.
So, the grain is "product sale information by location by the day."
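Choosing a grain of "product sale by location by day" means that every fact row is keyed by exactly that combination. The sketch below aggregates hypothetical timestamped transactions to this grain; the data and the `to_daily_grain` helper are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transactions: (product, location, ISO timestamp, amount).
transactions = [
    ("Mobile", "New Jersey", "2024-01-05T09:30:00", 100),
    ("Mobile", "New Jersey", "2024-01-05T16:45:00", 150),
    ("Mobile", "Los Angeles", "2024-01-05T11:00:00", 300),
    ("Mobile", "New Jersey", "2024-01-06T10:15:00", 190),
]


def to_daily_grain(transactions):
    """Aggregate transactions to one row per (product, location, day)."""
    facts = defaultdict(int)
    for product, location, timestamp, amount in transactions:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        facts[(product, location, day)] += amount
    return dict(facts)


facts = to_daily_grain(transactions)
for key, amount in sorted(facts.items()):
    print(key, amount)
```

A coarser grain (for example, monthly) would produce fewer rows and a smaller database, but daily questions could no longer be answered from the stored facts.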
61. Dimensional Modelling
Step-3: Identifying Dimensions and their Attributes –
Dimensions are objects or things.
Dimensions are nouns like date, store, inventory, etc. These
dimensions are where all the data should be stored.
For example, the date dimension may contain data like year, month
and weekday.
Dimensions categorize and describe data warehouse facts and
measures in a way that supports meaningful answers to business
questions.
A data warehouse organizes descriptive attributes as columns in
dimension tables.
62. Dimensional Modelling
Example of Dimensions:
The CEO at an MNC wants to find the sales for specific products in
different locations on a daily basis.
Dimensions: Product, Location and Time
Attributes: For Product: Product key (referenced as a foreign key from
the fact table), Name, Type, Specifications
Hierarchies: For Location: Country, State, City, Street Address,
Name
Step-4: Identifying the Fact –
The fact table holds the measurable data. Most fact table
rows contain numerical values such as price or cost per unit.
63. Dimensional Modelling
Example of Facts:
The CEO at an MNC wants to find the sales for specific products in
different locations on a daily basis.
The fact here is Sum of Sales by product by location by time.
Step-5: Build Schema –
In this step, you implement the dimensional model. A schema is
the database structure (arrangement of tables). There are
two popular schemas:
Star Schema
The star schema architecture is easy to design. It is called a star
schema because its diagram resembles a star, with points radiating from
a center.
64. Dimensional Modelling
The center of the star consists of the fact table, and the points of the
star are the dimension tables.
The fact tables in a star schema are in third normal form, whereas
the dimension tables are de-normalized.
Snowflake Schema
The snowflake schema is an extension of the star schema. In a
snowflake schema, each dimension is normalized and connected to
more dimension tables.
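A star schema for the running example (sales by product, location and time) can be sketched with Python's built-in sqlite3 module. The table and column names and the sample rows are hypothetical; the structure follows the description above, with one central fact table whose foreign keys point at de-normalized dimension tables.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Dimension tables: descriptive attributes, one surrogate key each.
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, country TEXT, city TEXT);
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER, weekday TEXT);

    -- Fact table at the center of the star: foreign keys plus the measure.
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        sales_amount INTEGER
    );

    INSERT INTO dim_product  VALUES (1, 'Mobile', 'Electronics'), (2, 'Modem', 'Electronics');
    INSERT INTO dim_location VALUES (1, 'USA', 'New Jersey'), (2, 'USA', 'Los Angeles');
    INSERT INTO dim_date     VALUES (1, 2024, 1, 'Friday'), (2, 2024, 1, 'Saturday');
    INSERT INTO fact_sales   VALUES (1, 1, 1, 440), (1, 2, 1, 900), (2, 2, 2, 660);
    """
)

# Sum of sales by city: a single join from the fact table to one dimension.
sales_by_city = db.execute(
    "SELECT l.city, SUM(f.sales_amount) "
    "FROM fact_sales AS f "
    "JOIN dim_location AS l ON f.location_key = l.location_key "
    "GROUP BY l.city ORDER BY l.city"
).fetchall()

print(sales_by_city)
```

In a snowflake variant, `dim_location` would itself be split (for example, a separate country table referenced from the city rows), at the cost of an extra join per query.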
65. Dimensional Modelling
Rules for Dimensional Modelling: The following are the rules and
principles of dimensional modeling:
• Load atomic data into dimensional structures.
• Build dimensional models around business processes.
• Ensure that all facts in a single fact table are at the same grain or
level of detail.
• Store report labels and filter domain values in
dimension tables.
• Ensure that dimension tables use a surrogate key.
• Ensure that every fact table has an associated dimension table.
• Continuously balance requirements and realities to deliver a business
solution that supports decision-making.
66. Dimensional Modelling
Benefits of Dimensional Modeling
Standardization of dimensions allows easy reporting across areas of the
business.
Dimension tables store the history of the dimensional information.
It allows an entirely new dimension to be introduced without major
disruption to the fact table.
Dimensional modelling also stores data in such a way that information is
easier to retrieve once the data is in the database.
Compared to the normalized model, dimensional tables are easier to
understand.
Information is grouped into clear and simple business categories.
The dimensional model also helps to boost query performance.
67. Dimensional Modelling
The dimensional model is very understandable by the business. This model is
based on business terms, so that the business knows what each fact, dimension,
or attribute means.
Because it is denormalized, the dimensional model is optimized for fast data
querying. Many relational database platforms recognize this model and optimize
query execution plans to aid performance.
Dimensional modelling in a data warehouse creates a schema that is optimized
for high performance: queries need fewer joins, at the cost of some controlled
redundancy in the denormalized dimension tables.
Dimensional models can comfortably accommodate change. Dimension tables
can have more columns added to them without affecting existing business
intelligence applications using these tables.
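The point about accommodating change can be sketched with sqlite3: a column is added to a dimension table and an existing report query still returns the same result. The table names, the `brand` column, and the sample rows are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, sales_amount REAL);
INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO fact_sales  VALUES (1, 100.0), (2, 50.0), (1, 25.0);
""")

# An existing report that names its columns explicitly.
REPORT = """
    SELECT p.product_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
"""

before = db.execute(REPORT).fetchall()

# Add a new (hypothetical) attribute to the dimension table.
db.execute("ALTER TABLE dim_product ADD COLUMN brand TEXT")

after = db.execute(REPORT).fetchall()
print(before == after)
```

The report is unaffected because it selects named columns; a query written with `SELECT *` would see the new column.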
68. Multi-dimensional Model
A multidimensional model views data in the form of a data-cube.
Examples are usually shown as two- or three-dimensional cubes, although
a data cube can have any number of dimensions.
A data cube enables data to be modeled and viewed in multiple
dimensions. It is defined by dimensions and facts.
Dimensions are the entities with respect to which an organization
wants to keep records.
For example, in a store's sales records, dimensions allow the store to
keep track of things like monthly sales of items and the branches and
locations.
A multidimensional database helps to provide data-related answers
to complex business queries quickly and accurately.
69. Multi-dimensional Model
Data warehouses and Online Analytical Processing (OLAP) tools are
based on a multidimensional data model.
OLAP in data warehousing enables users to view data from different
angles and dimensions.
A multidimensional data model is organized around a central theme,
for example, sales.
This theme is represented by a fact table.
Facts are numerical measures.
The fact table contains the names of the facts, or measures, as well as
keys to each of the related dimension tables.
Consider the data of a shop for items sold per quarter in the city of
Delhi. The data is shown in the table.
70. Multi-dimensional Model
In this 2D representation, the sales for Delhi are shown for the time
dimension (organized in quarters) and the item dimension (classified
according to the type of item sold). The fact or measure displayed is
rupee_sold (in thousands).
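The 2D view described above can be sketched as a small pivot. The item names and figures below are made-up placeholders, not the values from the slide's table, and `pivot_2d` is a hypothetical helper.

```python
def pivot_2d(records):
    """Arrange (quarter, item, amount) records as {quarter: {item: total}}."""
    table = {}
    for quarter, item, amount in records:
        row = table.setdefault(quarter, {})
        row[item] = row.get(item, 0) + amount
    return table


# Placeholder Delhi sales: (quarter, item, rupee_sold in thousands).
delhi_records = [
    ("Q1", "item_a", 10),
    ("Q1", "item_b", 20),
    ("Q2", "item_a", 30),
    ("Q2", "item_b", 40),
    ("Q2", "item_a", 5),
]

sales_2d = pivot_2d(delhi_records)
for quarter in sorted(sales_2d):
    print(quarter, sales_2d[quarter])
```

Rows correspond to the time dimension, columns to the item dimension, and each cell holds the measure.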
71. Multi-dimensional Model
Now suppose we want to view the sales data with a third dimension. For
example, consider the data according to time and item, as well as location,
for the cities Chennai, Kolkata, Mumbai, and Delhi.
These 3D data are shown in the table, where they are represented as a
series of 2D tables.
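The idea of showing 3D data as a series of 2D tables can be sketched as follows. The figures and item names are made-up placeholders, and `split_by_location` is a hypothetical helper.

```python
def split_by_location(cube):
    """Turn {(location, quarter, item): value} into one 2D table per location."""
    tables = {}
    for (location, quarter, item), value in cube.items():
        tables.setdefault(location, {}).setdefault(quarter, {})[item] = value
    return tables


# Placeholder 3D data keyed by (location, quarter, item).
cube_3d = {
    ("Chennai", "Q1", "item_a"): 1,
    ("Kolkata", "Q1", "item_a"): 2,
    ("Mumbai", "Q1", "item_a"): 3,
    ("Delhi", "Q1", "item_a"): 4,
    ("Delhi", "Q1", "item_b"): 5,
    ("Delhi", "Q2", "item_a"): 6,
}

tables_by_city = split_by_location(cube_3d)
for city in sorted(tables_by_city):
    print(city, tables_by_city[city])
```

Fixing the location yields one time-by-item table per city, which is exactly the series of 2D tables mentioned above.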
73. Multi-dimensional Model
Conceptually, the same data may also be represented in the form of a
3D data cube, as shown in fig:
74. Data Cube
Data grouped or combined into multidimensional matrices are called
data cubes. The data cube method has a few alternative names or
variants, such as "multidimensional databases," "materialized views,"
and "OLAP (On-Line Analytical Processing)."
, Medi-Caps University, Indore
75. Data Cube
Example: In the 2-D representation, we will look at the All
Electronics sales data for items sold per quarter in the city of
Vancouver. The measure displayed is dollars sold (in thousands).
76. Data Cube
3-Dimensional Cuboids
Suppose we would like to view the sales data with a third
dimension. For example, suppose we would like to view the data
according to time and item, as well as location, for the cities Chicago,
New York, Toronto, and Vancouver. The measure displayed is dollars
sold (in thousands). These 3-D data are shown in the table, where they
are represented as a series of 2-D tables.
77. Data Cube
Conceptually, we may represent the same data in the form of 3-D
data cubes, as shown in fig:
78. Data Cube
Let us suppose that we would like to view our sales data with an
additional fourth dimension, such as a supplier.
In data warehousing, the data cubes are n-dimensional.
The cuboid which holds the lowest level of summarization is called a
base cuboid.
For example, the 4-D cuboid in the figure is the base cuboid for the
given time, item, location, and supplier dimensions.
The figure shows a 4-D data cube representation of sales data,
according to the dimensions time, item, location, and supplier. The
measure displayed is dollars sold (in thousands).
80. Data Cube
The topmost 0-D cuboid, which holds the highest level of
summarization, is known as the apex cuboid.
In this example, this is the total sales, or dollars sold, summarized
over all four dimensions.
The lattice of cuboids forms a data cube.
The figure shows the lattice of cuboids forming a 4-D data cube for
the dimensions time, item, location, and supplier.
Each cuboid represents a different degree of summarization.
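The lattice of cuboids can be enumerated with a short sketch. It assumes each dimension has a single level (no concept hierarchies), so every subset of the dimensions is one cuboid; `cuboid_lattice` is a hypothetical helper name.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Group all cuboids by how many dimensions they keep (0-D up to n-D)."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(["time", "item", "location", "supplier"])

apex_cuboid = lattice[0][0]   # the empty tuple: summarized over all dimensions
base_cuboid = lattice[4][0]   # keeps all four dimensions: lowest summarization
cuboid_count = sum(len(cuboids) for cuboids in lattice.values())

for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)
print("total cuboids:", cuboid_count)
```

With n single-level dimensions there are 2^n cuboids, so the four dimensions here give 16, from the 0-D apex cuboid to the 4-D base cuboid.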
82. Summary
• OLAP is a technology that enables analysts to extract and view
business data from different points of view.
• At the core of the OLAP concept is the OLAP cube.
• Various business applications and other data operations require the
use of an OLAP cube.
• There are five primary types of analytical operations in OLAP: 1)
Roll-up 2) Drill-down 3) Slice 4) Dice and 5) Pivot.
• Three types of widely used OLAP systems are MOLAP, ROLAP, and
Hybrid OLAP.
• Desktop OLAP, Web OLAP, and Mobile OLAP are some other types
of OLAP systems.
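Three of the OLAP operations listed above can be sketched on a tiny cube held in a dictionary. The cube contents and the helper names `slice_cube`, `dice_cube`, and `roll_up` are assumptions for illustration; roll-up is shown here as aggregating one dimension away, and drill-down is simply the reverse step back to the more detailed cube.

```python
from collections import defaultdict

DIMS = ("quarter", "item", "city")

# Placeholder cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "phone", "Delhi"): 10,
    ("Q1", "laptop", "Delhi"): 20,
    ("Q2", "phone", "Delhi"): 30,
    ("Q1", "phone", "Mumbai"): 40,
    ("Q2", "laptop", "Mumbai"): 50,
}


def slice_cube(data, dim, value):
    """Slice: fix a single dimension to one value."""
    i = DIMS.index(dim)
    return {key: v for key, v in data.items() if key[i] == value}


def dice_cube(data, allowed):
    """Dice: keep cells whose values fall in the allowed set for each dimension."""
    return {
        key: v
        for key, v in data.items()
        if all(key[DIMS.index(d)] in values for d, values in allowed.items())
    }


def roll_up(data, dim):
    """Roll-up: aggregate one dimension away by summing over it."""
    i = DIMS.index(dim)
    totals = defaultdict(int)
    for key, v in data.items():
        totals[key[:i] + key[i + 1:]] += v
    return dict(totals)


delhi_slice = slice_cube(cube, "city", "Delhi")
q1_phone_dice = dice_cube(cube, {"quarter": {"Q1"}, "item": {"phone"}})
by_item_city = roll_up(cube, "quarter")
print(delhi_slice)
print(q1_phone_dice)
print(by_item_city)
```

Pivot does not change any values; it only rotates which dimensions are shown as rows and columns.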
83. Summary
• A dimensional model is a data structure technique optimized for Data
warehousing tools.
• Facts are the measurements or metrics from your business process.
• A dimension provides the context surrounding a business process event.
• Attributes are the various characteristics of a dimension.
• A fact table is a primary table in a dimensional model.
• A dimension table contains dimensions of a fact.
• There are three types of facts: 1. Additive 2. Non-additive 3. Semi-additive
• Types of Dimensions are Conformed, Outrigger, Shrunken, Role-playing,
Dimension to Dimension Table, Junk, Degenerate, Swappable and Step
Dimensions.
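The three types of facts can be illustrated with a small sketch. The rows are made-up: sales and profit stand in for additive facts, stock on hand for a semi-additive fact, and profit margin for a non-additive fact.

```python
# Placeholder rows: (day, store, sales, profit, stock_on_hand).
fact_table = [
    ("Mon", "S1", 100.0, 20.0, 30),
    ("Mon", "S2", 200.0, 20.0, 50),
    ("Tue", "S1", 150.0, 45.0, 20),
    ("Tue", "S2", 50.0, 5.0, 40),
]

# Additive: sales can be summed across every dimension (days and stores).
total_sales = sum(row[2] for row in fact_table)
total_profit = sum(row[3] for row in fact_table)

# Semi-additive: stock can be summed across stores for one day,
# but summing it across days would count the same goods repeatedly.
stock_on_tuesday = sum(row[4] for row in fact_table if row[0] == "Tue")

# Non-additive: a ratio such as margin cannot be summed at all;
# it is recomputed from additive facts at the required level.
overall_margin = total_profit / total_sales

print(total_sales, stock_on_tuesday, overall_margin)
```

Storing the additive components (sales and profit) rather than the ratio keeps the margin correct at any level of summarization.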
84. Summary
• The five steps of dimensional modelling are: 1. Identify Business Process
2. Identify Grain (level of detail) 3. Identify Dimensions 4. Identify Facts
5. Build Schema
• For Dimensional modelling in data warehouse, there is a need to ensure
that every fact table has an associated date dimension table.
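A date dimension table can be generated ahead of time, one row per calendar day. The sketch below is only an illustration: the column names and the YYYYMMDD integer used as `date_key` are assumptions, not something prescribed by the slides.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one row per day from start to end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # assumed key format
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": f"Q{(current.month - 1) // 3 + 1}",
            "month": current.month,
            "iso_weekday": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2023, 1, 1), date(2023, 12, 31))
print(len(date_dim), date_dim[0])
```

Fact rows then carry only the `date_key`, and reports group by attributes such as quarter or month taken from this table.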
86. Thank You
Great God, Medi-Caps, All the attendees
Mr. Sagar Pandya
sagar.pandya@medicaps.ac.in
www.sagarpandya.tk
LinkedIn: /in/seapandya
Twitter: @seapandya
Facebook: /seapandya