SlideShare a Scribd company logo
1 of 84
CPIT743
Data mining
Data warehousing and OLAP
Dr. Asma Cherif
Overview
• What is Business Intelligence?
• Why Business Intelligence?
• Data analysis problems
• Data Warehouse (DW) introduction
• DW topics
• Multidimensional modeling
• ETL
• Performance optimization
What is Business Intelligence (BI)?
From Encyclopedia of Database Systems:
“[BI] refers to a set of tools and techniques that
enable a company to transform its business
data into timely and accurate information for the
decisional process, to be made available to the
right persons in the most suitable form.”
BI vs AI ?
BI is different from Artificial Intelligence (AI)
• AI systems make decisions for the users
• BI systems help the users make the right decisions, based
on available data
Combination of technologies
• Data Warehousing (DW)
• On-Line Analytical Processing (OLAP)
• Data Mining (DM)
• ……
Cette photo par Auteur inconnu est soumise à la licence CC BY-SA-NC
Why is BI Important?
• Worldwide BI revenue in 2005 = US$ 5.7 billion
• 10% growth each year
• A market where players like IBM, Microsoft, Oracle,
and SAP compete and invest
• BI is not only for large enterprises
• Small and medium-sized companies can also benefit
from BI
• The financial crisis has increased the focus on BI
• You cannot afford not to use the “gold” in your data
BI and the Web
• The Web makes BI even more useful
• Customers do not appear “physically” in a store; their behaviors
• Cannot be observed by traditional methods
• A website log is used to capture the behavior of each customer,
• e.g., sequence of pages seen by a customer, the products viewed
• Idea: understand your customers using data and BI!
• Utilize website logs, analyze customer behavior in more detail than before
(e.g., what was not bought?)
• Combine web data with traditional customer data
Data analysis problems with
traditional RDBMS
Case Study of an Enterprise
• Example of a chain (e.g., fashion stores or car dealers)
• Each store maintains its own customer records and sales records
• Hard to answer questions like: “find the total sales of Product X from stores in town
Y”
• The same customer may be viewed as different customers for different stores
•  hard to detect duplicate customer information
• Imprecise or missing data in the addresses of some customers
• Purchase records maintained in the operational system for limited time (e.g., 6
months)
•  then they are deleted or archived
• The same “product” may have different prices, or different discounts in different
stores
• Can you see the problems of using those data for business analysis?
Data Analysis Problems
• The same data found in many different systems
• Example: customer data across different stores and departments
• The same concept is defined differently
• Heterogeneous sources
• Relational DBMS, On-Line Transaction Processing (OLTP)
• Unstructured data in files (e.g., MS Word)
• Legacy systems
• …
Data Analysis Problems (cont’)
• Data is suited for operational systems
• Accounting, billing, etc.
• Do not support analysis across business functions
• Data quality is bad
• Missing data, imprecise data, different use of systems
•Data are “volatile”
• Data deleted in operational systems (6 months)
• Data change over time – no historical information
Solution: DW
Data Warehousing
• Solution: new analysis environment (DW) where data are
• Subject oriented (versus function oriented)
• Integrated (logically and physically)
• Time variant (data can always be related to time)
• Stable (data not deleted, several versions)
• Supporting management decisions (different organization)
• Data from the operational systems are
• Extracted
• Cleansed
• Transformed
• Aggregated
• Loaded into the DW
• A good DW is a prerequisite for successful BI
DW: Purpose and Definition
• DW is a store of information organized in a unified data model
• Data collected from a number of different sources
• Finance, billing, website logs, personnel, …
• Purpose of a data warehouse (DW): support decision making
• Easy to perform advanced analysis
• Ad-hoc analysis and reports
• Data mining: discovery of hidden patterns and trends
DB vs DW
• Data base  schema 2D  app functions  OLTP: online
transaction processing
• DW  DB for optimised analysis xD OLAP: online
analytical processing
Cette photo par Auteur inconnu est soumise à la licence CC BY-SA-NC
Integrating Heterogeneous
Databases
Two integration approaches
Update-
driven
Approach
Query-
driven
Approach
New
Tradinional
Process of Query-Driven Approach
This approach was used to build wrappers and integrators on top of
multiple heterogeneous databases. These integrators are also known as
mediators.
1. When a query is issued to a client side, a metadata dictionary
translates the query into an appropriate form for individual
heterogeneous sites involved.
2. These queries are mapped and sent to the local query processor.
3. The results from heterogeneous sites are integrated into a global
answer set.
Disadvantages
Complex
integration &
filtering
Inefficient
Expensive for
frequent
aggrgations
Update-Driven Approach
• Today's data warehouse systems follow update-driven approach.
• In update-driven approach, the information from multiple
heterogeneous sources are integrated in advance and are stored in a
warehouse.
• This information is available for direct querying and analysis.
High performance
Data is copied, integrated, cleaned, annotated and
restructured in semantic store in advance
Querying does not require an interface to process data
at local sources
DW architecture
• Data source
• DM: data mart mini dataware house limited in scope
• ETL: Extract/Transform/Load
From edureka
metadata
Data Mart
• Data marts contain a subset of organization-wide data that is valuable
to specific groups of people in an organization.
• A data mart contains only those data that is specific to a particular
group.
• For example, the marketing data mart may contain only data related
to items, customers, and sales.
• Data marts are confined to subjects.
Dat Mart (contd.)
• Windows-based or Unix/Linux-based servers are used to implement data
marts.
• They are implemented on low-cost servers.
• The implementation cycle of a data mart is measured in short periods of
time, i.e., in weeks rather than months or years.
• The life cycle of data marts may be complex in the long run, if their
planning and design are not organization-wide.
• Data marts are small in size.
• Data marts are customized by department.
• The source of a data mart is departmentally structured data warehouse.
• Data marts are flexible.
Metadata
• Metadata is simply defined as data about data.
• The data that are used to represent other data is known as metadata.
• For example, the index of a book serves as a metadata for the
contents in the book.
• We can say that metadata is the summarized data that leads us to the
detailed data.
Metadata
In terms of data warehouse, we can define metadata as following:
• Metadata is a roadmap to data warehouse.
• Metadata in data warehouse defines the warehouse objects.
• Metadata acts as a directory.
• This directory helps the decision support system to locate the
contents of a data warehouse.
DW Architecture – Data as Materialized Views
Function vs. Subject Orientation
Top-down vs. Bottom-up
OLTP vs OLAP
What is OLTP?
• Online transaction processing shortly known as OLTP.
• Supports transaction-oriented applications in a 3-tier architecture.
OLTP administers day to day transaction of an organization.
• The primary objective is data processing and not data analysis
Example: ATM
• Assume that a couple has a joint account with a bank.
• One day both simultaneously reach different ATM centers at precisely
the same time and want to withdraw total amount present in their
bank account.
• However, the person that completes authentication process first will
be able to get money.
• In this case, OLTP system makes sure that withdrawn amount will be
never more than the amount present in the bank. The key to note
here is that OLTP systems are optimized for transactional superiority
instead data analysis.
What is OLAP?
• Online Analytical Processing.
• A category of software tools which provide analysis of data for
business decisions.
• OLAP systems allow users to analyze database information from
multiple database systems at one time.
• The primary objective is data analysis and not data processing.
Example
• Any Datawarehouse system is an OLAP system.
• Uses of OLAP are as follows:
• A company might compare their mobile phone sales in September with sales
in October, then compare those results with another location which may be
stored in a sperate database.
• Amazon analyzes purchases by its customers to come up with a personalized
homepage with products which likely interest to their customer.
Why not use the existing
databases (OLTP) for business
analysis?
Hard/Infeasible Queries for OLTP
• Business analysis queries
• In the past five years, which product is the most profitable?
• Which public holiday we have the largest sales?
• Which week we have the largest sales?
• Does the sales of dairy products increase over time?
• Difficult to express these queries in SQL
• 3rd query: may extract the “week” value using a function
• But the user has to learn many transformation functions …
• 4th query: use a “special” table to store IDs of all dairy products, in advance
• There can be many different dairy products; there can be many other
product types as well …
• The need of multidimensional modeling
Multidimensional Modeling
• Example: sales of supermarkets
• Facts and measures
• Each sales record is a fact, and its sales value is a measure
• Dimensions
• Group correlated attributes into the same dimension  easier for analysis
tasks
• Each sales record is associated with its values of Product, Store, Time
Multidimensional Modeling
• How do we model the Time
dimension?
• Hierarchies with multiple levels
• Attributes, e.g., holiday, event
• Advantage of this model?
• Easy for query (more about this
later)
• Disadvantage?
• Data redundancy (but controlled
redundancy is acceptable)
tid day Day # Week # Month
#
Year Work
day
…
1 January
1st
2009
1 1 1 2009 No …
2 January
2nd
2009
1 1 1 2009 Yes
Quick Review: Normalized Database
• Normalized database avoids
• Redundant data
• Modification anomalies
• How to get the original table? (join them)
• No redundancy in OLTP, controlled redundancy in OLAP
DW: Not in 3NF!
• Denormalized data
• Data redundancy
• More storage but data more quickly queried
• DW provides aggregate data which is in suitable format for decision
making
OLTP vs OLAP
OLTP OLAP
Target Operational needs Business analysis
Model Denormalized
Denormalized/
multidimensional
Data Small, operational Large, historical
Query Language SQL Not unified, mostly MDX
Query Small Large
Updates Frequent and small Infrequent and batch
Transactional recovery Necessary Not necessary
Optimized for Updates Query
Data Cube
• A data cube helps us represent data in multiple dimensions.
• It is defined by dimensions and facts.
• The dimensions are the entities with respect to which an enterprise
preserves the records.
How to represent multiple
dimensions?
Example
• Suppose Souq wants to keep track of sales records with the help of
sales data warehouse with respect to time, item, branch, and
location.
• These dimensions allow to keep track of monthly sales and at which
branch the items were sold.
• There is a table associated with each dimension. This table is known
as dimension table.
• For example, "item" dimension table may have attributes such as
item_name, item_type, and item_brand.
Location = Jeddah
Time Item type
Entertainment Laptop Mobile Sport
Q1 500 700 100 300
Q2 700 700 300 400
Q3 100 800 180 600
Q4 200 900 200 200
2-D view of Sales Data for a company with respect to time,
item, and location dimensions.
How to represent another dimension let say the
location (Ryadh, Taief, etc.)?
Time
Location = Jeddah Location = Ryadh Location = Taeif
Item type Item type Item type
Entertainm
ent
Laptop Mobile Sport Enterta
inment
Laptop Mobile Sport Enterta
inment
Laptop Mobile Sport
Q1 500 700 100 300 100 150 100 400 400 300 250 120
Q2 700 700 300 400 300 300 600 500 1000 100 309 345
Q3 100 800 180 600 200 400 500 300 300 200 399 764
Q4 200 900 200 200 100 200 100 200 400 100 400 897
3-D table can be represented as 3-D data cube
Jeddah
Ryadh
Taeif
Entment. Laptop Mobile Sport
Q1
Q2
Q3
TimeTime
Location
Item
500 700 100 300
400 300 250 120
100 100 309 345
300 200 399 764
More efficient
than RDBMS
More
readable
Interactive
Advantages
OLAP Data Cube: wrap up
• Data cube
• Useful data analysis tool in DW
• Generalized GROUP BY queries
• Aggregate facts based on chosen
dimensions
• Why data cube?
• Good for visualization (i.e., text results
hard to understand)
• Multidimensional, intuitive
• Support interactive OLAP operations
Cube operations
Four types of analytical operations in OLAP are:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
• Roll-up is also known as "consolidation" or "aggregation."
• The Roll-up operation can be performed in 2 ways
• Reducing dimensions
• Climbing up concept hierarchy.
• Concept hierarchy is a system of grouping things based on their order or level.
Example
Jeddah
Ryadh
Taeif
Entment. Laptop Mobile Sport
500 700 100 300
400 300 250 120
100 100 309 345
300 200 399 764
Makkah
region
900 1000 350 420
100 100 309 345
300 200 399 764
Drill-down
• In drill-down data is fragmented into smaller parts. It is the opposite
of the rollup process. It can be done via
• Moving down the concept hierarchy
• Increasing a dimension
Slice
• One dimension is selected, and a new sub-cube is created.
• Following diagram explain how slice operation performed:
Example: slice for time=Q1
Jeddah
Ryadh
Taeif
Entment. Laptop Mobile Sport
Q1
Q2
Q3
500 700 100 300
100 100 309 345
300 200 399 764
400 300 250 120
Dice
• This operation is similar to a slice.
• The difference in dice is you select 2 or more dimensions that result in
the creation of a sub-cube.
Example: dice for time = Q1 and Location=
Jeddah or Ryadh
Jeddah
Ryadh
Tayef
Entment. Laptop Mobile Sport
Q1
Q2
Q3
500 700 100 300
100 100 309 345
300 200 399 764
400 300 250 120
Pivot
• In Pivot, you rotate the data axes to provide a substitute presentation
of data.
Q1
Q2
Q3
Entment.
Laptop
Mobile
Sport
JeddahRyadhTayef
Jeddah
Ryadh
Tayef
Entment. Laptop Mobile Sport
Q1
Q2
Q3
500 700 100 300
100 100 309 345
300 200 399 764
400 300 250 120
OLAP Structures
OLAP
MOLAP
multidimensional
Implementes operation in
multidimensional data.
ROLAP
relational
Is an extended RDBMS along with
multidimensional data mapping to
perform the standard relational
operation.
HOLAP
hybrid
The aggregated totals are stored in a
multidimensional database while the
detailed data is stored in the relational
database. This offers both data
efficiency of the ROLAP model and the
performance of the MOLAP model.
Advantages of OLAP
• OLAP is a platform for all type of business includes planning, budgeting, reporting, and
analysis.
• Information and calculations are consistent in an OLAP cube. This is a crucial benefit.
• Quickly create and analyze "What if" scenarios
• Easily search OLAP database for broad or specific terms.
• OLAP provides the building blocks for business modeling tools, Data mining tools,
performance reporting tools.
• Allows users to do slice and dice cube data all by various dimensions, measures, and
filters.
• It is good for analyzing time series.
• Finding some clusters and outliers is easy with OLAP.
• It is a powerful visualization online analytical process system which provides faster
response times
Disadvantages of OLAP
• OLAP requires organizing data into a star or snowflake schema. These
schemas are complicated to implement and administer
• You cannot have large number of dimensions in a single OLAP cube
• Transactional data cannot be accessed with OLAP system.
• Any modification in an OLAP cube needs a full update of the cube.
This is a time-consuming process
Types of Data Warehouse
Schema
Multidimensional schemas?
• Multidimensional schema is especially designed to model data
warehouse systems.
• The schemas are designed to address the unique needs of very large
databases designed for the analytical purpose (OLAP).
Types of Data Warehouse
Schema
•Star Schema
•Snowflake Schema
•Galaxy Schema
Star Schema
What is a Star Schema?
• The star schema is the simplest type of Data Warehouse schema.
• aka Star Join Schema.
• It is optimized for querying large data sets.
• In the Star Schema:
• the center of the star can have one fact table and a number of associated
dimension tables.
• It is known as star schema as its structure resembles a star.
FACT Table
Customer
Dimension
Location
dimension
Product
Dimension
Time
Dimension
Keys
Measure
FACT Table
Customer
Dimension
id_cust
Location
dimension
id_loc
Product
Dimension
id_prd
Time
Dimension
id_qrt
Keys
Measure
FACT Table
id_prd
id_qrt
id_loc
id_cust
Customer
Dimension
id_cust
….
Location
dimension
id_loc
….
Product
Dimension
id_prd
….
Time
Dimension
id_qrt
….
sale
Characteristics of Star Schema
• Every dimension in a star schema is represented with the only one-
dimension table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key
• The dimension table are not joined to each other
• Fact table would contain key and measure
• The Star schema is easy to understand and provides optimal disk usage.
• The dimension tables are not normalized.
• The schema is widely supported by BI Tools
FACT Table
id_prd
id_qrt
id_loc
id_cust
Customer
Dimension
id_cust
….
Location
dimension
id_loc
….
Country
Product
Dimension
id_prd
….
Time
Dimension
id_qrt
….
sale
Snowflake Schema
What is a Snowflake Schema?
• A Snowflake Schema is an extension of a Star Schema, and it adds
additional dimensions.
• It is called snowflake because its diagram resembles a Snowflake.
• The dimension tables are normalized which splits data into additional
tables.
FACT Table
id_prd
id_qrt
id_loc
id_cust
Customer
Dimension
id_cust
….
Location
dimension
id_loc
….
Country
Product
Dimension
id_prd
….
Time
Dimension
id_qrt
….
sale
FACT Table
id_prd
id_qrt
id_loc
id_cust
Customer
Dimension
id_cust
….
Location
dimension
id_loc
….
Country
Product
Dimension
id_prd
….
Time
Dimension
id_qrt
….
sale
FACT Table
id_prd
id_qrt
id_loc
id_cust
Customer
Dimension
id_cust
….
Location
dimension
id_loc
Id_country
….
Product
Dimension
id_prd
….
Time
Dimension
id_qrt
….
sale
Country
dimension
Id_country
….
Characteristics of Snowflake Schema
• The main benefit of the snowflake schema it uses smaller disk space.
• Easier to implement a dimension is added to the Schema
• Due to multiple tables query performance is reduced
• The primary challenge that you will face while using the snowflake
Schema is that you need to perform more maintenance efforts
because of the more lookup tables.
Star vs snowflake
Star Schema Snowflake Schema
It contains a fact table surrounded by
dimension tables.
It contains a fact table surrounded by dimension
tables.
Only single join creates the relationship between
the fact table and any dimension tables.
Requires many joins to fetch the data.
Denormalized Data structure  query run faster
Normalized Data structure (dimensions) query run
slower
redundancy easy maintenance No edundancy difficult maintenance
Good for data data mart Good for warehouse core
Constellation Schema
What is fact constellation?
• It is a collection of multiple fact tables having some common
dimension tables.
• It can be viewed as a collection of several star schemas and hence,
also known as Galaxy schema.
• It is one of the widely used schema for Data warehouse designing and
it is much more complex than star and snowflake schema.
• For complex systems, we require fact constellations.
Geeksforgeeks.com
Characteristics of Galaxy Schema
• The dimensions in this schema are separated into separate dimensions
based on the various levels of hierarchy.
• For example, if geography has four levels of hierarchy like region, country,
state, and city then Galaxy schema should have four dimensions.
• Moreover, it is possible to build this type of schema by splitting the one-
star schema into more Star schemes.
• The dimensions are large in this schema which is needed to build based on
the levels of hierarchy.
• This schema is helpful for aggregating fact tables for better understanding.
Wrap up
• Data warehouse is a special DB that aggregates multiple DB
• DW is useful for analytical processing.
• DW is built through ETL
• DW may contain many datamarts
• OLTP is based on multidimensional schema: data cube
• There are 3 fact models: star, snowflake and galaxy
Literature
• Multidimensional Databases and Data Warehousing, Christian S.
Jensen, Torben Bach Pedersen, Christian Thomsen, Morgan &
Claypool Publishers, 2010
• Data Warehouse Design: Modern Principles and Methodologies,
Golfarelli and Rizzi, McGraw-Hill, 2009
• Advanced Data Warehouse Design: From Conventional to Spatial and
Temporal Applications, Elzbieta Malinowski, Esteban Zimányi,
Springer, 2008
• The Data Warehouse Lifecycle Toolkit, Kimball et al., Wiley 1998
• The Data Warehouse Toolkit, 2nd Ed., Kimball and Ross, Wiley, 2002
Sites
• Guru99
• Tutorial point
• Wikipedia
• Data Flair
• Edureka

More Related Content

What's hot

MIS: Business Intelligence
MIS: Business IntelligenceMIS: Business Intelligence
MIS: Business IntelligenceJonathan Coleman
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designSarita Kataria
 
Big data, Analytics and 4th Generation Data Warehousing
Big data, Analytics and 4th Generation Data WarehousingBig data, Analytics and 4th Generation Data Warehousing
Big data, Analytics and 4th Generation Data WarehousingMartyn Richard Jones
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Er Bansal
 
Data warehouse
Data warehouseData warehouse
Data warehouseMR Z
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Bernardo Najlis
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Data ware house architecture
Data ware house architectureData ware house architecture
Data ware house architectureDeepak Chaurasia
 
DATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTUREDATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTURESachin Batham
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Templatebutest
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseShanthi Mukkavilli
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRyan Andhavarapu
 
Data Ware Housing And Data Mining
Data Ware Housing And Data MiningData Ware Housing And Data Mining
Data Ware Housing And Data Miningcpjcollege
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data miningSnehali Chake
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehousekiran14360
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousingShahed Khalili
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project reportsonalighai
 

What's hot (20)

MIS: Business Intelligence
MIS: Business IntelligenceMIS: Business Intelligence
MIS: Business Intelligence
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-design
 
Big data, Analytics and 4th Generation Data Warehousing
Big data, Analytics and 4th Generation Data WarehousingBig data, Analytics and 4th Generation Data Warehousing
Big data, Analytics and 4th Generation Data Warehousing
 
Dwdm 2(data warehouse)
Dwdm 2(data warehouse)Dwdm 2(data warehouse)
Dwdm 2(data warehouse)
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Data ware house architecture
Data ware house architectureData ware house architecture
Data ware house architecture
 
DATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTUREDATA MART APPROCHES TO ARCHITECTURE
DATA MART APPROCHES TO ARCHITECTURE
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Template
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data Warehouse
 
Data warehousing ppt
Data warehousing pptData warehousing ppt
Data warehousing ppt
 
Data Ware Housing And Data Mining
Data Ware Housing And Data MiningData Ware Housing And Data Mining
Data Ware Housing And Data Mining
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data mining
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehouse
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousing
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
 

Similar to Data wharehousing and OLAP

DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewNagaraj Yerram
 
Dataware housing
Dataware housingDataware housing
Dataware housingwork
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overviewnetpeachteam
 
Kushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Singh
 
Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehousessuser7fc7eb
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasirguest7c8e5f
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introductionShivmohan Purohit
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introductionguest7b34c2
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introductionShivmohan Purohit
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overviewashok kumar
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data AnalyticsUtkarsh Sharma
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?RTTS
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
 

Similar to Data wharehousing and OLAP (20)

DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
IT Ready - DW: 1st Day
IT Ready - DW: 1st Day IT Ready - DW: 1st Day
IT Ready - DW: 1st Day
 
Msbi by quontra us
Msbi by quontra usMsbi by quontra us
Msbi by quontra us
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overview
 
Kushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Data Warehousing PPT
Kushal Data Warehousing PPT
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehouse
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasir
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introduction
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introduction
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introduction
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overview
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housing
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 

Recently uploaded

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 

Recently uploaded (20)

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 

Data wharehousing and OLAP

  • 1. CPIT743 Data mining Data warehousing and OLAP Dr. Asma Cherif
  • 2. Overview • What is Business Intelligence? • Why Business Intelligence? • Data analysis problems • Data Warehouse (DW) introduction • DW topics • Multidimensional modeling • ETL • Performance optimization
  • 3. What is Business Intelligence (BI)? From Encyclopedia of Database Systems: “[BI] refers to a set of tools and techniques that enable a company to transform its business data into timely and accurate information for the decisional process, to be made available to the right persons in the most suitable form.”
  • 4. BI vs AI ? BI is different from Artificial Intelligence (AI) • AI systems make decisions for the users • BI systems help the users make the right decisions, based on available data Combination of technologies • Data Warehousing (DW) • On-Line Analytical Processing (OLAP) • Data Mining (DM) • …… Cette photo par Auteur inconnu est soumise à la licence CC BY-SA-NC
  • 5. Why is BI Important? • Worldwide BI revenue in 2005 = US$ 5.7 billion • 10% growth each year • A market where players like IBM, Microsoft, Oracle, and SAP compete and invest • BI is not only for large enterprises • Small and medium-sized companies can also benefit from BI • The financial crisis has increased the focus on BI • You cannot afford not to use the “gold” in your data
  • 6. BI and the Web • The Web makes BI even more useful • Customers do not appear “physically” in a store; their behaviors • Cannot be observed by traditional methods • A website log is used to capture the behavior of each customer, • e.g., sequence of pages seen by a customer, the products viewed • Idea: understand your customers using data and BI! • Utilize website logs, analyze customer behavior in more detail than before (e.g., what was not bought?) • Combine web data with traditional customer data
  • 7. Data analysis problems with traditional RDBMS
  • 8. Case Study of an Enterprise • Example of a chain (e.g., fashion stores or car dealers) • Each store maintains its own customer records and sales records • Hard to answer questions like: “find the total sales of Product X from stores in town Y” • The same customer may be viewed as different customers for different stores •  hard to detect duplicate customer information • Imprecise or missing data in the addresses of some customers • Purchase records maintained in the operational system for limited time (e.g., 6 months) •  then they are deleted or archived • The same “product” may have different prices, or different discounts in different stores • Can you see the problems of using those data for business analysis?
  • 9. Data Analysis Problems • The same data found in many different systems • Example: customer data across different stores and departments • The same concept is defined differently • Heterogeneous sources • Relational DBMS, On-Line Transaction Processing (OLTP) • Unstructured data in files (e.g., MS Word) • Legacy systems • …
  • 10. Data Analysis Problems (cont’) • Data is suited for operational systems • Accounting, billing, etc. • Do not support analysis across business functions • Data quality is bad • Missing data, imprecise data, different use of systems •Data are “volatile” • Data deleted in operational systems (6 months) • Data change over time – no historical information
  • 12. Data Warehousing • Solution: new analysis environment (DW) where data are • Subject oriented (versus function oriented) • Integrated (logically and physically) • Time variant (data can always be related to time) • Stable (data not deleted, several versions) • Supporting management decisions (different organization) • Data from the operational systems are • Extracted • Cleansed • Transformed • Aggregated • Loaded into the DW • A good DW is a prerequisite for successful BI
  • 13. DW: Purpose and Definition • DW is a store of information organized in a unified data model • Data collected from a number of different sources • Finance, billing, website logs, personnel, … • Purpose of a data warehouse (DW): support decision making • Easy to perform advanced analysis • Ad-hoc analysis and reports • Data mining: discovery of hidden patterns and trends
  • 14. DB vs DW • Data base  schema 2D  app functions  OLTP: online transaction processing • DW  DB for optimised analysis xD OLAP: online analytical processing Cette photo par Auteur inconnu est soumise à la licence CC BY-SA-NC
  • 17. Process of Query-Driven Approach This approach was used to build wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators. 1. When a query is issued to a client side, a metadata dictionary translates the query into an appropriate form for individual heterogeneous sites involved. 2. These queries are mapped and sent to the local query processor. 3. The results from heterogeneous sites are integrated into a global answer set.
  • 19. Update-Driven Approach • Today's data warehouse systems follow update-driven approach. • In update-driven approach, the information from multiple heterogeneous sources are integrated in advance and are stored in a warehouse. • This information is available for direct querying and analysis. High performance Data is copied, integrated, cleaned, annotated and restructured in semantic store in advance Querying does not require an interface to process data at local sources
  • 21. • Data source • DM: data mart mini dataware house limited in scope • ETL: Extract/Transform/Load From edureka metadata
  • 22. Data Mart • Data marts contain a subset of organization-wide data that is valuable to specific groups of people in an organization. • A data mart contains only those data that is specific to a particular group. • For example, the marketing data mart may contain only data related to items, customers, and sales. • Data marts are confined to subjects.
  • 23. Dat Mart (contd.) • Windows-based or Unix/Linux-based servers are used to implement data marts. • They are implemented on low-cost servers. • The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years. • The life cycle of data marts may be complex in the long run, if their planning and design are not organization-wide. • Data marts are small in size. • Data marts are customized by department. • The source of a data mart is departmentally structured data warehouse. • Data marts are flexible.
  • 24. Metadata • Metadata is simply defined as data about data. • The data that are used to represent other data is known as metadata. • For example, the index of a book serves as a metadata for the contents in the book. • We can say that metadata is the summarized data that leads us to the detailed data.
  • 25. Metadata In terms of data warehouse, we can define metadata as following: • Metadata is a roadmap to data warehouse. • Metadata in data warehouse defines the warehouse objects. • Metadata acts as a directory. • This directory helps the decision support system to locate the contents of a data warehouse.
  • 26. DW Architecture – Data as Materialized Views
  • 27. Function vs. Subject Orientation
  • 30. What is OLTP? • Online transaction processing shortly known as OLTP. • Supports transaction-oriented applications in a 3-tier architecture. OLTP administers day to day transaction of an organization. • The primary objective is data processing and not data analysis
  • 31. Example: ATM • Assume that a couple has a joint account with a bank. • One day both simultaneously reach different ATM centers at precisely the same time and want to withdraw total amount present in their bank account. • However, the person that completes authentication process first will be able to get money. • In this case, OLTP system makes sure that withdrawn amount will be never more than the amount present in the bank. The key to note here is that OLTP systems are optimized for transactional superiority instead data analysis.
  • 32. What is OLAP? • Online Analytical Processing. • A category of software tools which provide analysis of data for business decisions. • OLAP systems allow users to analyze database information from multiple database systems at one time. • The primary objective is data analysis and not data processing.
  • 33. Example • Any Datawarehouse system is an OLAP system. • Uses of OLAP are as follows: • A company might compare their mobile phone sales in September with sales in October, then compare those results with another location which may be stored in a sperate database. • Amazon analyzes purchases by its customers to come up with a personalized homepage with products which likely interest to their customer.
  • 34. Why not use the existing databases (OLTP) for business analysis?
  • 35. Hard/Infeasible Queries for OLTP • Business analysis queries • In the past five years, which product is the most profitable? • Which public holiday we have the largest sales? • Which week we have the largest sales? • Does the sales of dairy products increase over time? • Difficult to express these queries in SQL • 3rd query: may extract the “week” value using a function • But the user has to learn many transformation functions … • 4th query: use a “special” table to store IDs of all dairy products, in advance • There can be many different dairy products; there can be many other product types as well … • The need of multidimensional modeling
  • 36. Multidimensional Modeling • Example: sales of supermarkets • Facts and measures • Each sales record is a fact, and its sales value is a measure • Dimensions • Group correlated attributes into the same dimension  easier for analysis tasks • Each sales record is associated with its values of Product, Store, Time
  • 37. Multidimensional Modeling • How do we model the Time dimension? • Hierarchies with multiple levels • Attributes, e.g., holiday, event • Advantage of this model? • Easy for query (more about this later) • Disadvantage? • Data redundancy (but controlled redundancy is acceptable) tid day Day # Week # Month # Year Work day … 1 January 1st 2009 1 1 1 2009 No … 2 January 2nd 2009 1 1 1 2009 Yes
  • 38. Quick Review: Normalized Database • Normalized database avoids • Redundant data • Modification anomalies • How to get the original table? (join them) • No redundancy in OLTP, controlled redundancy in OLAP
  • 39. DW: Not in 3NF! • Denormalized data • Data redundancy • More storage but data more quickly queried • DW provides aggregate data which is in suitable format for decision making
  • 40. OLTP vs OLAP OLTP OLAP Target Operational needs Business analysis Model Denormalized Denormalized/ multidimensional Data Small, operational Large, historical Query Language SQL Not unified, mostly MDX Query Small Large Updates Frequent and small Infrequent and batch Transactional recovery Necessary Not necessary Optimized for Updates Query
  • 41. Data Cube • A data cube helps us represent data in multiple dimensions. • It is defined by dimensions and facts. • The dimensions are the entities with respect to which an enterprise preserves the records.
  • 42. How to represent multiple dimensions?
  • 43. Example • Suppose Souq wants to keep track of sales records with the help of sales data warehouse with respect to time, item, branch, and location. • These dimensions allow to keep track of monthly sales and at which branch the items were sold. • There is a table associated with each dimension. This table is known as dimension table. • For example, "item" dimension table may have attributes such as item_name, item_type, and item_brand.
  • 44. Location = Jeddah Time Item type Entertainment Laptop Mobile Sport Q1 500 700 100 300 Q2 700 700 300 400 Q3 100 800 180 600 Q4 200 900 200 200 2-D view of Sales Data for a company with respect to time, item, and location dimensions. How to represent another dimension let say the location (Ryadh, Taief, etc.)?
  • 45. Time Location = Jeddah Location = Ryadh Location = Taeif Item type Item type Item type Entertainm ent Laptop Mobile Sport Enterta inment Laptop Mobile Sport Enterta inment Laptop Mobile Sport Q1 500 700 100 300 100 150 100 400 400 300 250 120 Q2 700 700 300 400 300 300 600 500 1000 100 309 345 Q3 100 800 180 600 200 400 500 300 300 200 399 764 Q4 200 900 200 200 100 200 100 200 400 100 400 897 3-D table can be represented as 3-D data cube
  • 46. Jeddah Ryadh Taeif Entment. Laptop Mobile Sport Q1 Q2 Q3 TimeTime Location Item 500 700 100 300 400 300 250 120 100 100 309 345 300 200 399 764
  • 48. OLAP Data Cube: wrap up • Data cube • Useful data analysis tool in DW • Generalized GROUP BY queries • Aggregate facts based on chosen dimensions • Why data cube? • Good for visualization (i.e., text results hard to understand) • Multidimensional, intuitive • Support interactive OLAP operations
  • 49. Cube operations Four types of analytical operations in OLAP are: • Roll-up • Drill-down • Slice and dice • Pivot (rotate)
  • 50. Roll-up • Roll-up is also known as "consolidation" or "aggregation." • The Roll-up operation can be performed in 2 ways • Reducing dimensions • Climbing up concept hierarchy. • Concept hierarchy is a system of grouping things based on their order or level.
  • 51. Example Jeddah Ryadh Taeif Entment. Laptop Mobile Sport 500 700 100 300 400 300 250 120 100 100 309 345 300 200 399 764 Makkah region 900 1000 350 420 100 100 309 345 300 200 399 764
  • 52. Drill-down • In drill-down data is fragmented into smaller parts. It is the opposite of the rollup process. It can be done via • Moving down the concept hierarchy • Increasing a dimension
  • 53. Slice • One dimension is selected, and a new sub-cube is created. • Following diagram explain how slice operation performed:
  • 54. Example: slice for time=Q1 Jeddah Ryadh Taeif Entment. Laptop Mobile Sport Q1 Q2 Q3 500 700 100 300 100 100 309 345 300 200 399 764 400 300 250 120
  • 55. Dice • This operation is similar to a slice. • The difference in dice is you select 2 or more dimensions that result in the creation of a sub-cube.
  • 56. Example: dice for time = Q1 and Location= Jeddah or Ryadh Jeddah Ryadh Tayef Entment. Laptop Mobile Sport Q1 Q2 Q3 500 700 100 300 100 100 309 345 300 200 399 764 400 300 250 120
  • 57. Pivot • In Pivot, you rotate the data axes to provide a substitute presentation of data. Q1 Q2 Q3 Entment. Laptop Mobile Sport JeddahRyadhTayef Jeddah Ryadh Tayef Entment. Laptop Mobile Sport Q1 Q2 Q3 500 700 100 300 100 100 309 345 300 200 399 764 400 300 250 120
  • 58. OLAP Structures OLAP MOLAP multidimensional Implementes operation in multidimensional data. ROLAP relational Is an extended RDBMS along with multidimensional data mapping to perform the standard relational operation. HOLAP hybrid The aggregated totals are stored in a multidimensional database while the detailed data is stored in the relational database. This offers both data efficiency of the ROLAP model and the performance of the MOLAP model.
  • 59. Advantages of OLAP • OLAP is a platform for all type of business includes planning, budgeting, reporting, and analysis. • Information and calculations are consistent in an OLAP cube. This is a crucial benefit. • Quickly create and analyze "What if" scenarios • Easily search OLAP database for broad or specific terms. • OLAP provides the building blocks for business modeling tools, Data mining tools, performance reporting tools. • Allows users to do slice and dice cube data all by various dimensions, measures, and filters. • It is good for analyzing time series. • Finding some clusters and outliers is easy with OLAP. • It is a powerful visualization online analytical process system which provides faster response times
  • 60. Disadvantages of OLAP • OLAP requires organizing data into a star or snowflake schema. These schemas are complicated to implement and administer • You cannot have large number of dimensions in a single OLAP cube • Transactional data cannot be accessed with OLAP system. • Any modification in an OLAP cube needs a full update of the cube. This is a time-consuming process
  • 61. Types of Data Warehouse Schema
  • 62. Multidimensional schemas? • Multidimensional schema is especially designed to model data warehouse systems. • The schemas are designed to address the unique needs of very large databases designed for the analytical purpose (OLAP).
  • 63. Types of Data Warehouse Schema •Star Schema •Snowflake Schema •Galaxy Schema
  • 65. What is a Star Schema? • The star schema is the simplest type of Data Warehouse schema. • aka Star Join Schema. • It is optimized for querying large data sets. • In the Star Schema: • the center of the star can have one fact table and a number of associated dimension tables. • It is known as star schema as its structure resembles a star.
  • 69. Characteristics of Star Schema • Every dimension in a star schema is represented with the only one- dimension table. • The dimension table should contain the set of attributes. • The dimension table is joined to the fact table using a foreign key • The dimension table are not joined to each other • Fact table would contain key and measure • The Star schema is easy to understand and provides optimal disk usage. • The dimension tables are not normalized. • The schema is widely supported by BI Tools
  • 72. What is a Snowflake Schema? • A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. • It is called snowflake because its diagram resembles a Snowflake. • The dimension tables are normalized which splits data into additional tables.
  • 76. Characteristics of Snowflake Schema • The main benefit of the snowflake schema it uses smaller disk space. • Easier to implement a dimension is added to the Schema • Due to multiple tables query performance is reduced • The primary challenge that you will face while using the snowflake Schema is that you need to perform more maintenance efforts because of the more lookup tables.
  • 77. Star vs snowflake Star Schema Snowflake Schema It contains a fact table surrounded by dimension tables. It contains a fact table surrounded by dimension tables. Only single join creates the relationship between the fact table and any dimension tables. Requires many joins to fetch the data. Denormalized Data structure  query run faster Normalized Data structure (dimensions) query run slower redundancy easy maintenance No edundancy difficult maintenance Good for data data mart Good for warehouse core
  • 79. What is fact constellation? • It is a collection of multiple fact tables having some common dimension tables. • It can be viewed as a collection of several star schemas and hence, also known as Galaxy schema. • It is one of the widely used schema for Data warehouse designing and it is much more complex than star and snowflake schema. • For complex systems, we require fact constellations.
  • 81. Characteristics of Galaxy Schema • The dimensions in this schema are separated into separate dimensions based on the various levels of hierarchy. • For example, if geography has four levels of hierarchy like region, country, state, and city then Galaxy schema should have four dimensions. • Moreover, it is possible to build this type of schema by splitting the one- star schema into more Star schemes. • The dimensions are large in this schema which is needed to build based on the levels of hierarchy. • This schema is helpful for aggregating fact tables for better understanding.
  • 82. Wrap up • Data warehouse is a special DB that aggregates multiple DB • DW is useful for analytical processing. • DW is built through ETL • DW may contain many datamarts • OLTP is based on multidimensional schema: data cube • There are 3 fact models: star, snowflake and galaxy
  • 83. Literature • Multidimensional Databases and Data Warehousing, Christian S. Jensen, Torben Bach Pedersen, Christian Thomsen, Morgan & Claypool Publishers, 2010 • Data Warehouse Design: Modern Principles and Methodologies, Golfarelli and Rizzi, McGraw-Hill, 2009 • Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications, Elzbieta Malinowski, Esteban Zimányi, Springer, 2008 • The Data Warehouse Lifecycle Toolkit, Kimball et al., Wiley 1998 • The Data Warehouse Toolkit, 2nd Ed., Kimball and Ross, Wiley, 2002
  • 84. Sites • Guru99 • Tutorial point • Wikipedia • Data Flair • Edureka