Data Mining Lecture 3
Define multi-dimensional data model
From Tables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube
A data cube (such as sales) allows data to be modeled and viewed in multiple dimensions
- Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
- Fact table contains measures (such as Euros_sold) and keys to each of the related dimension tables
Definitions
- An n-dimensional base cube is called a base cuboid
- The topmost 0-D cuboid, which holds the highest level of summarisation, is called the apex cuboid
- The lattice of cuboids forms a data cube
Cube: A Lattice of Cuboids
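The lattice can be enumerated directly. The sketch below is illustrative only: the three dimension names and the level counts are assumptions, not taken from the slides. It lists every cuboid of a cube without hierarchies (2^n of them for n dimensions) and applies the standard count for cubes with concept hierarchies, the product of (L_i + 1) over all dimensions, where L_i is the number of levels of dimension i excluding the virtual top level "all".

```python
from itertools import combinations
from math import prod


def enumerate_cuboids(dimensions):
    """Return every cuboid as a tuple of the dimensions it groups by."""
    cuboids = []
    # Walk from the n-D base cuboid down to the 0-D apex cuboid.
    for k in range(len(dimensions), -1, -1):
        cuboids.extend(combinations(dimensions, k))
    return cuboids


def count_cuboids_with_hierarchies(levels_per_dimension):
    """Total cuboids when dimension i has L_i levels, not counting 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)


dims = ["time", "item", "location"]  # hypothetical 3-D cube
lattice = enumerate_cuboids(dims)
print(lattice[0])    # base cuboid: ('time', 'item', 'location')
print(lattice[-1])   # apex cuboid: ()
print(len(lattice))  # 2**3 = 8

# Hypothetical hierarchies with 4, 3 and 2 levels (excluding 'all').
print(count_cuboids_with_hierarchies([4, 3, 2]))  # (4+1)*(3+1)*(2+1) = 60
```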
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema
- A fact table in the middle connected to a set of dimension tables
Snowflake schema
- A refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations
- Multiple fact tables share dimension tables; viewed as a collection of stars, this is therefore called a galaxy schema or fact constellation
DMQL: Language Primitives
Cube Definition (Fact Table)
- define cube <cube_name> [<dimension_list>]: <measure_list>
Dimension Definition (Dimension Table)
- define dimension <dimension_name> as (<attribute_or_subdimension_list>)
Special Case (Shared Dimension Tables)
- First time as “cube definition”
- define dimension <dimension_name> as <dimension_name_first_time> in cube <cube_name_first_time>
Defining a Star Schema in DMQL
define cube sales_star [time, item, branch, location]:
Euros_sold = sum(sales_in_Euros),
avg_sales = avg(sales_in_Euros),
units_sold = count(*)
define dimension time as
(time_key, day, day_of_week, month,
quarter, year)
define dimension item as
(item_key, item_name, brand, type,
supplier_type)
define dimension branch as
(branch_key, branch_name, branch_type)
define dimension location as
(location_key, street, city, county, province,
country)
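For readers more familiar with SQL than DMQL, the star schema above could be approximated relationally as follows. This is a minimal sketch using Python's built-in sqlite3 module, not a translation the slides provide: only the time and item dimensions are included for brevity, the table name time_dim, the fact table name sales_fact and all row values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names and sample rows).
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                   brand TEXT, type TEXT, supplier_type TEXT);
-- Fact table: one key per dimension plus the raw measure.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item(item_key),
    sales_in_Euros REAL
);
INSERT INTO time_dim VALUES (1, 5, 1, 'Q1', 2024), (2, 9, 4, 'Q2', 2024);
INSERT INTO item VALUES (1, 'Laptop A', 'Acme', 'computer', 'wholesale'),
                        (2, 'Phone B', 'Acme', 'phone', 'retail');
INSERT INTO sales_fact VALUES (1, 1, 1000.0), (1, 1, 500.0),
                              (1, 2, 400.0), (2, 1, 1200.0);
""")

# The three measures of the DMQL cube, grouped by quarter and item type.
star_rows = conn.execute("""
    SELECT t.quarter, i.type,
           SUM(f.sales_in_Euros) AS Euros_sold,
           AVG(f.sales_in_Euros) AS avg_sales,
           COUNT(*)              AS units_sold
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item i     ON i.item_key = f.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()

for row in star_rows:
    print(row)
```

Each dimension is reached from the fact table with a single join, which is the defining property of the star shape.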
Defining a Snowflake Schema in DMQL
define cube sales_snowflake [time, item, branch, location]:
Euros_sold = sum(sales_in_Euros),
avg_sales = avg(sales_in_Euros),
units_sold = count(*)
define dimension time as
(time_key, day, day_of_week, month,
quarter, year)
define dimension item as
(item_key, item_name, brand, type,
supplier(supplier_key, supplier_type))
define dimension branch as
(branch_key, branch_name, branch_type)
define dimension location as
(location_key, street, city(city_key, county,
province, country))
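The difference from the star schema shows up as an extra join. In the sketch below (again sqlite3, with hypothetical table contents and a hypothetical sales_fact table reduced to one dimension), supplier_type is moved out of item into its own supplier table, mirroring the nested supplier(supplier_key, supplier_type) definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The supplier attributes are normalized out of the item dimension.
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item(item_key),
    sales_in_Euros REAL
);
INSERT INTO supplier VALUES (10, 'wholesale'), (20, 'retail');
INSERT INTO item VALUES (1, 'Laptop A', 'Acme', 'computer', 10),
                        (2, 'Phone B', 'Acme', 'phone', 20),
                        (3, 'Tablet C', 'Bolt', 'computer', 10);
INSERT INTO sales_fact VALUES (1, 1000.0), (2, 400.0), (3, 300.0);
""")

# Reaching supplier_type now takes two joins: fact -> item -> supplier.
by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.sales_in_Euros)
    FROM sales_fact f
    JOIN item i     ON i.item_key = f.item_key
    JOIN supplier s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(by_supplier_type)  # [('retail', 400.0), ('wholesale', 1300.0)]
```

The normalized form stores each supplier_type once per supplier rather than once per item, at the cost of the additional join at query time.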
Defining a Fact Constellation in DMQL
define cube sales [time, item, branch, location]:
Euros_sold = sum(sales_in_Euros),
avg_sales = avg(sales_in_Euros),
units_sold = count(*)
define dimension time as
(time_key, day, day_of_week, month, quarter, year)
define dimension item as
(item_key, item_name, brand, type, supplier_type)
define dimension branch as
(branch_key, branch_name, branch_type)
define dimension location as
(location_key, street, city, province_or_state, country)
define cube shipping [time, item, shipper, from_location, to_location]:
Euro_cost = sum(cost_in_Euros),
unit_shipped = count(*)
define dimension time as time in cube sales
define dimension item as item in cube sales
define dimension shipper as
(shipper_key, shipper_name, location as location in cube sales, shipper_type)
define dimension from_location as location in cube sales
define dimension to_location as location in cube sales
Measures: Three Categories
Distributive
- if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning.
o E.g., count(), sum(), min(), max()
Algebraic
- if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function.
o E.g., avg(), min_N(), standard_deviation()
Holistic
- if there is no constant bound on the storage size needed to describe a sub-aggregate.
o E.g., median(), mode(), rank()
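The three categories can be checked on a small example. The sketch below uses made-up numbers split into three partitions: sum, count and max are combined from per-partition results (distributive), avg is derived from two distributive values (algebraic, M = 2), and the median of the partition medians is shown to differ from the true median (holistic).

```python
from statistics import median

# Hypothetical data, already split into three partitions.
partitions = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]
all_values = [v for part in partitions for v in part]

# Distributive: combine per-partition results with the same kind of function.
total = sum(sum(part) for part in partitions)      # sum of partial sums
n = sum(len(part) for part in partitions)          # sum of partial counts
largest = max(max(part) for part in partitions)    # max of partial maxima

# Algebraic: avg() needs only two distributive values, sum and count.
average = total / n

# Holistic: partition medians are not enough to recover the true median.
median_of_medians = median(median(part) for part in partitions)
true_median = median(all_values)

print(total, n, largest, average)      # 36 9 9 4.0
print(median_of_medians, true_median)  # 3 4
```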
A Concept Hierarchy: Dimension (location)
A concept hierarchy allows data to be handled at varying levels of abstraction.
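A concept hierarchy can be represented as a chain of child-to-parent mappings. The sketch below assumes a city < province < country hierarchy for the location dimension; the place names and sales figures are invented for illustration, since the original figure is not available in this transcript.

```python
from collections import defaultdict

# Hypothetical hierarchy for the location dimension: city < province < country.
city_to_province = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
}
province_to_country = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
}

# Hypothetical sales figures at the lowest level of abstraction.
sales_by_city = {"Vancouver": 100, "Toronto": 250, "Chicago": 300}


def climb(cells, parent_of):
    """Re-aggregate a measure one level up the hierarchy."""
    totals = defaultdict(int)
    for member, value in cells.items():
        totals[parent_of[member]] += value
    return dict(totals)


sales_by_province = climb(sales_by_city, city_to_province)
sales_by_country = climb(sales_by_province, province_to_country)
print(sales_by_province)
print(sales_by_country)  # {'Canada': 350, 'USA': 300}
```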
Multidimensional Data
Sales volume as a function of product, month, and country
A Sample Data Cube
Cuboids Corresponding to the Cube
Browsing a Data Cube
Typical OLAP Operations
Roll up (drill-up): summarise data
- by climbing up a hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
- from higher-level summary to lower-level summary or detailed data, or introducing new dimensions
Slice and dice
- project and select
Pivot (rotate)
- reorient the cube, visualisation, 3D to a series of 2D planes
Other operations
- drill across: involving (across) more than one fact table
- drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
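The core operations can be sketched on a tiny in-memory cube. Everything below is an illustrative assumption: the cube is a plain dictionary keyed by (product, month, country), the cell values are invented, and the function names are not part of any OLAP product. Drill-down is omitted because it needs the more detailed data, which cannot be recovered from a summary.

```python
from collections import defaultdict

DIMS = ("product", "month", "country")
# Hypothetical base cuboid: (product, month, country) -> sales volume.
cube = {
    ("TV", "Jan", "Ireland"): 10,
    ("TV", "Feb", "Ireland"): 20,
    ("TV", "Jan", "France"): 5,
    ("PC", "Jan", "Ireland"): 7,
    ("PC", "Feb", "France"): 3,
}


def roll_up(cells, dims, keep):
    """Roll up by dimension reduction: sum over the dimensions not kept."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_cube(cells, dims, dim, member):
    """Slice: select one member of one dimension and drop that dimension."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == member}


def dice(cells, dims, criteria):
    """Dice: select on several dimensions; criteria maps dim -> allowed set."""
    return {
        k: v for k, v in cells.items()
        if all(k[dims.index(d)] in allowed for d, allowed in criteria.items())
    }


def pivot(cells, dims, new_order):
    """Pivot: reorder the axes without changing any cell value."""
    idx = [dims.index(d) for d in new_order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}


by_product_country = roll_up(cube, DIMS, ("product", "country"))
january = slice_cube(cube, DIMS, "month", "Jan")
tv_ireland = dice(cube, DIMS, {"product": {"TV"}, "country": {"Ireland"}})
rotated = pivot(cube, DIMS, ("country", "product", "month"))

print(by_product_country[("TV", "Ireland")])  # 30
print(january)
print(tv_ireland)
print(rotated[("France", "PC", "Feb")])       # 3
```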
A Star-Net Query Model
Design of a Data Warehouse: A Business Analysis Framework
Four views regarding the design of a data warehouse
- Top-down view
o allows selection of the relevant information necessary for the data warehouse
- Data source view
o exposes the information being captured, stored, and managed by operational systems
- Data warehouse view
o consists of fact tables and dimension tables
- Business query view
o sees the perspectives of data in the warehouse from the viewpoint of the end user
Data Warehouse Design Process
Top-down, bottom-up approaches or a combination of both
- Top-down: starts with overall design and planning (mature)
- Bottom-up: starts with experiments and prototypes (rapid)
From a software engineering point of view
- Steps: planning, data collection, DW design, testing and evaluation, DW deployment
- Waterfall: structured and systematic analysis at each step before proceeding to the next
- Spiral: rapid generation of increasingly functional systems, with short turnaround time
Typical data warehouse design process
- Choose a business process to model, e.g., orders, invoices, etc.
- Choose the grain (atomic level of data) of the business process
- Choose the dimensions that will apply to each fact table record
- Choose the measures that will populate each fact table record
Multi-Tiered Architecture
Three Data Warehouse Models
Enterprise warehouse
- collects all of the information about subjects spanning the entire organisation
Data mart
- a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart
- Independent vs. dependent (sourced directly from the warehouse) data mart
Virtual warehouse
- A set of views over operational databases
- Only some of the possible summary views may be materialised
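The virtual warehouse idea can be sketched with an ordinary SQL view over an operational table. The example uses Python's sqlite3 module; the orders table and its rows are hypothetical. SQLite has no native materialised views, so a snapshot table created with CREATE TABLE ... AS SELECT stands in for a materialised summary here, which is an assumption of this sketch rather than how a production system would do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical operational table.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'alice', 20.0), (2, 'bob', 35.0), (3, 'alice', 5.0);

-- Virtual summary: recomputed from the operational data on every query.
CREATE VIEW sales_by_customer AS
    SELECT customer, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders GROUP BY customer;
""")

query = "SELECT * FROM sales_by_customer ORDER BY customer"
before = conn.execute(query).fetchall()

# Stand-in for materialisation: store the current view contents in a table.
conn.execute("CREATE TABLE sales_by_customer_mat AS SELECT * FROM sales_by_customer")

# New operational data arrives after the snapshot was taken.
conn.execute("INSERT INTO orders VALUES (4, 'alice', 100.0)")

view_now = conn.execute(query).fetchall()
snapshot_now = conn.execute(
    "SELECT * FROM sales_by_customer_mat ORDER BY customer"
).fetchall()

print(before)        # [('alice', 25.0, 2), ('bob', 35.0, 1)]
print(view_now)      # [('alice', 125.0, 3), ('bob', 35.0, 1)]
print(snapshot_now)  # [('alice', 25.0, 2), ('bob', 35.0, 1)]
```

The view always reflects the operational data but pays the aggregation cost on each query; the stored summary answers quickly but must be refreshed to stay current.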
Data Warehouse Development: A Recommended Approach
OLAP Server Architectures
Relational OLAP (ROLAP)
- Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces
- Includes optimisation of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services
- Greater scalability
Multidimensional OLAP (MOLAP)
- Array-based multidimensional storage engine (sparse matrix techniques)
- Fast indexing to pre-computed summarised data
Hybrid OLAP (HOLAP)
- User flexibility, e.g., low level: relational, high level: array
Specialised SQL servers
- Specialised support for SQL queries over star/snowflake schemas
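The storage difference between ROLAP and MOLAP can be illustrated with a toy example. This is a simplified sketch, not how any particular OLAP server is implemented: the ROLAP side is a list of fact rows scanned on lookup, the MOLAP side is a dense row-major array addressed by a computed offset, and the products, months and values are invented.

```python
products = ["TV", "PC"]
months = ["Jan", "Feb", "Mar"]

# ROLAP-style storage: one row per non-empty fact (hypothetical values).
fact_rows = [("TV", "Jan", 10), ("TV", "Mar", 4), ("PC", "Feb", 7)]

# MOLAP-style storage: a dense array with one slot per (product, month) cell.
cells = [0] * (len(products) * len(months))


def offset(product, month):
    """Row-major position of a cell in the dense array."""
    return products.index(product) * len(months) + months.index(month)


for product, month, value in fact_rows:
    cells[offset(product, month)] = value


def rolap_lookup(product, month):
    """Scan the fact rows, as a relational engine would without an index."""
    return sum(v for p, m, v in fact_rows if p == product and m == month)


def molap_lookup(product, month):
    """Jump straight to the cell by computing its array offset."""
    return cells[offset(product, month)]


print(cells)                      # [10, 0, 4, 0, 7, 0]
print(rolap_lookup("PC", "Feb"))  # 7
print(molap_lookup("PC", "Feb"))  # 7
```

Half of the array slots are empty even in this tiny cube, which is why array-based engines rely on sparse matrix techniques as the number of dimensions grows.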
Homework 1a
Suppose that a data warehouse for Big University consists of the following four dimensions: student, module, semester, and lecturer, and two measures: count and avg_grade. At the lowest conceptual level (e.g., for a given student, module, semester, and lecturer combination), the avg_grade measure stores the actual module grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination.
1. Draw a snowflake schema diagram for the data warehouse.
2. Starting with the base cuboid [student, module, semester, lecturer], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS modules for each Big University student?
3. If each dimension has 5 levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

Data mining 3 - Data Models and Data Warehouse Design (cheat sheet - printable)
