Data warehousing in practice
And its relation to the four dominant scientific DWH-modeling concepts
Drs. S.F.J. Otten
12-05-2015
Topics
• About me…
• Business Intelligence
• What is a data warehouse (DWH)?
• DWH design strategies
• Data modeling
  - Brief history of data modeling
  - Star schema
  - Snowflake schema
  - Data Vault
  - Anchor modeling
• Practical examples
• Summary
About me…
• Education
  - High school (MAVO)
  - College (MBO ICT level 4)
  - University of Applied Sciences (Avans Hogeschool, Business Informatics; BSc)
  - Utrecht University (MBI; MSc)
  - Utrecht University (PhD)
• Career so far
  - Kadenza (privately held, 80 employees) (2014-present)
    - BI consultant/architect (Microsoft BI stack)
  - CSB-System BV/GmbH (privately held, 500-1000 employees globally) (2010-2014)
    - BI consultant/architect (Microsoft BI stack)
    - Lead of the BI programming department at HQ
    - Semantic development
Business Intelligence
• Business Intelligence?
  - “a way for organizations to understand their internal and external environment through the systematic acquisition, collation, analysis, interpretation and exploitation of information” (Watson & Wixom, 2007).
What is a Data warehouse (1)
• Data warehouse? (DWH)
  - “a repository where all data relevant to the management of an organization is stored and from which knowledge emerges.” (March & Hevner, 2007)
  - “A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” (Inmon, 1992)
• Different definitions, same goal:
  - provide data in such a way that it has meaning and can be used at all levels of an organization as input for a decision-making process
DWH – design strategies (1)
• Enterprise-wide DWH design (Inmon, 2002)
  - The DWH is designed using a normalized enterprise data model; data marts for specific business domains are then derived from the EDWH
• Data mart design (Kimball, 2002)
  - Hybrid strategy (top-down & bottom-up) for DWH design
  - Data marts are created in a bottom-up fashion
  - Data mart design conforms to a top-down skeleton/framework design called the “data warehouse bus”
  - The EDW = the union of the conformed data marts
DWH – design strategies (2)
Push (data driven)
DWH – design strategies (3)
Pull (information driven)
DWH – design strategies (4)
Inmon
• Subject-oriented
• Integrated
• Non-volatile
• Time-variant
• Top-down
• Integration via an assumed enterprise data model (EDM / 3NF)
• Data marts are derived from the EDW
Kimball
• Business-process-oriented
• Bottom-up / evolutionary
• Dimensional modeling (star schema)
• Integration via conformed dimensions
• The star schema enforces query semantics
• The sum of the data marts = the EDW
Data-modeling history
Data-modeling – Star/SF - concepts
Concepts: star/snowflake schema (Golfarelli, Maio & Rizzi, 1998)
• Fact table: A fact is a focus of interest for the decision-making process; typically, it models an event occurring in the enterprise world (e.g., sales and shipments).
• Dimension table: Dimensions are discrete attributes which determine the minimum granularity adopted to represent facts; typical dimensions for the sale fact are product, store and date.
• Hierarchy: Discrete dimension attributes linked by many-to-one relationships, which determine how facts may be aggregated and selected significantly for the decision-making process.
Data-modeling – Star schema
• Consists of a single fact table
• Has N dimension tables
• Each tuple in the fact table has a pointer (FK) to each of the dimension tables
• Each dimension table has columns that correspond to attributes of the specific dimension (Chaudhuri & Dayal, 1997)
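A minimal sketch of the star layout described above, using Python's built-in sqlite3 module as a stand-in for the actual DWH platform. The table names (FactSales, DimProduct, DimStore, DimDate), the columns and all rows are hypothetical; they follow the sale/product/store/date example from the concepts slide.

```python
import sqlite3

# Hypothetical star schema: one fact table with a foreign key to each dimension.
DDL = """
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT);
CREATE TABLE DimStore   (StoreKey   INTEGER PRIMARY KEY, StoreName TEXT, City TEXT);
CREATE TABLE DimDate    (DateKey    INTEGER PRIMARY KEY, CalendarDate TEXT, Year INTEGER);
CREATE TABLE FactSales (
    ProductKey INTEGER NOT NULL REFERENCES DimProduct(ProductKey),
    StoreKey   INTEGER NOT NULL REFERENCES DimStore(StoreKey),
    DateKey    INTEGER NOT NULL REFERENCES DimDate(DateKey),
    Quantity   INTEGER,
    Amount     REAL
);
"""

def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript(DDL)
    conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?)",
                     [(1, "Laptop", "Hardware"), (2, "Mouse", "Hardware")])
    conn.executemany("INSERT INTO DimStore VALUES (?, ?, ?)",
                     [(1, "Store A", "Utrecht"), (2, "Store B", "Breda")])
    conn.execute("INSERT INTO DimDate VALUES (20150512, '2015-05-12', 2015)")
    conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 20150512, 2, 1800.0),
                      (2, 1, 20150512, 5, 100.0),
                      (1, 2, 20150512, 1, 900.0)])

def sales_by_city(conn: sqlite3.Connection) -> list:
    # Typical star join: the fact table joins directly to the dimension it needs
    # and is aggregated along one of that dimension's attributes.
    return conn.execute("""
        SELECT s.City, SUM(f.Amount)
        FROM FactSales f JOIN DimStore s ON f.StoreKey = s.StoreKey
        GROUP BY s.City ORDER BY s.City
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
print(sales_by_city(conn))  # [('Breda', 900.0), ('Utrecht', 1900.0)]
```

Every query path is fact table → one dimension, which is what makes the schema predictable for cube tools.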
Data-modeling – Snowflake schema
• A normalized star schema (3NF)
• Dimensions are split up into sub-dimensions
• Fewer FKs in the fact table
• Easier maintenance
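As an illustration of splitting a dimension into a sub-dimension, here is a small sqlite3 sketch. It assumes a hypothetical product dimension whose Category attribute is moved into its own DimCategory table; names and data are invented for the example.

```python
import sqlite3

# Hypothetical snowflaked product dimension: Category lives in DimCategory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCategory (CategoryKey INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE DimProduct  (ProductKey INTEGER PRIMARY KEY, ProductName TEXT,
                          CategoryKey INTEGER REFERENCES DimCategory(CategoryKey));
CREATE TABLE FactSales   (ProductKey INTEGER REFERENCES DimProduct(ProductKey),
                          Amount REAL);
INSERT INTO DimCategory VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO DimProduct  VALUES (1, 'Laptop', 1), (2, 'Mouse', 1), (3, 'Office suite', 2);
INSERT INTO FactSales   VALUES (1, 1800.0), (2, 100.0), (3, 250.0);
""")

# Renaming a category touches one row in DimCategory instead of every product
# row that would repeat the name in a flat (star) dimension.
conn.execute("UPDATE DimCategory SET CategoryName = 'Devices' WHERE CategoryKey = 1")

# The price: one extra join per level of the snowflake.
rows = conn.execute("""
    SELECT c.CategoryName, SUM(f.Amount)
    FROM FactSales f
    JOIN DimProduct  p ON f.ProductKey = p.ProductKey
    JOIN DimCategory c ON p.CategoryKey = c.CategoryKey
    GROUP BY c.CategoryName ORDER BY c.CategoryName
""").fetchall()
print(rows)  # [('Devices', 1900.0), ('Software', 250.0)]
```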
Data-modeling – Star/SF – ETL
• Conventional DWH architecture (star/snowflake schema) for populating a DWH
• An RFC (e.g., a request for a new metric) has a high impact on the existing ETL practice/packages and the DWH: it requires re-engineering
• Introducing a new IT system causes serious rework and headaches
Data-modeling – Star/SF – ETL - P.O.A
• Two types of ETL:
  - FULL ETL
    - Complete transfer of all data in the source systems via ETL packages
  - Incremental ETL
    - After the FULL ETL, the incremental ETL determines the delta and loads it into the DWH. Loading can:
      - INSERT records that are not yet present in the DWH
      - UPDATE records whose values have changed in certain columns
        - UPDATE statements must take into account the keys (primary and foreign) that uniquely identify a record in a table; this is risky if it is not entirely clear what the unique identifier is.
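The INSERT/UPDATE delta logic above can be sketched with sqlite3 as follows. This is a hedged illustration, not the author's ETL package: the columns and the assumption that (InvoiceID, ItemID) is the unique identifier are hypothetical, and a real package would typically compare against a staged extract rather than row by row.

```python
import sqlite3

def incremental_load(conn: sqlite3.Connection, source_rows) -> tuple:
    """Apply one delta to FactSalesStatistics: INSERT new keys, UPDATE changed rows.

    source_rows: iterable of (InvoiceID, ItemID, Quantity, Amount) tuples.
    ASSUMPTION: (InvoiceID, ItemID) uniquely identifies a fact row.
    Returns (inserted, updated).
    """
    inserted = updated = 0
    for invoice_id, item_id, qty, amount in source_rows:
        current = conn.execute(
            "SELECT Quantity, Amount FROM FactSalesStatistics "
            "WHERE InvoiceID = ? AND ItemID = ?", (invoice_id, item_id)).fetchone()
        if current is None:
            conn.execute("INSERT INTO FactSalesStatistics VALUES (?, ?, ?, ?)",
                         (invoice_id, item_id, qty, amount))
            inserted += 1
        elif current != (qty, amount):
            # The WHERE clause must contain the complete unique key; filtering on
            # InvoiceID alone would overwrite every item line of that invoice.
            conn.execute("UPDATE FactSalesStatistics SET Quantity = ?, Amount = ? "
                         "WHERE InvoiceID = ? AND ItemID = ?",
                         (qty, amount, invoice_id, item_id))
            updated += 1
    conn.commit()
    return inserted, updated

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactSalesStatistics (InvoiceID INTEGER, ItemID INTEGER, "
             "Quantity INTEGER, Amount REAL, PRIMARY KEY (InvoiceID, ItemID))")
full = incremental_load(conn, [(1, 10, 2, 50.0), (1, 11, 1, 20.0)])  # initial (full) load
delta = incremental_load(conn, [(1, 10, 3, 75.0), (1, 11, 1, 20.0), (2, 10, 1, 25.0)])
print(full, delta)  # (2, 0) (1, 1)
```

The second run inserts the new line (2, 10), updates the changed line (1, 10) and leaves the unchanged line alone.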
Data-modeling – Star/SF – Case (1)
• DWH = snowflake architecture (3NF)
  - Dimension tables (DimItem, DimInvoice)
  - Fact table (FactSalesStatistics)
  - ETL comprises a FULL and an INCREMENTAL load
• Client A sends an RFC for an addition to the sales overview.
  - Addition = metric “NetValue” per item per invoice
  - Additional requirement = metric “NetValue” must be present for future data and also for data already residing in the sales overview
• How would you, as future business/technical consultants or researchers, approach this case?
Data-modeling – Star/SF – Case (2)
• Solution
  - Identify the column containing metric “NetValue” in the source system (requires in-depth analysis of the transactional system)
  - Add a column to fact table “FactSalesStatistics” ([NetValue] [decimal](x,y) NULL)
  - Go back to the appropriate ETL package:
    - Adjust the source query / source columns to include the identified column (metric)
    - Adjust the function that determines the delta (add the identified column)
    - Adjust the INSERT command to write the value from the identified source column → metric “NetValue” in fact table “FactSalesStatistics”
    - Adjust the UPDATE command to update metric “NetValue” with the value from the identified source column for the existing data in table “FactSalesStatistics”
  - VALIDATE… VALIDATE… VALIDATE… the ERP data and DWH data (especially in the beginning)
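A hedged sqlite3 sketch of the solution steps: extend the fact table, backfill existing rows (UPDATE), load new rows including the metric (INSERT), then validate against the source. SrcInvoiceLine is a hypothetical stand-in for the ERP source, and REAL stands in for the [decimal](x,y) type used on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactSalesStatistics (InvoiceID INTEGER, ItemID INTEGER, Amount REAL,
                                  PRIMARY KEY (InvoiceID, ItemID));
INSERT INTO FactSalesStatistics VALUES (1, 10, 50.0), (1, 11, 20.0);
-- Hypothetical stand-in for the ERP source, now exposing NetValue.
CREATE TABLE SrcInvoiceLine (InvoiceID INTEGER, ItemID INTEGER, Amount REAL, NetValue REAL);
INSERT INTO SrcInvoiceLine VALUES (1, 10, 50.0, 41.32), (1, 11, 20.0, 16.53),
                                  (2, 10, 25.0, 20.66);
""")

# Step 1: schema change on the existing fact table (nullable column).
conn.execute("ALTER TABLE FactSalesStatistics ADD COLUMN NetValue REAL")

# Step 2: the adjusted UPDATE backfills the metric for data already in the DWH.
conn.execute("""
    UPDATE FactSalesStatistics
    SET NetValue = (SELECT s.NetValue FROM SrcInvoiceLine s
                    WHERE s.InvoiceID = FactSalesStatistics.InvoiceID
                      AND s.ItemID    = FactSalesStatistics.ItemID)
""")

# Step 3: the adjusted INSERT writes the metric for new rows.
conn.execute("""
    INSERT INTO FactSalesStatistics (InvoiceID, ItemID, Amount, NetValue)
    SELECT s.InvoiceID, s.ItemID, s.Amount, s.NetValue FROM SrcInvoiceLine s
    WHERE NOT EXISTS (SELECT 1 FROM FactSalesStatistics f
                      WHERE f.InvoiceID = s.InvoiceID AND f.ItemID = s.ItemID)
""")
conn.commit()

# Step 4: validate row count and metric total against the source.
dwh = conn.execute("SELECT COUNT(*), ROUND(SUM(NetValue), 2) FROM FactSalesStatistics").fetchone()
src = conn.execute("SELECT COUNT(*), ROUND(SUM(NetValue), 2) FROM SrcInvoiceLine").fetchone()
assert dwh == src, (dwh, src)
print(dwh)
```

Note that steps 1 to 3 all modify existing objects (table and package), which is the re-engineering the ETL slide refers to.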
Data-modeling – Star/SF – Case (3)
• Introduce the new metric in your sales cube
  - Refresh the data source / data source view to get metric “NetValue” into the cube server environment
  - Add the measure simply by adding the metric to a measure group in the sales cube
  - Process the cube; the metric should then be available to all end users
Data-modeling – Datavault - Concepts
Concepts: Data Vault (DV) (Linstedt & Graziano, 2011)
• Data Vault: The Data Vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is scalable and flexible.
• Hub: The Hub is intended to represent major identifiable concepts/entities of interest from the real world. Every Hub entity must be identifiable by a unique identifier.
• Link: The Link represents relationships among concepts. Both Hubs and Links may be involved in such relationships.
• Satellite: The Satellite is used to associate a Hub (or a Link) with (data model) attributes.
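A minimal sqlite3 sketch of the three DV table types for the sales case. The column layout is an assumption for illustration: hubs store a surrogate key plus the business key, the link stores the relationship between hub keys, and satellites store attributes together with LoadDate, EndDate, MD5 and Source metadata.

```python
import sqlite3

# Hypothetical minimal Data Vault: hubs = business keys, link = relationship,
# satellites = descriptive/metric attributes plus load metadata.
DV_DDL = """
CREATE TABLE H_Product  (ProductSK  INTEGER PRIMARY KEY, ProductID  TEXT UNIQUE NOT NULL,
                         LoadDate TEXT NOT NULL, Source TEXT NOT NULL);
CREATE TABLE H_Customer (CustomerSK INTEGER PRIMARY KEY, CustomerID TEXT UNIQUE NOT NULL,
                         LoadDate TEXT NOT NULL, Source TEXT NOT NULL);
CREATE TABLE H_Order    (OrderSK    INTEGER PRIMARY KEY, OrderID    TEXT UNIQUE NOT NULL,
                         LoadDate TEXT NOT NULL, Source TEXT NOT NULL);
CREATE TABLE L_SalesOrder (
    SalesOrderSK INTEGER PRIMARY KEY,
    ProductSK  INTEGER NOT NULL REFERENCES H_Product(ProductSK),
    CustomerSK INTEGER NOT NULL REFERENCES H_Customer(CustomerSK),
    OrderSK    INTEGER NOT NULL REFERENCES H_Order(OrderSK),
    LoadDate TEXT NOT NULL, Source TEXT NOT NULL,
    UNIQUE (ProductSK, CustomerSK, OrderSK));
CREATE TABLE S_Product_1 (            -- satellite on a hub
    ProductSK INTEGER NOT NULL REFERENCES H_Product(ProductSK),
    LoadDate TEXT NOT NULL, EndDate TEXT, MD5 TEXT NOT NULL, Source TEXT NOT NULL,
    ProductName TEXT, ListPrice REAL,
    PRIMARY KEY (ProductSK, LoadDate));
CREATE TABLE S_SalesOrder_1 (         -- satellite on a link
    SalesOrderSK INTEGER NOT NULL REFERENCES L_SalesOrder(SalesOrderSK),
    LoadDate TEXT NOT NULL, EndDate TEXT, MD5 TEXT NOT NULL, Source TEXT NOT NULL,
    Quantity INTEGER,
    PRIMARY KEY (SalesOrderSK, LoadDate));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DV_DDL)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Keys, relationships and attributes live in separate tables, so each can grow without altering the others.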
Data-modeling – Data Vault – Schema
• Consists of N hub, link and satellite tables
• Scalable/flexible
• 100% of the data, 100% of the time
• Fairly new to the DWH world
• Used by large organizations (e.g., D.O.D., ABN-AMRO)
Data-modeling – Data Vault – ETL
• Data Vault ETL architecture for populating a data vault
• An RFC has no impact on the existing ETL practice/packages and the DWH; no re-engineering
• Introducing a new IT system does not cause headaches
Data-modeling – Datavault – ETL – P.O.A
• Two types of ETL:
  - FULL ETL
    - Complete transfer of all data in the source systems via ETL packages
    - Decomposition of existing tables into Hubs, Links and Satellites
  - Incremental ETL
    - After the FULL ETL, the incremental ETL determines the delta and loads it into the DWH. Loading can:
      - INSERT records that are not yet present in the DWH
      - END-DATE records that are no longer valid
    - There is no UPDATE of metric columns in a Data Vault; only an end-date update is required
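The insert-only load with end-dating can be sketched as below. Assumptions, all hypothetical: a satellite S_Product keyed by the business key ProductID, an MD5 hash over the descriptive columns used to detect changes, and an open (current) version marked by EndDate IS NULL.

```python
import hashlib
import sqlite3

def md5_of(*values) -> str:
    # Hash over the satellite's descriptive columns; the '|' delimiter keeps
    # ('ab', 'c') and ('a', 'bc') from producing the same hash input.
    return hashlib.md5("|".join(str(v) for v in values).encode("utf-8")).hexdigest()

def load_satellite(conn, load_date, source, rows):
    """Insert-only satellite load; rows are (ProductID, ProductName, ListPrice).
    Returns (inserted, end_dated)."""
    inserted = end_dated = 0
    for product_id, name, price in rows:
        new_hash = md5_of(name, price)
        current = conn.execute(
            "SELECT MD5 FROM S_Product WHERE ProductID = ? AND EndDate IS NULL",
            (product_id,)).fetchone()
        if current is not None and current[0] == new_hash:
            continue                          # unchanged: nothing to load
        if current is not None:               # changed: close the current version
            conn.execute("UPDATE S_Product SET EndDate = ? "
                         "WHERE ProductID = ? AND EndDate IS NULL",
                         (load_date, product_id))
            end_dated += 1
        conn.execute("INSERT INTO S_Product VALUES (?, ?, NULL, ?, ?, ?, ?)",
                     (product_id, load_date, new_hash, source, name, price))
        inserted += 1
    conn.commit()
    return inserted, end_dated

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S_Product (ProductID TEXT, LoadDate TEXT, EndDate TEXT, "
             "MD5 TEXT, Source TEXT, ProductName TEXT, ListPrice REAL)")
r1 = load_satellite(conn, "2015-05-11", "ERP", [("P1", "Laptop", 900.0), ("P2", "Mouse", 20.0)])
r2 = load_satellite(conn, "2015-05-12", "ERP", [("P1", "Laptop", 850.0), ("P2", "Mouse", 20.0)])
print(r1, r2)  # (2, 0) (1, 1)
```

The only UPDATE touches EndDate; the old price of P1 stays in the table as history.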
Data-modeling – Datavault – Case (1)
• DWH = Data Vault architecture
  - Hub tables (H_Product, H_Customer, H_Order)
  - Link tables (L_SalesOrder)
  - Satellite tables (S_Product_1, S_SalesOrder_1, S_Customer_1)
  - ETL comprises a FULL and an INCREMENTAL load
• Client A sends an RFC for an addition to the sales overview.
  - Addition = metric “NetValue” per item per order
  - Additional requirement = metric “NetValue” must be present for future data and also for data already residing in the sales overview
• How would you, as future business/technical consultants or researchers, approach this case?
Data-modeling – Datavault – Case (2)
• Solution
  - Identify the column containing metric “NetValue” in the source system (requires in-depth analysis of the transactional system)
  - Create a new table in the DWH called S_SalesOrder_2 (ProductID, CustomerID, OrderID, LoadDate, NetValue, MD5, Source, EndDate)
  - Create a new ETL package:
    - Provide the source query / source columns, including the new metric “NetValue”
    - Create the function that determines the delta (key fields & identified column)
    - Create the INSERT command to write the value from the identified source column → metric “NetValue” in satellite S_SalesOrder_2, with additional values for ProductID, CustomerID, OrderID, LoadDate, MD5 and Source
    - Optional: create an EndDate function (with the help of staging tables)
  - VALIDATE… VALIDATE… VALIDATE… the ERP data and DWH data (especially in the beginning)
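A sqlite3 sketch of the point of this solution: the new metric arrives in a new satellite, and the existing satellite is not altered. S_SalesOrder_2 uses the columns listed above; the S_SalesOrder_1 columns and the sample rows are hypothetical, and delta detection and end-dating are omitted for brevity.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Existing satellite (hypothetical columns); it is not modified below.
conn.execute("CREATE TABLE S_SalesOrder_1 (ProductID TEXT, CustomerID TEXT, OrderID TEXT, "
             "LoadDate TEXT, Quantity INTEGER, MD5 TEXT, Source TEXT, EndDate TEXT)")
before = conn.execute("PRAGMA table_info(S_SalesOrder_1)").fetchall()

# New satellite carrying only the new metric.
conn.execute("CREATE TABLE S_SalesOrder_2 (ProductID TEXT, CustomerID TEXT, OrderID TEXT, "
             "LoadDate TEXT, NetValue REAL, MD5 TEXT, Source TEXT, EndDate TEXT)")

def load_s_salesorder_2(conn, load_date, source, rows):
    """rows: (ProductID, CustomerID, OrderID, NetValue) from the source query.
    The satellite starts empty, so the first run also covers orders that already
    exist in the DWH: the 'also for existing data' requirement needs no UPDATE."""
    for product_id, customer_id, order_id, net_value in rows:
        digest = hashlib.md5(str(net_value).encode("utf-8")).hexdigest()
        conn.execute("INSERT INTO S_SalesOrder_2 VALUES (?, ?, ?, ?, ?, ?, ?, NULL)",
                     (product_id, customer_id, order_id, load_date,
                      net_value, digest, source))
    conn.commit()

load_s_salesorder_2(conn, "2015-05-12", "ERP",
                    [("P1", "C1", "O1", 41.32), ("P2", "C1", "O1", 16.53)])
after = conn.execute("PRAGMA table_info(S_SalesOrder_1)").fetchall()
assert before == after  # existing structure unchanged: no re-engineering
print(conn.execute("SELECT COUNT(*) FROM S_SalesOrder_2").fetchone()[0])  # 2
```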
Data-modeling – Datavault – Case (3)
Data-modeling – Datavault – Case (4)
• A Data Vault does not store data in a structure that is suited for use in a data cube.
• A data cube needs a star/snowflake schema. Hence, data marts or a “Business Vault” are created.
• Introducing new data into the cube via a data mart works the same way as for a star/snowflake-schema DWH.
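One common way to derive such a data mart is a set of views that flatten the vault into a star shape. The sketch below assumes trimmed-down, hypothetical vault tables (here the order satellite is keyed on the link's surrogate key) and keeps only currently valid satellite rows (EndDate IS NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, trimmed-down vault tables (only the columns the mart needs).
CREATE TABLE H_Product (ProductSK INTEGER PRIMARY KEY, ProductID TEXT);
CREATE TABLE S_Product_1 (ProductSK INTEGER, LoadDate TEXT, EndDate TEXT, ProductName TEXT);
CREATE TABLE H_Order (OrderSK INTEGER PRIMARY KEY, OrderID TEXT);
CREATE TABLE L_SalesOrder (SalesOrderSK INTEGER PRIMARY KEY, ProductSK INTEGER, OrderSK INTEGER);
CREATE TABLE S_SalesOrder_2 (SalesOrderSK INTEGER, LoadDate TEXT, EndDate TEXT, NetValue REAL);

INSERT INTO H_Product VALUES (1, 'P1'), (2, 'P2');
INSERT INTO S_Product_1 VALUES (1, '2015-05-01', '2015-05-10', 'Notebook'),
                               (1, '2015-05-10', NULL, 'Laptop'),
                               (2, '2015-05-01', NULL, 'Mouse');
INSERT INTO H_Order VALUES (1, 'O1');
INSERT INTO L_SalesOrder VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO S_SalesOrder_2 VALUES (1, '2015-05-12', NULL, 41.32),
                                  (2, '2015-05-12', NULL, 16.53);

-- Data mart as views over the current satellite rows.
CREATE VIEW DimProduct AS
    SELECT h.ProductSK AS ProductKey, h.ProductID, s.ProductName
    FROM H_Product h JOIN S_Product_1 s ON s.ProductSK = h.ProductSK
    WHERE s.EndDate IS NULL;
CREATE VIEW FactSales AS
    SELECT l.ProductSK AS ProductKey, o.OrderID, s.NetValue
    FROM L_SalesOrder l
    JOIN H_Order o ON o.OrderSK = l.OrderSK
    JOIN S_SalesOrder_2 s ON s.SalesOrderSK = l.SalesOrderSK
    WHERE s.EndDate IS NULL;
""")

# A cube (or any star-schema consumer) only sees DimProduct and FactSales.
result = conn.execute("""
    SELECT d.ProductName, SUM(f.NetValue)
    FROM FactSales f JOIN DimProduct d ON d.ProductKey = f.ProductKey
    GROUP BY d.ProductName ORDER BY d.ProductName
""").fetchall()
print(result)  # [('Laptop', 41.32), ('Mouse', 16.53)]
```

Whether the mart is a view or a physically loaded table is a design choice; views avoid an extra ETL step, while tables are usually faster for cube processing.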
Data-modeling – Anchor modeling – Concepts
Concepts: Anchor modeling (AM) (Rönnbäck, 2010)
• Anchor modeling: Anchor modeling is an agile information modeling technique that offers non-destructive extensibility mechanisms.
• Anchor: An anchor represents a set of entities.
• Attribute: Attributes are used to represent properties of anchors.
• Tie: A tie represents an association between two or more anchor entities and optional knot entities.
• Knot: A knot is used to represent a fixed, typically small, set of entities that do not change over time.
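A minimal sqlite3 sketch of the four AM constructs. The table prefixes (AC_, AT_, KN_, TI_) and all names are hypothetical simplifications, not the official Anchor modeling naming convention. The last statement shows non-destructive extension: a new property is a new table.

```python
import sqlite3

AM_DDL = """
-- Anchors: identity only (surrogate key), no business attributes.
CREATE TABLE AC_Product (ProductSK INTEGER PRIMARY KEY);
CREATE TABLE AC_Store   (StoreSK   INTEGER PRIMARY KEY);
-- Knot: small, fixed set of values that do not change over time.
CREATE TABLE KN_Category (CategorySK INTEGER PRIMARY KEY, Category TEXT UNIQUE NOT NULL);
-- Attributes: one table per property; a historized one carries ChangedAt.
CREATE TABLE AT_Product_Name (
    ProductSK INTEGER NOT NULL REFERENCES AC_Product(ProductSK),
    Name TEXT NOT NULL, ChangedAt TEXT NOT NULL,
    PRIMARY KEY (ProductSK, ChangedAt));
CREATE TABLE AT_Product_Category (  -- knotted attribute: value comes from the knot
    ProductSK  INTEGER PRIMARY KEY REFERENCES AC_Product(ProductSK),
    CategorySK INTEGER NOT NULL REFERENCES KN_Category(CategorySK));
-- Tie: association between two (or more) anchors.
CREATE TABLE TI_Product_Store (
    ProductSK INTEGER NOT NULL REFERENCES AC_Product(ProductSK),
    StoreSK   INTEGER NOT NULL REFERENCES AC_Store(StoreSK),
    PRIMARY KEY (ProductSK, StoreSK));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(AM_DDL)
# Non-destructive extension: adding a property creates a table; nothing existing changes.
conn.execute("CREATE TABLE AT_Product_Color (ProductSK INTEGER PRIMARY KEY "
             "REFERENCES AC_Product(ProductSK), Color TEXT NOT NULL)")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(len(tables))  # 7
```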
Data-modeling – Anchor modeling – Schema
• 6NF modeling
• AM assumes that data changes over time
• Future-proof
• The data model evolves through extensions
• Modular
• Agile
• Bottom-up
Data-modeling – anchormodeling - ETL
• The ETL procedure has many similarities with DV ETL
  - In DV, the HUBS are filled first, followed by the LINKS, and finally the SATELLITES
  - In AM, the ANCHORS are populated first, followed by the TIES and ATTRIBUTES
• In addition, a metadata repository is filled with each ETL run
• As in DV, there are only INSERT statements and END-DATING procedures
  - No UPDATE statements
  - A DELETE statement is only performed when erroneous data has been loaded for a given batch
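The load order, the metadata repository and the batch-scoped DELETE could look like this in sqlite3. MD_Batch as the metadata repository, the BatchID column on every table, and all names are hypothetical choices made for the sketch.

```python
import sqlite3

# Hypothetical tables; every loaded row is tagged with the BatchID of its ETL run.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MD_Batch (BatchID INTEGER PRIMARY KEY, LoadedAt TEXT, Source TEXT);
CREATE TABLE AC_Product (ProductSK INTEGER PRIMARY KEY, BatchID INTEGER);
CREATE TABLE AT_Product_Name (ProductSK INTEGER, Name TEXT, ChangedAt TEXT, BatchID INTEGER);
""")

def run_batch(conn, loaded_at, source, products):
    """Insert-only load of (ProductSK, Name) pairs: anchors first, then attributes."""
    # Register the run in the metadata repository before loading anything.
    batch_id = conn.execute("INSERT INTO MD_Batch (LoadedAt, Source) VALUES (?, ?)",
                            (loaded_at, source)).lastrowid
    # 1) Anchors: only identities that do not exist yet.
    conn.executemany("INSERT OR IGNORE INTO AC_Product VALUES (?, ?)",
                     [(sk, batch_id) for sk, _ in products])
    # 2) Ties and attributes (one attribute here), referencing the anchors above.
    conn.executemany("INSERT INTO AT_Product_Name VALUES (?, ?, ?, ?)",
                     [(sk, name, loaded_at, batch_id) for sk, name in products])
    conn.commit()
    return batch_id

def undo_batch(conn, batch_id):
    """The only DELETE: remove what an erroneous batch loaded, attributes first."""
    conn.execute("DELETE FROM AT_Product_Name WHERE BatchID = ?", (batch_id,))
    conn.execute("DELETE FROM AC_Product WHERE BatchID = ?", (batch_id,))
    conn.commit()

b1 = run_batch(conn, "2015-05-11", "ERP", [(1, "Laptop"), (2, "Mouse")])
b2 = run_batch(conn, "2015-05-12", "ERP", [(2, "Mouse (wireless)"), (3, "Tablet")])
undo_batch(conn, b2)  # suppose batch 2 turned out to contain erroneous data
```

Anchor 2 survives the undo because INSERT OR IGNORE kept its original BatchID; the MD_Batch row for the faulty run is kept as an audit record.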
Data-modeling – Anchor modeling – ETL – P.O.A.
• An ANCHOR stores only the surrogate key, whereas a DV HUB stores the surrogate key and the business key together
• How is this resolved in an ETL environment?
  - The same way a HUB is populated in DV, but with an additional step.
  - Additional attributes can be loaded in parallel, as in DV. For each of those attributes, the surrogate key is resolved by referencing the business-key attribute.
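A sqlite3 sketch of that extra step, assuming a hypothetical business-key attribute AT_Product_Code next to an anchor that holds only the surrogate key. Step 1 generates surrogate keys for new business keys; step 2 loads another attribute and resolves its surrogate key by joining on the business-key attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AC_Product (ProductSK INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE AT_Product_Code (ProductSK INTEGER PRIMARY KEY REFERENCES AC_Product(ProductSK),
                              Code TEXT UNIQUE NOT NULL);
CREATE TABLE AT_Product_Name (ProductSK INTEGER REFERENCES AC_Product(ProductSK),
                              Name TEXT, ChangedAt TEXT);
""")

def load_keys(conn, codes):
    """Step 1: like filling a hub, plus the extra step of creating the anchor row
    first, because the anchor itself cannot hold the business key."""
    for code in codes:
        if conn.execute("SELECT 1 FROM AT_Product_Code WHERE Code = ?", (code,)).fetchone():
            continue
        sk = conn.execute("INSERT INTO AC_Product DEFAULT VALUES").lastrowid
        conn.execute("INSERT INTO AT_Product_Code VALUES (?, ?)", (sk, code))
    conn.commit()

def load_name_attribute(conn, changed_at, rows):
    """Step 2: attribute load; rows are (Code, Name). The surrogate key is resolved
    through the business-key attribute, so separate attribute loads could run in parallel."""
    conn.executemany("""
        INSERT INTO AT_Product_Name (ProductSK, Name, ChangedAt)
        SELECT ProductSK, ?, ? FROM AT_Product_Code WHERE Code = ?
    """, [(name, changed_at, code) for code, name in rows])
    conn.commit()

load_keys(conn, ["P-001", "P-002"])
load_name_attribute(conn, "2015-05-11", [("P-001", "Laptop"), ("P-002", "Mouse")])
load_keys(conn, ["P-001"])  # known key: no new anchor row
load_name_attribute(conn, "2015-05-12", [("P-001", "Laptop 15 inch")])
print(conn.execute("SELECT COUNT(*) FROM AC_Product").fetchone()[0])  # 2
```

An attribute row whose business key has not gone through step 1 is silently skipped by the INSERT ... SELECT, so the key step must complete before the attribute loads start.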
BREAK
Practical examples
• Star/snowflake schema
  - ETL
  - DWH
• Data Vault
  - ETL
  - DWH
• Anchor modeling
  - ETL
  - DWH
Practical examples - transition
Summary (1)
• Two main DWH design strategies
  - Enterprise-wide DWH design
    - The DWH is designed using a normalized enterprise data model
    - Data marts for specific business domains are derived from the EDWH
  - Data mart design
    - Data marts are created in a bottom-up fashion
    - Data mart design conforms to a top-down skeleton/framework design called the “data warehouse bus”
    - The EDW = the union of the conformed data marts
Summary (2)
• Four main data modeling techniques
  - Star/snowflake schemas were introduced in the 80s
    - Star/snowflake schemas require re-engineering (ETL/DWH) when new metrics or source systems are introduced. High impact
    - Not agile: specs are determined beforehand, traditional way of system development → results are delivered slowly → hard to expand an existing DWH
  - Data Vault / Anchor modeling were introduced in the early/mid 00s
    - Flexible, scalable data model; no re-engineering (ETL/DWH) is required when new metrics or source systems are introduced, simply extend/expand. Little to no impact
    - Agile → fast development track due to iterative development: start small → deliver results fast → expand → scale without effort
Summary (3)
• So, which data modeling technique comes out as the winner…
  - Well, none; they can co-exist, and you should choose the one that suits your needs, demands, skill set, etc.
  - It is merely a tool for achieving your goal
Thank you
• LinkedIn: http://nl.linkedin.com/in/sjorsotten
• Mail: Sjors.Otten@kadenza.nl

Elevating Tactical DDD Patterns Through Object Calisthenics
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Data warehousing in practice 2015

  • 1. Data warehousing in practice: and its relation to the four dominant scientific DWH-modeling concepts. Drs. S.F.J. Otten, 12-05-2015
  • 2. Topics  About me…  Business Intelligence  What is a Data warehouse (DWH)  DWH – Design strategies  Data-modeling  Brief history in data modeling  Star-schematic  Snowflake-schematic  Datavault  Anchormodeling  Pratical examples  Summary
  • 3. About me…  Education  Highschool (MAVO)  College (MBO ICT lvl.4)  Univeristy of Applied sciences (Avans Hogeschool, Business Informatics; BSc)  Utrecht University (MBI; MSc)  Utrecht University (PhD)  Carreer till now..  Kadenza (privatly held (80 employees) (2014 – present)  BI-consultant/architect (Microsoft BI stack)  CSB-System BV/GmbH (privatly held, 500-1000 employees globally) (2010-2014)  BI-consultant/architect (Microsoft BI stack)  Lead programmingdepartment for BI at HQ  Semantic development
  • 4. Business Intelligence  Business Intelligence??  “a way for organizations to understand their internal and external environment through the systematic acquisition, collation, analysis, interpretation and exploitation of information” (Watson & Wixom, 2007).
  • 5. What is a Data warehouse (1)  Data warehouse?? (DWH)  “a repository where all data relevant to the management of an organization is stored and from which knowledge emerges.” (March & Hevner, 2007)  “A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” (Inmon, 1992)  Different definitions same goal;  provide data in such a way that it has meaning and can be used in all levels of an organization as input for a decision-making-process
  • 6. DWH – design strategies (1). Enterprise-wide DWH design (Inmon, 2002): the DWH is designed using a normalized enterprise data model, and data marts for specific business domains are derived from the EDWH. Data mart design (Kimball, 2002): a hybrid strategy (top-down and bottom-up) for DWH design. Data marts are created in a bottom-up fashion; the data mart design conforms to a top-down skeleton/framework design called the “data warehouse bus”; the EDW is the union of the conformed data marts.
  • 7. DWH – design strategies (2) Push (data driven)
  • 8. DWH – design strategies (3) Pull (information driven)
  • 9. DWH – design strategies (4). Inmon versus Kimball. Inmon: subject-oriented; integrated; non-volatile; time-variant; top-down; integration via an assumed enterprise data model (EDM / 3NF); data marts are derived from the EDW. Kimball: business-process-oriented; bottom-up/evolutionary; dimensional modeling (star-schematic); integration via conformed dimensions; the star-schematic enforces query semantics; the sum of the data marts = the EDW.
  • 11. Data-modeling – Star/SF – concepts. Concepts of the star-/snowflake-schematic (Golfarelli, Maio & Rizzi, 1998). Fact table: a fact is a focus of interest for the decision-making process; typically, it models an event occurring in the enterprise world (e.g., sales and shipments). Dimension table: dimensions are discrete attributes which determine the minimum granularity adopted to represent facts; typical dimensions for the sale fact are product, store and date. Hierarchy: discrete dimension attributes linked by -to-one relationships, which determine how facts may be aggregated and selected significantly for the decision-making process.
  • 12. Data-modeling – star-schematic • Comprises a single fact table • Has N dimension tables • Each tuple in the fact table has a pointer (FK) to each of the dimension tables • Each dimension table has columns that correspond to attributes of the specific dimension (Chaudhuri & Dayal, 1997)
  • 13. Data-modeling – snowflake-schematic • A normalized star-schematic (3NF) • Dimensions are split up into sub-dimensions • Fewer FKs in the fact table • Easier maintenance
  • 14. Data-modeling – Star/SF – ETL • Conventional DWH architecture (star-/SF-schematic) for populating a DWH • An RFC (request for change, e.g. a request for a new metric) has a high impact on existing ETL practice/packages and on the DWH: it means re-engineering • Introducing a new IT system causes serious rework and headaches
  • 15. Data-modeling – Star/SF – ETL – P.O.A. Two types of ETL. FULL ETL: complete transfer of all data in the source systems via ETL packages. Incremental ETL: after the FULL ETL, incremental ETL determines the delta and loads it into the DWH. The loading can be: INSERT records that are not present in the DWH; UPDATE records that have changed values in certain columns. UPDATE statements must take into account the keys (primary and foreign) that uniquely identify a record in a table; this is risky if it is not entirely clear what the unique identifier is.
  • 16. Data-modeling – Star/SF – Case (1). DWH = snowflake architecture (3NF): dimension tables (DimItem, DimInvoice) and a fact table (FactSalesStatistics). ETL comprises a FULL and an INCREMENTAL load. Client A sends an RFC for an addition to the sales overview. Addition = metric “NetValue” per item per invoice. Additional requirement: metric “NetValue” must be present for future data and also for data already residing in the sales overview. How would you, as future business/technical consultants or researchers, approach this case?
  • 17. Data-modeling – Star/SF – Case (2). Solution: Identify the column containing metric “NetValue” in the source system (requires in-depth analysis of the transactional system). Add a column to fact table “FactSalesStatistics” ([NetValue] [decimal] (x,y) NULL). Go back to the appropriate ETL package: adjust the source query/source columns to include the identified column (metric); adjust the function that determines the delta (add the identified column); adjust the INSERT command to write the value from the identified source column → metric “NetValue” in fact table “FactSalesStatistics”; adjust the UPDATE command to update metric “NetValue” with the value from the identified source column for the existing data in table “FactSalesStatistics”. VALIDATE… VALIDATE… VALIDATE… the ERP data and the DWH data (especially in the beginning).
  • 18. Data-modeling – Star/SF – Case (3). Introduce the new metric in your sales cube: refresh the data source/data source view to get metric “NetValue” into the cube-server environment; add the measure simply by adding the metric to a measure group in the sales cube; process the cube, after which the metric should be available to all end users.
  • 19. Data-modeling – Datavault – Concepts. Concepts of the Data Vault (DV) (Linstedt & Graziano, 2011). Data Vault: the Data Vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is scalable and flexible. Hub: the Hub is intended to represent major identifiable concepts/entities of interest from the real world; every Hub entity must be denoted by a unique identifier. Link: the Link represents a relationship among concepts; both Hubs and Links may be involved in such relationships. Satellite: the Satellite is used to associate a Hub (or a Link) with (data model) attributes.
  • 20. Data-modeling – Datavault – Schematic • Comprises N hub, link and satellite tables • Scalable/flexible • 100% of the data, 100% of the time • Fairly new to the DWH world • Used by large organizations (e.g., the D.O.D., ABN AMRO)
  • 21. Data-modeling – Datavault – ETL • Datavault ETL architecture for populating a data vault • An RFC has no impact on existing ETL practice/packages and the DWH: no re-engineering • Introducing a new IT system does not cause headaches
  • 22. Data-modeling – Datavault – ETL – P.O.A. Two types of ETL. FULL ETL: complete transfer of all data in the source systems via ETL packages, with decomposition of the existing tables into Hubs, Links and Satellites. Incremental ETL: after the FULL ETL, incremental ETL determines the delta and loads it into the DWH. The loading can be: INSERT records that are not present in the DWH; END-DATING records that are no longer valid. There is no UPDATING of metric columns in a Datavault; only an end-date update is required.
  • 23. Data-modeling – Datavault – Case (1). DWH = Datavault architecture: hub tables (H_Product, H_Customer, H_Order), link tables (L_SalesOrder) and satellite tables (S_Product_1, S_SalesOrder_1, S_Customer_1). ETL comprises a FULL and an INCREMENTAL load. Client A sends an RFC for an addition to the sales overview. Addition = metric “NetValue” per item per order. Additional requirement: metric “NetValue” must be present for future data and also for data already residing in the sales overview. How would you, as future business/technical consultants or researchers, approach this case?
  • 24. Data-modeling – Datavault – Case (2). Solution: Identify the column containing metric “NetValue” in the source system (requires in-depth analysis of the transactional system). Create a new table in the DWH called S_SalesOrder_2 (ProductID, CustomerID, OrderID, LoadDate, NetValue, MD5, Source, EndDate). Create a new ETL package: provide the source query/source columns including the new metric “NetValue”; create the function that determines the delta (key fields and the identified column); create the INSERT command to write the value from the identified source column → metric “NetValue” in satellite S_SalesOrder_2, with additional values for ProductID, CustomerID, OrderID, LoadDate, MD5 and Source. Optional: create an EndDate function (with the help of staging tables). VALIDATE… VALIDATE… VALIDATE… the ERP data and the DWH data (especially in the beginning).
  • 26. Data-modeling – Datavault – Case (4). A Datavault does not store data in a structure that is suited for use in a data cube. A data cube needs a star-/SF-schematic; hence, data marts or a “Business Vault” are created. Introducing new data into the cube by using a data mart works the same as for a star-/SF-schematic DWH.
  • 27. Data-modeling – Anchormodeling – concepts. Concepts of Anchor modeling (AM) (Rönnbäck, 2010). Anchor modeling: an agile information modeling technique that offers non-destructive extensibility mechanisms. Anchor: an anchor represents a set of entities. Attribute: attributes are used to represent properties of anchors. Tie: a tie represents an association between two or more anchor entities and optional knot entities. Knot: a knot is used to represent a fixed, typically small, set of entities that do not change over time.
  • 28. Data-modeling – anchormodeling – schematic • 6NF modeling • The assumption of AM is that data changes over time • Future-proof • The data model evolves through extensions • Modular • Agile • Bottom-up
  • 29. Data-modeling – anchormodeling – ETL. The ETL procedure has many similarities with DV ETL. In DV, first the HUBS are filled, followed by the LINKS, and to finish it off the SATELLITES are filled. With AM, the ANCHORS are populated first, followed by the TIES and ATTRIBUTES. In addition, a metadata repository is filled with each ETL run. As in DV, there are only INSERT statements and END-DATING procedures: no UPDATE statements. A DELETE statement is only performed when erroneous data has been loaded for a given batch.
  • 30. Data-modeling – anchormodeling – ETL – P.O.A. In an ANCHOR only the surrogate key is stored, whereas in DV a HUB stores the surrogate key and the business key together. How is this resolved in an ETL environment? The same way as populating a HUB in DV, but with an additional step. Additional attributes can be loaded in parallel, as in DV; for each of those attributes the surrogate key is resolved by referencing the business-key attribute.
  • 31. BREAK
  • 32. Practical examples  Star /SF-schematic  ETL  DWH  Datavault  ETL  DWH  Anchor Modeling  ETL  DWH
  • 33. Practical examples - transition
  • 34. Summary (1)  Two main DWH-design-strategies  Enterprise wide DWH-design  DWH is designed by using a normalized enterprise data model  From the EDWH data marts for specific business domains are derived  Data mart Design  Create datamarts in a bottom-up fasion  Datamart-design conforms to a top-down skeleton/framwork- design which is called the “data warehouse bus”  The EDW = the union of the conformed datamarts
  • 35. Summary (2)  Four main Data-modeling-techniques  Star-/Snowflake were introduced in the 80’s  Star-/Snowflake requires re-engineering when introducing new metrics or systems at the source (ETL/DWH). High impact  Not Agile, specs are determined beforehand, traditional way of system development  deliver results slow  hard to expand existing  Datavault / anchor-modeling introduced in early/mid 00’s  Flexible, Scalable data-model, requires no re-engineering when introducing new metrics or systems at the source (ETL/DWH), simply extend/expand. Little to no impact  Agile  fast developemt track due to iterative development start small  deliver results fast Expand  Scale without effort
  • 36. Summary (3)  So, which data-modeling technique comes out as the winner…  Well, None, they can co-exist and you should choose the one that is suited for your needs,demands, skillset etc.  It is merely a tool for acieving your goal
  • 37. Thank you  @Linkedin : http://nl.linkedin.com/in/sjorsotten  @mail : Sjors.Otten@kadenza.nl
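The star-schematic of slide 12 (one fact table with a foreign key to each of the N dimension tables) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the author's implementation: the table names come from the sales case on slide 16, while every column and sample row is hypothetical.

```python
import sqlite3

# Hypothetical star schema for the sales case: one fact table whose rows
# carry a foreign key to every dimension table.
SCHEMA = """
CREATE TABLE DimItem (
    ItemKey   INTEGER PRIMARY KEY,
    ItemName  TEXT NOT NULL,
    ItemGroup TEXT NOT NULL          -- hierarchy level kept in the same (denormalized) table
);
CREATE TABLE DimInvoice (
    InvoiceKey  INTEGER PRIMARY KEY,
    InvoiceNo   TEXT NOT NULL,
    InvoiceDate TEXT NOT NULL
);
CREATE TABLE FactSalesStatistics (
    ItemKey    INTEGER NOT NULL REFERENCES DimItem (ItemKey),
    InvoiceKey INTEGER NOT NULL REFERENCES DimInvoice (InvoiceKey),
    Quantity   INTEGER NOT NULL
);
"""


def load_sample_data(conn: sqlite3.Connection) -> None:
    """Insert a few illustrative dimension and fact rows."""
    conn.executemany("INSERT INTO DimItem VALUES (?, ?, ?)",
                     [(1, "Ham", "Meat"), (2, "Bacon", "Meat"), (3, "Gouda", "Cheese")])
    conn.executemany("INSERT INTO DimInvoice VALUES (?, ?, ?)",
                     [(10, "INV-001", "2015-05-01"), (11, "INV-002", "2015-05-02")])
    conn.executemany("INSERT INTO FactSalesStatistics VALUES (?, ?, ?)",
                     [(1, 10, 5), (2, 10, 3), (3, 11, 7)])


def quantity_per_item_group(conn: sqlite3.Connection) -> list:
    # A typical star query: join the fact to one dimension via its FK,
    # then aggregate along that dimension's hierarchy level.
    return conn.execute("""
        SELECT d.ItemGroup, SUM(f.Quantity)
        FROM FactSalesStatistics AS f
        JOIN DimItem AS d ON d.ItemKey = f.ItemKey
        GROUP BY d.ItemGroup
        ORDER BY d.ItemGroup
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_sample_data(conn)
    print(quantity_per_item_group(conn))  # [('Cheese', 7), ('Meat', 8)]
```

In a snowflake-schematic (slide 13), `ItemGroup` would instead move to its own table referenced from `DimItem`, which adds one join to the same query.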
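Slide 17's backfill step, together with the warning on slide 15 and editor's note 5 about UPDATE statements, can be illustrated with a small `sqlite3` sketch. It assumes a simplified, hypothetical `FactSalesStatistics` whose unique identifier is (InvoiceKey, ItemKey). An UPDATE keyed on InvoiceKey alone touches every item line of the invoice, so the last source row silently overwrites the others. `Cursor.rowcount` makes that visible.

```python
import sqlite3


def add_net_value_column(conn: sqlite3.Connection) -> None:
    # Step from the case: extend the existing fact table with the new metric.
    conn.execute("ALTER TABLE FactSalesStatistics ADD COLUMN NetValue REAL")


def backfill_net_value(conn: sqlite3.Connection, source_rows: list, key_columns: list) -> list:
    """Write NetValue from source rows into existing fact rows.

    key_columns must come from trusted code because they are interpolated into
    the SQL text; the values themselves are passed as parameters.
    Returns, per source row, how many fact rows the UPDATE touched.
    """
    where = " AND ".join(f"{col} = ?" for col in key_columns)
    sql = f"UPDATE FactSalesStatistics SET NetValue = ? WHERE {where}"
    touched = []
    for row in source_rows:
        cur = conn.execute(sql, [row["NetValue"], *(row[col] for col in key_columns)])
        touched.append(cur.rowcount)
    return touched


def make_fact_table() -> sqlite3.Connection:
    """Hypothetical fact table: two item lines on the same invoice."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FactSalesStatistics (ItemKey INTEGER, InvoiceKey INTEGER, Quantity INTEGER)")
    conn.executemany("INSERT INTO FactSalesStatistics VALUES (?, ?, ?)", [(1, 10, 5), (2, 10, 3)])
    add_net_value_column(conn)
    return conn


SOURCE = [{"ItemKey": 1, "InvoiceKey": 10, "NetValue": 50.0},
          {"ItemKey": 2, "InvoiceKey": 10, "NetValue": 30.0}]

if __name__ == "__main__":
    wrong = make_fact_table()
    print(backfill_net_value(wrong, SOURCE, ["InvoiceKey"]))             # [2, 2]: both rows end up with 30.0
    right = make_fact_table()
    print(backfill_net_value(right, SOURCE, ["InvoiceKey", "ItemKey"]))  # [1, 1]
```

Checking that each UPDATE touches exactly one row is one cheap way to support the VALIDATE step before the cube is reprocessed.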
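The Datavault load order (Hubs, then Links, then Satellites) and the insert-only satellite from slides 19 to 24 can be sketched in plain Python. The assumptions below are all hypothetical. Hubs and links are keyed by an MD5 hash of their business keys, a common Data Vault convention; slide 24's S_SalesOrder_2 instead carries ProductID, CustomerID and OrderID directly. The MD5 column is a hash over the satellite's attributes, and rows for a given key arrive in LoadDate order.

```python
import hashlib
from dataclasses import dataclass, field


def md5_of(*parts) -> str:
    # Join with a delimiter so ("ab", "c") and ("a", "bc") hash differently.
    return hashlib.md5("|".join(str(p) for p in parts).encode("utf-8")).hexdigest()


@dataclass
class DataVault:
    hubs: dict = field(default_factory=dict)        # hub name -> {hash key: business key}
    links: dict = field(default_factory=dict)       # link name -> {hash key: tuple of hub hash keys}
    satellites: dict = field(default_factory=dict)  # satellite name -> insert-only list of rows

    def load_hub(self, hub: str, business_key: str) -> str:
        key = md5_of(business_key)
        self.hubs.setdefault(hub, {}).setdefault(key, business_key)  # no-op if already present
        return key

    def load_link(self, link: str, *hub_keys: str) -> str:
        key = md5_of(*hub_keys)
        self.links.setdefault(link, {}).setdefault(key, hub_keys)
        return key

    def load_satellite(self, sat: str, parent_key: str, load_date: str, attributes: dict) -> bool:
        """Insert a new satellite row only if the attributes changed; never update."""
        rows = self.satellites.setdefault(sat, [])
        hash_diff = md5_of(*sorted(attributes.items()))
        history = [r for r in rows if r["ParentKey"] == parent_key]  # assumes LoadDate order
        if history and history[-1]["MD5"] == hash_diff:
            return False  # unchanged: nothing is inserted
        rows.append({"ParentKey": parent_key, "LoadDate": load_date, "MD5": hash_diff, **attributes})
        return True


if __name__ == "__main__":
    dv = DataVault()
    # Load order per business entity: hubs, then links, then satellites.
    product = dv.load_hub("H_Product", "P-100")
    customer = dv.load_hub("H_Customer", "C-7")
    order = dv.load_hub("H_Order", "O-2015-001")
    sales_order = dv.load_link("L_SalesOrder", product, customer, order)
    # The RFC from the case: a new satellite for NetValue; existing tables stay untouched.
    print(dv.load_satellite("S_SalesOrder_2", sales_order, "2015-05-12", {"NetValue": 99.5}))  # True
    print(dv.load_satellite("S_SalesOrder_2", sales_order, "2015-05-13", {"NetValue": 99.5}))  # False
```

The point of the design is that the RFC only adds a satellite. No existing hub, link or satellite structure changes.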
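Slide 30 describes an extra step: an anchor's surrogate key is resolved through its business-key attribute. This can be sketched in plain Python. The anchor holds nothing but surrogate keys; the business key and a product name live in separate attribute "tables". The class, attribute names and the in-memory index are hypothetical. The sketch ignores anchor modeling's naming conventions and its metadata repository.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ProductAnchorModel:
    """Hypothetical anchor 'Product' with a business-key attribute and a historized name attribute."""
    anchor: set = field(default_factory=set)          # surrogate keys only
    business_key: dict = field(default_factory=dict)  # attribute table: surrogate key -> business key
    name: list = field(default_factory=list)          # attribute table: (surrogate key, value, changed_at)
    _bk_index: dict = field(default_factory=dict)     # index on the business-key attribute
    _ids: count = field(default_factory=lambda: count(1))

    def resolve(self, bk: str) -> int:
        """Find the surrogate key via the business-key attribute; create the anchor if it is new."""
        sk = self._bk_index.get(bk)
        if sk is None:
            sk = next(self._ids)
            self.anchor.add(sk)         # the anchor row holds nothing but the surrogate key
            self.business_key[sk] = bk  # the business key lives in its own attribute table
            self._bk_index[bk] = sk
        return sk

    def load_name(self, bk: str, value: str, changed_at: str) -> bool:
        """Insert-only: append a new attribute row when the value differs from the latest one."""
        sk = self.resolve(bk)
        history = [v for (s, v, _) in self.name if s == sk]
        if history and history[-1] == value:
            return False
        self.name.append((sk, value, changed_at))
        return True


if __name__ == "__main__":
    model = ProductAnchorModel()
    print(model.load_name("P-100", "Ham", "2015-05-01"))         # True
    print(model.load_name("P-100", "Ham", "2015-05-02"))         # False (unchanged)
    print(model.load_name("P-100", "Smoked ham", "2015-05-03"))  # True
```

Because each attribute loader only needs `resolve`, several attribute tables could be loaded independently once the anchors exist, which matches the parallel loading the slide mentions.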

Editor's Notes

  1. BI schematic overview (classic). This presentation focuses on the DWH (in RED). The DWH is the core of a BI environment. If data is not stored properly, it can have serious consequences for both the back end and the front end (business perspective: wrong numbers, wrong decisions, etc.). A DWH adjustment affects Extract Transform Load (ETL) and presentation/analytics (cube structure, report definitions, etc.).
  2. Comparing both schematics, each has its advantages and disadvantages. The star schematic is a simple design which is fast to set up, easy to use and more suitable for browsing dimensional tables due to its denormalized structure and therefore is often used in DW design. However, the possibility for inefficiencies due to a higher risk of redundancy is present (Chaudhuri & Dayal, 1997). The snowflake schematic requires more design time, takes longer to set up, but due to its normalized structure it allows for a decrease in redundancy and thereby the possible removal of inefficiencies resulting in performance enhancements.
  3. The DELTA can be determined by hash values or MD5 checksums based on certain attributes, followed by a look-up. UPDATE = (1) DELETE the OLD data, (2) INSERT the NEW data → expensive on resources, takes quite some time when handling large datasets, error-sensitive.
  4. SF architecture: two dimension tables, one fact table. The Delta function uses a hash-value component in the ETL package to determine the delta.
  5. The UPDATE statement is error-prone. If one does not know for sure how to uniquely identify a record, multiple records may wrongfully be updated with new data. Consequence: wrong data → wrong information → wrong knowledge → wrong input in the strategic decision process → wrong choices. Who’s to blame? Yep, the IT guy who created it. There goes your reputation and goodwill with the client.
  6. Datavault principles: 100% of the data, 100% of the time (no filtering on the data source and no aggregations). Flexible (extensible with ease; introducing new data into the Datavault does not affect the already present Datavault structure). Scalable (due to its structure, a Datavault scales over multiple servers without any problems and can grow rapidly in size), e.g. used by the D.O.D. due to its flexibility and scalability (3 petabytes of data).
  7. Sequence of data load per business entity: first Hubs, then Links, then Satellites.
  8. With recent SQL Server versions, one can use the LEAD() and LAG() functions to determine the end date. This eliminates the need for an UPDATE statement for end-dating.
  9. Anchor modeling is the latest addition to the data-modeling schematics for a DWH. Just like Datavault, it is very flexible and scalable. However, the decomposition of operational entities goes even further than in Datavault, and it has strict modeling rules: each attribute of an “anchor” or “tie” is stored in its own table; strict naming conventions for anchors, attributes, ties and knots; AM obliges you to set up a metadata repository; an anchor only contains a surrogate key; NULL elimination.
  10. If you have any questions afterwards please feel free to contact me.
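Editor's note 3 describes delta detection: hash values or MD5 checksums, followed by a look-up. In plain Python this might look like the sketch below. It assumes the DWH keeps one stored hash per business key. `row_hash`, `determine_delta` and the column names are illustrative, not taken from the author's ETL packages.

```python
import hashlib


def row_hash(row: dict, columns: list) -> str:
    # MD5 over the compared attributes; the "|" delimiter keeps ("ab", "c") and ("a", "bc") apart.
    payload = "|".join(str(row[col]) for col in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def determine_delta(source_rows: list, dwh_hashes: dict, key: str, columns: list):
    """Split source rows into inserts (key unknown to the DWH) and changes (hash differs).

    dwh_hashes maps business key -> hash previously stored in the DWH.
    Unchanged rows are skipped entirely.
    """
    inserts, changes = [], []
    for row in source_rows:
        stored = dwh_hashes.get(row[key])  # the look-up step
        if stored is None:
            inserts.append(row)
        elif stored != row_hash(row, columns):
            changes.append(row)
    return inserts, changes


if __name__ == "__main__":
    cols = ["Quantity", "NetValue"]
    dwh = {"INV-001/1": row_hash({"Quantity": 5, "NetValue": 50.0}, cols)}
    src = [{"Key": "INV-001/1", "Quantity": 5, "NetValue": 50.0},
           {"Key": "INV-002/1", "Quantity": 7, "NetValue": 70.0}]
    inserts, changes = determine_delta(src, dwh, "Key", cols)
    print([r["Key"] for r in inserts], [r["Key"] for r in changes])  # ['INV-002/1'] []
```

Rows in `changes` would then go through the UPDATE (DELETE + INSERT) path from note 3 in a star/SF DWH. In a Datavault, they would become new satellite rows instead.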
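Editor's note 8 can be illustrated with the LEAD() window function. Each version's end date is the LoadDate of the next version of the same key, so nothing has to be updated. The sketch uses Python's `sqlite3` as a stand-in for SQL Server; SQLite supports window functions from version 3.25, and the LEAD() expression has the same form in T-SQL. The table is a simplified, hypothetical version of S_SalesOrder_2 without a stored EndDate column.

```python
import sqlite3

# Hypothetical, simplified satellite: rows are only ever inserted.
SAT_SCHEMA = """
CREATE TABLE S_SalesOrder_2 (
    OrderID   TEXT NOT NULL,
    ProductID TEXT NOT NULL,
    LoadDate  TEXT NOT NULL,
    NetValue  REAL
);
"""

# EndDate of a version = LoadDate of the next version of the same key (NULL = current version).
END_DATED = """
SELECT OrderID, ProductID, LoadDate,
       LEAD(LoadDate) OVER (PARTITION BY OrderID, ProductID ORDER BY LoadDate) AS EndDate,
       NetValue
FROM S_SalesOrder_2
ORDER BY OrderID, ProductID, LoadDate
"""


def end_dated_rows(conn: sqlite3.Connection) -> list:
    """Derive end dates at query time instead of updating a stored EndDate column."""
    return conn.execute(END_DATED).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAT_SCHEMA)
    conn.executemany("INSERT INTO S_SalesOrder_2 VALUES (?, ?, ?, ?)",
                     [("O-1", "P-100", "2015-05-01", 50.0),
                      ("O-1", "P-100", "2015-05-10", 55.0),
                      ("O-2", "P-200", "2015-05-03", 20.0)])
    for row in end_dated_rows(conn):
        print(row)
```

Wrapping the query in a view would give the data marts or Business Vault an end-dated satellite without any UPDATE in the ETL.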