This document provides an overview of data warehousing concepts and data-modeling techniques. It discusses four dominant data-modeling concepts: star schemas, snowflake schemas, data vault modeling, and anchor modeling. For each concept it describes key elements such as facts, dimensions, and hierarchies, and how each approaches extract-transform-load (ETL) processes and responds to change. The document emphasizes that the techniques can co-exist and that the choice should be driven by one's specific needs, demands, and skills. It aims to educate about business intelligence, data warehousing, and agile versus traditional data-modeling approaches.
This presentation covers the following:
* Data warehouse design strategies
* Data warehouse modeling techniques
* Points of attention when building ETL procedures for each of these data warehouse modeling techniques
1. Data warehousing in practice
And its relation to the four dominant scientific DWH-modeling concepts
Drs. S.F.J. Otten
12-05-2015
2. Topics
About me…
Business Intelligence
What is a Data warehouse (DWH)
DWH – Design strategies
Data-modeling
Brief history in data modeling
Star-schematic
Snowflake-schematic
Datavault
Anchormodeling
Practical examples
Summary
3. About me…
Education
High school (MAVO)
College (MBO ICT lvl. 4)
University of Applied Sciences (Avans Hogeschool, Business Informatics; BSc)
Utrecht University (MBI; MSc)
Utrecht University (PhD)
Career till now…
Kadenza (privately held, 80 employees) (2014 – present)
BI-consultant/architect (Microsoft BI stack)
CSB-System BV/GmbH (privately held, 500–1000 employees globally) (2010–2014)
BI-consultant/architect (Microsoft BI stack)
Lead programming department for BI at HQ
Semantic development
4. Business Intelligence
Business Intelligence??
“a way for organizations to understand their internal and external environment through the systematic acquisition, collation, analysis, interpretation and exploitation of information” (Watson & Wixom, 2007).
5. What is a Data warehouse (1)
Data warehouse?? (DWH)
“a repository where all data relevant to the management of an organization is stored and from which knowledge emerges.” (March & Hevner, 2007)
“A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” (Inmon, 1992)
Different definitions, same goal: provide data in such a way that it has meaning and can be used at all levels of an organization as input for a decision-making process
6. DWH – design strategies (1)
Enterprise-wide DWH design (Inmon, 2002)
DWH is designed using a normalized enterprise data model
From the EDWH, data marts for specific business domains are derived
Data mart design (Kimball, 2002)
Hybrid strategy (top-down & bottom-up) for DWH design
Create data marts in a bottom-up fashion
Data mart design conforms to a top-down skeleton/framework design called the “data warehouse bus”
The EDW = the union of the conformed data marts
11. Data-modeling – Star/SF – concepts
Concepts: Star-/snowflake-schematic (Golfarelli, M., Maio, D., & Rizzi, S., 1998)
Fact-table: A fact is a focus of interest for the decision-making process; typically, it models an event occurring in the enterprise world (e.g., sales and shipments)
Dimension-table: Dimensions are discrete attributes which determine the minimum granularity adopted to represent facts; typical dimensions for the sale fact are product, store and date
Hierarchy: Discrete dimension attributes linked by to-one relationships; they determine how facts may be aggregated and selected significantly for the decision-making process
12. Data-modeling – star-schematic
• Comprises a single fact-table
• Has N dimension-tables
• Each tuple in the fact-table has a pointer (FK) to each of the dimension-tables
• Each dimension-table has columns that correspond to attributes of the specific dimension (Chaudhuri & Dayal, 1997)
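As a minimal sketch of this structure (table and column names are invented for the example, following the product/store dimensions named on the concepts slide), a star schema and a typical aggregation query could look like this in SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- dimension tables: one row per product / store
CREATE TABLE DimProduct(ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE DimStore  (StoreKey   INTEGER PRIMARY KEY, StoreName  TEXT);
-- fact table: each tuple points (FK) to each dimension table
CREATE TABLE FactSales(
    ProductKey INTEGER REFERENCES DimProduct(ProductKey),
    StoreKey   INTEGER REFERENCES DimStore(StoreKey),
    Quantity   INTEGER);
INSERT INTO DimProduct VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO DimStore   VALUES (1, 'Brighton');
INSERT INTO FactSales  VALUES (1, 1, 3), (2, 1, 5);
""")
# aggregate facts along a dimension attribute -- the typical star-schema query
rows = con.execute("""
    SELECT s.StoreName, SUM(f.Quantity)
    FROM FactSales f
    JOIN DimStore s ON s.StoreKey = f.StoreKey
    GROUP BY s.StoreName
""").fetchall()
```

The denormalized dimension tables keep such queries to a single join per dimension, which is why the star layout is favoured for browsing.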
13. Data-modeling – snowflake-schematic
• A normalized star-schematic (3NF)
• Dimensions are split up into sub-dimensions
• Fewer FKs in the fact-table
• Easier maintenance
14. Data-modeling – Star/SF – ETL
• Conventional DWH-architecture (Star-/SF-schematic) for populating a DWH
• An RFC has a high impact on the existing ETL-practice/package and DWH (i.e. a request for a new metric) = re-engineering
• Introduction of a new IT-system causes serious rework and headaches
15. Data-modeling – Star/SF – ETL – P.O.A.
Two types of ETL:
FULL ETL
Complete transfer of all data in source-systems via ETL-packages
Incremental ETL
After the FULL ETL, incremental ETL determines the delta and loads it into the DWH. The loading can be:
INSERT records that are not present in the DWH
UPDATE records that have changed values in certain columns
o UPDATE-statements need to take into account the keys (primary and foreign) that uniquely identify a record in a table; risky if it is not entirely clear what the unique identifier is.
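The incremental pattern above can be sketched in Python (a simplified illustration; the column names and the MD5-based change detection mirror the speaker notes, but are assumptions, not the deck's actual packages): a hash over the tracked columns flags changed rows, and the business key decides INSERT versus UPDATE — which is exactly why an unclear unique identifier is risky.

```python
import hashlib

def row_hash(row, columns):
    """MD5 over the tracked columns -- a cheap change detector."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def incremental_load(source_rows, dwh_rows, key, tracked):
    """Determine the delta: INSERT unseen business keys, UPDATE rows whose
    tracked-column hash changed. dwh_rows is a dict keyed by the business
    key -- getting that key wrong is the risk the slide warns about."""
    inserts, updates = [], []
    for row in source_rows:
        existing = dwh_rows.get(row[key])
        if existing is None:
            inserts.append(row)                          # not present yet
        elif row_hash(existing, tracked) != row_hash(row, tracked):
            updates.append(row)                          # value changed
    return inserts, updates

source = [{"InvoiceID": 1, "NetValue": 99.5},
          {"InvoiceID": 2, "NetValue": 12.0}]
warehouse = {1: {"InvoiceID": 1, "NetValue": 80.0}}      # stale value for invoice 1
inserts, updates = incremental_load(source, warehouse, "InvoiceID", ["NetValue"])
```

Here invoice 2 lands in the INSERT set and invoice 1 in the UPDATE set; a real package would issue the corresponding SQL statements against the DWH.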
16. Data-modeling – Star/SF – Case (1)
DWH = Snowflake-architecture (3NF)
Dimension-tables (DimItem, DimInvoice)
Fact-table (FactSalesStatistics)
ETL comprises a FULL and INCREMENTAL load
Client A sends an RFC for an addition to the sales-overview.
Addition = metric “NetValue” per item per invoice
Additional req = metric “NetValue” is present for future data and also for data already residing in the sales-overview
How would you, as future Business-/Technical-consultants / researchers, approach this case?
17. Data-modeling – Star/SF – Case (2)
Solution
Identify the column containing metric “NetValue” in the source-system (requires in-depth analysis of the transactional system)
Add a column to fact-table “FactSalesStatistics” ([NetValue] [decimal](x,y) NULL)
Revert to the appropriate ETL-package:
Adjust the source-query / source-columns to include the identified column (metric)
Adjust the function that determines the delta (add the identified column)
Adjust the INSERT-command to write the value from the identified source-column into metric “NetValue” in fact-table “FactSalesStatistics”
Adjust the UPDATE-command to update metric “NetValue” with the value from the identified source-column for the existing data in table “FactSalesStatistics”
VALIDATE…VALIDATE…VALIDATE…the ERP-data and DWH-data (especially in the beginning)
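A rough SQLite sketch of the schema-side steps (datatypes, the SourceSales staging table, and its columns are simplified assumptions for illustration): add the column, then backfill existing fact rows with an UPDATE that matches on the full unique key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SourceSales(InvoiceID INTEGER, ItemID INTEGER, NetValue REAL);
CREATE TABLE FactSalesStatistics(InvoiceID INTEGER, ItemID INTEGER, Quantity INTEGER);
INSERT INTO SourceSales         VALUES (1, 10, 99.5), (2, 20, 12.0);
INSERT INTO FactSalesStatistics VALUES (1, 10, 3),    (2, 20, 1);
""")
# 1) extend the fact table with the new, NULL-able metric
con.execute("ALTER TABLE FactSalesStatistics ADD COLUMN NetValue REAL")
# 2) backfill existing rows from the identified source column;
#    the correlated UPDATE must match on the full unique key (InvoiceID + ItemID)
con.execute("""
    UPDATE FactSalesStatistics
    SET NetValue = (SELECT s.NetValue
                    FROM SourceSales s
                    WHERE s.InvoiceID = FactSalesStatistics.InvoiceID
                      AND s.ItemID    = FactSalesStatistics.ItemID)
""")
con.commit()
backfilled = con.execute(
    "SELECT InvoiceID, NetValue FROM FactSalesStatistics ORDER BY InvoiceID"
).fetchall()
```

Omitting the ItemID condition in the correlated subquery is the “unclear unique identifier” failure mode: multiple fact rows would be backfilled from the wrong source row.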
18. Data-modeling – Star/SF – Case (3)
Introduce the new metric in your Sales-cube
Refresh the data source / data source view to get metric “NetValue” into the cube-server environment
Add the measure simply by adding the metric to a measure group in the sales-cube
Process the cube and the metric should be available for all end-users
19. Data-modeling – Datavault – Concepts
Concepts: Data vault (DV) (Lindstedt, D., & Graziano, K., 2011)
Data vault: The Data Vault is a detail-oriented, historical-tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is scalable and flexible
Hub: The Hub is intended to represent major identifiable concepts/entities of interest from the real world. It is required that every Hub entity can be denoted by a unique identifier
Link: The Link represents relationships among concepts. Both Hubs and Links may be involved in such relationships
Satellite: The Satellite is used to associate a Hub (or a Link) with (data model) attributes
20. Data-modeling – Datavault – Schematic
• Comprises N Hub-/Link-/Satellite-tables
• Scalable/Flexible
• 100% of the data, 100% of the time
• Fairly new to the DWH-world
• Used by large organizations (i.e. D.O.D., ABN AMRO)
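As an illustrative sketch (the source-column names are invented; the table names mirror the case slides that follow), one source order line decomposes into hubs that hold only business keys, a link that holds the relationship, and a satellite that holds the descriptive, historized attributes:

```python
def decompose_order(row, load_date):
    """Decompose one source order row, Data Vault style."""
    hubs = {                                  # hubs: business keys only
        "H_Product":  {"ProductBK":  row["product_no"]},
        "H_Customer": {"CustomerBK": row["customer_no"]},
        "H_Order":    {"OrderBK":    row["order_no"]},
    }
    link = {                                  # link: the relationship between hubs
        "L_SalesOrder": {"ProductBK":  row["product_no"],
                         "CustomerBK": row["customer_no"],
                         "OrderBK":    row["order_no"]},
    }
    satellite = {                             # satellite: descriptive attributes
        "S_SalesOrder_1": {"OrderBK":  row["order_no"],
                           "LoadDate": load_date,
                           "Quantity": row["qty"],
                           "EndDate":  None},  # open-ended until superseded
    }
    return hubs, link, satellite

hubs, link, sat = decompose_order(
    {"product_no": "P-42", "customer_no": "C-7", "order_no": "O-1001", "qty": 3},
    load_date="2015-05-12",
)
```

Because each concern lives in its own table, a new attribute or source system only ever adds tables; the existing hubs and links are untouched.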
21. Data-modeling – Datavault – ETL
• Datavault-ETL-architecture for populating a datavault
• An RFC has no impact on the existing ETL-practice/package and DWH; no re-engineering
• Introduction of a new IT-system does not cause headaches
22. Data-modeling – Datavault – ETL – P.O.A.
Two types of ETL:
FULL ETL
Complete transfer of all data in source-systems via ETL-packages
Decomposition of existing tables into Hubs, Links, and Satellites
Incremental ETL
After the FULL ETL, incremental ETL determines the delta and loads it into the DWH. The loading can be:
INSERT records that are not present in the DWH
END-DATING records that are no longer valid
There is no UPDATING of metric columns in Datavault. Only an End-date update is required
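The END-DATING step might be sketched like this (a simplified illustration with generic field names, not the deck's actual procedure): within each business key, every satellite version except the newest gets the next version's load date as its end date — the only "update" a Datavault load performs.

```python
def end_date(satellite_rows):
    """Close off superseded satellite versions; the newest per key stays open."""
    rows = sorted(satellite_rows, key=lambda r: (r["bk"], r["load_date"]))
    for cur, nxt in zip(rows, rows[1:]):
        if cur["bk"] == nxt["bk"]:            # same entity, a newer version follows
            cur["end_date"] = nxt["load_date"]
    return rows

history = end_date([
    {"bk": "O-1001", "load_date": "2015-01-01", "net_value": 80.0, "end_date": None},
    {"bk": "O-1001", "load_date": "2015-02-01", "net_value": 99.5, "end_date": None},
    {"bk": "O-1002", "load_date": "2015-01-15", "net_value": 12.0, "end_date": None},
])
```

The old NetValue of order O-1001 is end-dated rather than overwritten, so the full history remains queryable — "100% of the data, 100% of the time".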
23. Data-modeling – Datavault – Case (1)
DWH = Datavault-architecture
Hub-tables (H_Product, H_Customer, H_Order)
Link-tables (L_SalesOrder)
Satellite-tables (S_Product_1, S_SalesOrder_1, S_Customer_1)
ETL comprises a FULL and INCREMENTAL load
Client A sends an RFC for an addition to the sales-overview.
Addition = metric “NetValue” per item per order
Additional req = metric “NetValue” is present for future data and also for data already residing in the sales-overview
How would you, as future Business-/Technical-consultants / researchers, approach this case?
24. Data-modeling – Datavault – Case (2)
Solution
Identify the column containing metric “NetValue” in the source-system (requires in-depth analysis of the transactional system)
Create a new table in the DWH called S_SalesOrder_2 (ProductID, CustomerID, OrderID, LoadDate, NetValue, MD5, Source, EndDate)
Create a new ETL-package
Provide the source-query / source-columns including the new metric “NetValue”
Create the function that determines the delta (key fields & identified column)
Create the INSERT-command to write the value from the identified source-column into metric “NetValue” in satellite S_SalesOrder_2, with additional values for ProductID, CustomerID, OrderID, LoadDate, MD5, Source
Optional: Create an EndDate-function (with the help of staging-tables)
VALIDATE…VALIDATE…VALIDATE…the ERP-data and DWH-data (especially in the beginning)
26. Data-modeling – Datavault – Case (4)
Datavault does not store data in a structure that is suited for usage in a data cube.
A data cube needs a Star-/SF-schematic. Hence, data marts or a “Business vault” are created.
Introducing new data into the cube, by using a data mart, is the same as for a Star-/SF-schematic DWH
27. Data-modeling – Anchormodeling – concepts
Concepts: Anchor modeling (AM) (Rönnbäck, 2010)
Anchor modeling: Anchor modeling is an agile information modeling technique that offers non-destructive extensibility mechanisms.
Anchor: An anchor represents a set of entities.
Attribute: Attributes are used to represent properties of anchors
Tie: A tie represents an association between two or more anchor entities and optional knot entities
Knot: A knot is used to represent a fixed, typically small, set of entities that do not change over time
28. Data-modeling – anchormodeling – schematic
• 6NF-modeling
• Assumption of AM is that data changes over time
• Future-proof
• Evolution of the data model is done through extensions
• Modular
• Agile
• Bottom-up
29. Data-modeling – anchormodeling – ETL
The ETL-procedure has many similarities with DV-ETL-ing
In DV, first the HUBS are filled, followed by the LINKS and, to finish it off, the SATELLITES
With AM, first the ANCHORS are populated, followed by the TIES and ATTRIBUTES
In addition, a metadata-repository is filled with each ETL-run
Like DV, there are only INSERT-statements and END-DATING procedures.
NO UPDATE-statement
A DELETE-statement is only performed when erroneous data is loaded for a given batch
30. Data-modeling – anchormodeling – ETL – P.O.A.
In an ANCHOR only the surrogate key is stored, while with DV in a HUB the surrogate key and business key are stored together
How is this resolved in an ETL-environment?
The same way as populating a HUB in DV, but with an additional step.
Additional attributes can be loaded in parallel, like in DV.
For each of those attributes the surrogate key is resolved by referencing the business-key attribute.
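A toy sketch of that extra step (the table layout, names, and the naive key generator are assumptions for illustration): since the anchor holds only a surrogate key, the business key lives in its own attribute table, and every attribute load resolves the surrogate key through it first.

```python
def resolve_surrogate_key(business_key, anchor, bk_attribute):
    """Look up the surrogate key via the business-key attribute table;
    create a new anchor row (surrogate key only) if the key is unseen."""
    for row in bk_attribute:
        if row["BusinessKey"] == business_key:
            return row["SK"]
    sk = len(anchor) + 1                      # naive surrogate-key generator
    anchor.append({"SK": sk})                 # the anchor stores *only* the SK
    bk_attribute.append({"SK": sk, "BusinessKey": business_key})
    return sk

anchor, bk_attr, qty_attr = [], [], []
sk = resolve_surrogate_key("O-1001", anchor, bk_attr)
qty_attr.append({"SK": sk, "Quantity": 3})    # other attributes reference the SK
sk2 = resolve_surrogate_key("O-1001", anchor, bk_attr)  # resolves, no new anchor row
```

Once the surrogate key is resolved, the remaining attribute tables can be populated in parallel, just as satellites are in DV.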
34. Summary (1)
Two main DWH-design strategies
Enterprise-wide DWH design
DWH is designed using a normalized enterprise data model
From the EDWH, data marts for specific business domains are derived
Data mart design
Create data marts in a bottom-up fashion
Data mart design conforms to a top-down skeleton/framework design called the “data warehouse bus”
The EDW = the union of the conformed data marts
35. Summary (2)
Four main data-modeling techniques
Star/Snowflake were introduced in the ’80s
Star/Snowflake require re-engineering when introducing new metrics or systems at the source (ETL/DWH). High impact
Not agile: specs are determined beforehand; the traditional way of system development delivers results slowly and is hard to expand
Datavault / anchor modeling were introduced in the early/mid ’00s
Flexible, scalable data model; requires no re-engineering when introducing new metrics or systems at the source (ETL/DWH), simply extend/expand. Little to no impact
Agile: a fast development track due to iterative development; start small, deliver results fast, expand and scale without effort
36. Summary (3)
So, which data-modeling technique comes out as the winner?
Well, none: they can co-exist and you should choose the one that is suited to your needs, demands, skill set, etc.
It is merely a tool for achieving your goal
BI-schematic overview (classic)
The focus of this presentation is on the DWH (RED)
DWH = the core of a BI-environment. If data is not stored properly it can have serious consequences for both the back-end and the front-end (business perspective: wrong numbers, wrong decisions, etc.)
A DWH-adjustment affects Extract Transform Load (ETL) and Presentation/Analytics (cube structure, report-definition, etc.)
Comparing both schematics, each has its advantages and disadvantages. The star schematic is a simple design which is fast to set up, easy to use and more suitable for browsing dimensional tables due to its denormalized structure and therefore is often used in DW design. However, the possibility for inefficiencies due to a higher risk of redundancy is present (Chaudhuri & Dayal, 1997). The snowflake schematic requires more design time, takes longer to set up, but due to its normalized structure it allows for a decrease in redundancy and thereby the possible removal of inefficiencies resulting in performance enhancements.
The DELTA can be determined by hash values or MD5-checksums computed over certain attributes, combined with a look-up.
UPDATE = (1) DELETE old data, (2) INSERT new data; expensive on resources, takes quite some time when handling large datasets, error-sensitive
SF-architecture
Two dim-tables; one fact-table
The Delta-function() uses a hash-value component in the ETL-package for determining the delta
The UPDATE-statement is error-prone. If one does not know for sure how to uniquely identify a record, the possibility exists that multiple records are wrongfully updated with new data.
Consequence: wrong data, wrong information, wrong knowledge, wrong input in the strategic decision process, wrong choices
Who’s to blame? Yep, the IT-guy who created it. There goes your reputation and goodwill from the client.
Datavault-principles:
100% of the data, 100% of the time (no filtering on the data source, no aggregations)
Flexible (extensibility with ease; introduction of new data into the datavault does not affect the already present datavault-structure)
Scalable (due to its structure the datavault is scalable over multiple servers without any problems and it can grow rapidly in size)
(i.e. used by the D.O.D. due to its flexibility and scalability (3 petabytes of data))
Sequence of data load per business-entity:
Hubs
Links
Satellites
With recent SQL Server versions one can use the LEAD() and LAG() window functions to determine the end-date. This eliminates the need for an UPDATE-statement for end-dating.
Anchor modeling is the latest addition to the data-modeling schematics for a DWH.
Just like Datavault it is very flexible and scalable. However, the decomposition of operational entities is even higher than in Datavault, and it has strict modeling rules:
Each attribute of an “anchor” or “tie” is stored in its own table
Strict naming conventions for anchors, attributes, ties and knots
AM obligates you to set up a metadata-repository
An anchor only contains a surrogate key
NULL-elimination
If you have any questions afterwards please feel free to contact me.