Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
2. Agenda
© Data Warrior LLC
Bio
Architecture and Approach
› What is a Virtualized ODS?
Using SDDM for pattern-based stage tables
Using views to load the stage tables
› Building the views in SDDM
› Using MD5 columns for Change Data Capture
Building ODS views in SDDM
› Using Analytic Functions in views
Generating the DDL
› SQL Server
› Oracle
3. My Bio
› Senior Technical Evangelist, Snowflake Computing
› Oracle ACE Director (BI/DW)
› Certified Data Vault Master and DV 2.0 Practitioner
› Data Modeling, Data Architecture and Data Warehouse
Specialist
› 30+ years in IT
› 25+ years of Oracle-related work
› 20+ years of data warehousing experience
› Former-Member: Boulder BI Brain Trust
(http://www.boulderbibraintrust.org/)
› Author & Co-Author of a bunch of books
› Blogger: The Data Warrior
› Past-President of Oracle Development Tools User
Group and Rocky Mountain Oracle User Group
4. Shameless Plug
Available on Amazon.com
http://www.amazon.com/Better-Data-Modeling-Enhancing-Developer-ebook/dp/B00UK75LYI/
5. Shameless Plug #2: Also On Amazon.com
NOW IN SPANISH TOO!
http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/
http://www.amazon.com/VERIFICACI%C3%93N-REALIZAR-REVISIONES-DISE%C3%91OS-MODELOS-ebook/dp/B00NUS1GFM/
6. Architecture & Approach
Goals
› New reporting environment
› Agile (i.e., quick delivery)
› Future Proof
Determination
› Use Data Vault 2.0
› Implement in Phases
7. Data Vault Definition
The Data Vault is a detail oriented, historical tracking and uniquely
linked set of normalized tables that support one or more functional
areas of business.
It is a hybrid approach encompassing the best of breed between 3rd
normal form (3NF) and star schema. The design is flexible, scalable,
consistent and adaptable to the needs of the enterprise.
Architected specifically to meet the needs of today’s enterprise
data warehouses
DAN LINSTEDT: Defining the Data Vault
TDAN.com Article
9. Phase 1: Operational BI
Goals:
1. Support immediate business needs for operational reports
2. Provide an architectural component (stage layer) that supports the long-term data warehouse (DW) framework
3. Can be easily enhanced to accommodate information needs of other departments
4. Foundation for eliciting solid analytic BI requirements
Diagram: XLink (data source), eRMS (data source), DW Stage Layer, Virtual Operational Data Store (ODS), BOBJ Operational Universe(s), BOBJ Operational Reports
10. Phase 1: Operational BI
Data Warehouse (DW) Stage Layer
› Based on source system structures
› May simply be replicated source tables
› Refreshed several times a day
› Perform change data capture in this layer to
provide persistent, historical data for future
reporting needs
Virtual Operational Data Store (ODS)
› Abstraction layer between source and report tool
› Views on stage layer initially
› Provides proper modeling for building the
Operational Universe(s) for BI report tool
› Includes Business Names and Joins
11. Phase 2: Analytic BI
Goals:
1. Provide foundation for long term analytics platform (single source of information)
2. Create purpose-built Universe for analytic needs
3. Enable managed self-service BI by making it simpler for users to find the reports they need
Diagram: XLink (data source), eRMS (data source), DW Stage Layer, Virtual ODS, BOBJ Operational Universe(s), BOBJ Operational Reports, Data Vault (Enterprise DW), Virtual Data Marts, BOBJ Analytics Universe(s), BOBJ Analytical Reports & Dashboards
12. Phase 2: Analytic BI
Data Vault
› Provides one consistent source of information for both operational and analytic information
› Source system agnostic structures
› Easier to adapt and extend in future than 3NF or star schema
› Can be easily expanded as new data is added to the data warehouse foundation layer
› Persistent, historical capture of transaction-level data
› Allows meeting future unknown needs as they arise
Addition of Data Vault should be transparent to BOBJ operational report users
› Modification to physical references in the universe hides the change from the users; the operational universe still looks like “modified” source system structures
› Therefore, no rework of existing reports
13. Phase 2: Analytic BI
› Virtual data marts also sourced from data
vault
› Marts provide an abstraction layer between
DW and Business Objects
› Can be easily expanded as new data is added to the Data
Vault
› Easy to create new data marts for future business needs
Analytics universe(s) sourced from new virtual data marts
› Looks like proper star schema with facts and
dimensions
› Re-organizes the data to more effectively support
business reporting
› Enables long-term universe support by most common
BOBJ development skill set
› Can be converted to physical data mart (if
needed)
› For performance in a future release
› For highly complex business rules
14. Building Pattern-Based Stage Tables
Create Table Template
› Include reusable meta data columns
Reverse Engineer Source Table(s)
› Copy and rename
Apply Template
› Use built-in transformation script
› Alternative
› Copy template table
› Merge with copy of source
Re-order columns as needed
16. Create Base Stage Table
01. Copy source table
02. Rename (add _stg)
03. Remove source indexes
04. Change schema assignment
05. Add or change table comment
06. Assign Stage classification (if you have one)
07. NOTE: You could script all this!
18. Apply Table Template Transform
Use Table Template and Transformation Script
Tools -> Design Rules -> Custom Transformations
Look for “table template” delivered script
› No change needed
Create table called table_template (or change script)
› With required columns and properties to be copied
Select “Apply”
› Changes all tables in design
Note: can script all sorts of stuff
› Check /datamodeler/xmlmetadata/doc
20. Use the Merge Tool
Alternate - Merging Tables
Adding Standard Columns
› 5th button on tool bar
› Good for building denormalized reporting tables
› Also for one-offs to add standard columns
Combines Two Tables
› Click merge button, then template, then
target
› Edit result as needed
a. Copy template table
b. Merge with table needing the
columns
22. Finalize Stage Table Design
01. Re-order columns
› PRIM_KEY column is 1st
02. Add new PK constraint using PRIM_KEY column
03. Drop source PK constraint
› Replace with Unique constraint
24. Final DW Stage Table
Source table name + stg suffix
New calculated PK for each stage record
Indicator of original source system PK
Additional meta-data columns to support change capture, load time and source
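The stage table steps on the preceding slides can be sketched outside the tool. The Python below is only an illustration on plain dictionaries and is not the SDDM scripting API. The template column names come from the load view template in this deck; the data types, the `erms` schema name, the index name, the source PK, and the placement of the non-PK template columns after the source columns are all assumptions.

```python
from copy import deepcopy

# Template columns: names follow this deck, data types are assumed.
TEMPLATE_COLUMNS = [
    {"name": "PRIM_KEY", "type": "CHAR(32)"},
    {"name": "HASH_KEY", "type": "CHAR(32)"},
    {"name": "HASH_DIFF", "type": "CHAR(32)"},
    {"name": "LOAD_DTS", "type": "DATETIME"},
    {"name": "REC_SRC", "type": "VARCHAR(20)"},
]


def build_stage_table(source, stage_schema="dw_stage"):
    """Return a stage table definition derived from a source table definition."""
    stage = deepcopy(source)
    stage["name"] = source["name"] + "_stg"      # rename (add _stg)
    stage["schema"] = stage_schema               # change schema assignment
    stage["indexes"] = []                        # remove source indexes
    stage["comment"] = "Stage copy of " + source["name"]
    # PRIM_KEY goes first; the other template columns follow the source columns
    prim_key, *meta = deepcopy(TEMPLATE_COLUMNS)
    stage["columns"] = [prim_key] + stage["columns"] + meta
    # the source PK is kept as a unique constraint; PRIM_KEY becomes the new PK
    stage["unique"] = [list(source["primary_key"])]
    stage["primary_key"] = ["PRIM_KEY"]
    return stage


# Hypothetical source table definition (column names taken from the deck's examples)
source_table = {
    "name": "rmcodp",
    "schema": "erms",
    "columns": [
        {"name": "CODCODTYP", "type": "CHAR(2)"},
        {"name": "CODCODNUM", "type": "CHAR(10)"},
        {"name": "CODKEYNUM", "type": "NUMERIC(9)"},
    ],
    "primary_key": ["CODCODTYP", "CODCODNUM"],
    "indexes": ["rmcodp_ix1"],
}
stage_table = build_stage_table(source_table)
```

The source definition is left untouched, so the same function can be applied to every reverse-engineered table in turn.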
25. Build Stage Load Views
For db to db ELT type loading
Includes code for Type 2 SCD style CDC
Use SDDM View Builder
› Select from source table (all columns)
› Drag and drop
› Alternate – Table to View wizard
› Add code from view template
Show code in DDL Preview
Test in SQL Developer
› Fix
› Repeat
26. Table To View Wizard
Pick tables to use
Auto create new subview diagram
Auto add PK & FK to views based on base table
27. View Builder
Pick syntax
Pick tables & columns
Add calcs, aliases & filters
Add complex subqueries if needed
28. MD5 Keys & Columns
Concatenate source data fields and hash to create MD5 keys & columns
MD5 Key Types
PRIM_KEY:
› All source fields (in table order) + LOAD_DTS
› Uniquely IDs all records within the DW
› Can serve as an SCD-2 key in virtual Dims / Facts
HASH_KEY:
› Source field(s) (in table order) used by the SOR to ID data rows uniquely for change data capture purposes
HASH_DIFF:
› All source fields not in HASH_KEY (in table order), used to track deltas for change data capture purposes
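As a rough illustration of the three key types, the sketch below computes them in Python with `hashlib`. It assumes a `^` separator, trimmed and upper-cased values, and NULLs treated as empty strings; the sample row values and load timestamps are made up. The digests are not guaranteed to match what the database hash functions return for the same data.

```python
import hashlib

SEP = "^"  # assumed separator between concatenated fields


def md5_hex(values):
    """Concatenate values with a separator and return an upper-case MD5 hex digest."""
    text = SEP.join("" if v is None else str(v).rstrip().upper() for v in values)
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


def stage_keys(row, key_columns, load_dts):
    """row: dict of source column -> value in table order (dicts keep insertion order)."""
    key_values = [row[c] for c in row if c in key_columns]
    diff_values = [row[c] for c in row if c not in key_columns]
    return {
        "PRIM_KEY": md5_hex(list(row.values()) + [load_dts]),  # all fields + LOAD_DTS
        "HASH_KEY": md5_hex(key_values),                       # source identifying fields
        "HASH_DIFF": md5_hex(diff_values),                     # everything else
    }


# Hypothetical sample data: the same source row before and after a description change
row_v1 = {"CODCODTYP": "BG", "CODCODNUM": "100", "CODLNGDES": "Retail"}
row_v2 = {"CODCODTYP": "BG", "CODCODNUM": "100", "CODLNGDES": "Retail Banking"}
keys_v1 = stage_keys(row_v1, {"CODCODTYP", "CODCODNUM"}, "2016-01-01 08:00:00")
keys_v2 = stage_keys(row_v2, {"CODCODTYP", "CODCODNUM"}, "2016-01-01 16:00:00")
```

Here `keys_v1` and `keys_v2` share the same HASH_KEY, while HASH_DIFF and PRIM_KEY differ, which is the behavior change data capture relies on.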
29. MD5-Based Change Detection
Think Type 2 SCD (Slowly Changing Dimensions)
Old Way:
› Compare column by column
› Source value != Current value in DW table
› 20 columns, then 20 compares
New Way:
› Concatenate all columns to one string
› Convert to one char(32) string with hash function
› Compare to hashed value (HASH_DIFF) in target table
› Does not matter how many columns
30. What Does It Look Like?
Encode using standard MD5 hash function (Oracle)
› rawtohex(sys.utl_raw.cast_to_raw(dbms_obfuscation_toolkit.md5(input_string => ...)))
Need to minimize chance of duplicates
› 12||3||45 and 1||2||345 hash to same value
› Need a separator between each
› Also handles case of null values
› Example: Col1||'^'||Col2||'^'||Col3
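The collision described above is easy to reproduce. This small Python check is only a demonstration, with NULL modeled as `None` and mapped to an empty string by assumption. It shows that the two value sets hash identically without a separator and differently with one, and that a separator keeps a NULL in the first column distinct from a NULL in the second.

```python
import hashlib


def md5_concat(values, sep=""):
    """Join values with an optional separator and return the MD5 hex digest."""
    text = sep.join("" if v is None else str(v) for v in values)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Without a separator both inputs collapse to the string "12345"
no_sep_a = md5_concat(["12", "3", "45"])
no_sep_b = md5_concat(["1", "2", "345"])

# With a separator the inputs are "12^3^45" and "1^2^345"
with_sep_a = md5_concat(["12", "3", "45"], sep="^")
with_sep_b = md5_concat(["1", "2", "345"], sep="^")

# NULL handling: "^X" versus "X^" stay distinguishable
null_first = md5_concat([None, "X"], sep="^")
null_second = md5_concat(["X", None], sep="^")
```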
31. Other Considerations
To generate most consistent string: standardize!
Convert data types
If 'NUMBER', 'NVARCHAR2', 'NVARCHAR', 'NCHAR'
› THEN 'TO_CHAR(' || column_name || ')'
If 'RAW'
› THEN 'ENC_BASE64(' || column_name || ')'
If 'DATE'
› THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD'')'
If LIKE 'TIME%'
› THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD HH24:MI:SS'')'
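The type rules above read like the body of a code generator. A minimal Python sketch of such a generator follows; the function names are hypothetical, the data types given for the example columns are assumptions, and `ENC_BASE64` is copied from the slide as-is rather than verified as a built-in function.

```python
def standardize_expr(column_name, data_type):
    """Return an Oracle-style expression that renders a column as a consistent string."""
    dt = data_type.upper()
    if dt in ("NUMBER", "NVARCHAR2", "NVARCHAR", "NCHAR"):
        return f"TO_CHAR({column_name})"
    if dt == "RAW":
        return f"ENC_BASE64({column_name})"  # function name as shown on the slide
    if dt == "DATE":
        return f"TO_CHAR({column_name}, 'YYYY-MM-DD')"
    if dt.startswith("TIME"):
        return f"TO_CHAR({column_name}, 'YYYY-MM-DD HH24:MI:SS')"
    return column_name  # other types are passed through unchanged


def hash_input_expr(columns, sep="^"):
    """Build the concatenated string expression that is fed to the MD5 function."""
    return f"||'{sep}'||".join(standardize_expr(name, dtype) for name, dtype in columns)


# Column names from the deck; their data types here are assumptions
expr = hash_input_expr([
    ("CODCODTYP", "VARCHAR2"),
    ("CODKEYNUM", "NUMBER"),
    ("CODCHGTIM", "TIMESTAMP(6)"),
])
```

For the three columns above, `expr` is `CODCODTYP||'^'||TO_CHAR(CODKEYNUM)||'^'||TO_CHAR(CODCHGTIM, 'YYYY-MM-DD HH24:MI:SS')`.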
32. Template View Code – SQL Server
-- SQL Server load view template columns
PRIM_KEY, -- placeholder for PK column
HASH_KEY, -- placeholder for HASH key
HASH_DIFF, -- placeholder for CDC column
GETDATE() AS LOAD_DTS, -- current date and time
'eRMS' AS REC_SRC -- source system name
-- Template Where
WHERE --supports load new keys and changes, no dups
NOT EXISTS
( SELECT 1
FROM dw_stage.rmcodp_stg stg
WHERE
stg.HASH_KEY = upper(CONVERT([Char](32),HASHBYTES('MD5',
UPPER(RTRIM(RMC.CODCODTYP) + '^' + RTRIM(RMC.CODCODNUM) + '^')),2)) AND
stg.HASH_DIFF = upper(CONVERT([Char](32), HASHBYTES('MD5',
UPPER(RTRIM(CONVERT([Char](100),RMC.CODKEYNUM)) + '^' + ) …
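To see how the `NOT EXISTS` filter behaves across repeated loads, here is a small simulation in Python using an in-memory SQLite database as a stand-in for SQL Server. Everything about the setup is an assumption made for illustration: the simplified `src` and `stg` tables, the registered `MD5_UPPER` function (SQLite has no `HASHBYTES`), `||` instead of `+` for concatenation, and the sample values. PRIM_KEY and REC_SRC are left out to keep it short.

```python
import hashlib
import sqlite3


def md5_upper(text):
    """Upper-case MD5 hex digest, standing in for the database hash function."""
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


conn = sqlite3.connect(":memory:")
conn.create_function("MD5_UPPER", 1, md5_upper)
conn.executescript("""
    CREATE TABLE src (codcodtyp TEXT, codcodnum TEXT, codlngdes TEXT);
    CREATE TABLE stg (hash_key TEXT, hash_diff TEXT, load_dts TEXT,
                      codcodtyp TEXT, codcodnum TEXT, codlngdes TEXT);
""")

# Load only rows whose HASH_KEY + HASH_DIFF combination is not in the stage table yet
LOAD_SQL = """
    INSERT INTO stg
    SELECT MD5_UPPER(UPPER(RTRIM(s.codcodtyp) || '^' || RTRIM(s.codcodnum) || '^')),
           MD5_UPPER(UPPER(RTRIM(s.codlngdes) || '^')),
           ?, s.codcodtyp, s.codcodnum, s.codlngdes
    FROM src s
    WHERE NOT EXISTS (
        SELECT 1 FROM stg
        WHERE stg.hash_key =
              MD5_UPPER(UPPER(RTRIM(s.codcodtyp) || '^' || RTRIM(s.codcodnum) || '^'))
          AND stg.hash_diff = MD5_UPPER(UPPER(RTRIM(s.codlngdes) || '^'))
    )
"""


def run_load(load_dts):
    """Run one load and return the number of rows added to the stage table."""
    before = conn.execute("SELECT COUNT(*) FROM stg").fetchone()[0]
    conn.execute(LOAD_SQL, (load_dts,))
    return conn.execute("SELECT COUNT(*) FROM stg").fetchone()[0] - before


conn.execute("INSERT INTO src VALUES ('BG', '100', 'Retail')")
first = run_load("2016-01-01 08:00:00")    # new key: loaded
second = run_load("2016-01-01 12:00:00")   # unchanged row: skipped
conn.execute("UPDATE src SET codlngdes = 'Retail Banking'")
third = run_load("2016-01-01 16:00:00")    # changed row: loaded as a new version
```

After the three loads the stage table holds two versions of the same HASH_KEY, one per distinct HASH_DIFF.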
33. Virtual ODS
Simple database views on stage tables.
Tables and columns renamed with business terms
FKs added to help BOBJ developer define proper joins
34. Defining The Virtual ODS Views
Start with Table to View Wizard
› On Stage Tables
Rename view
Use Excel and metadata to create column aliases
› Extract metadata for stage tables (use SDDM Search)
› Add calculated column to Excel
› ="RMO."&E10350&" AS "&M10350&","
› Cut and paste into View Builder
Add nested table with analytic function
› To only return current rows for ODS
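The Excel formula step above can also be done in a few lines of script. This Python sketch assumes the metadata is available as pairs of physical column name and business name; the function name is hypothetical, and the `RMC` alias and sample pairs are borrowed from the view examples later in this deck.

```python
def alias_lines(table_alias, columns):
    """Build 'ALIAS.PHYSICAL AS Business_Name,' lines from (physical, business) pairs."""
    return [f"{table_alias}.{physical} AS {business}," for physical, business in columns]


lines = alias_lines("RMC", [
    ("CODKEYNUM", "Code_Key_Numeric"),
    ("CODSYSTYP", "System_Value_Type"),
    ("CODLNGDES", "Description"),
])
select_list = "\n".join(lines)  # ready to paste into View Builder
```

As with the Excel version, the trailing comma on the last line has to be removed before the column list is valid SQL.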
36. Analytic Function To Get Current Rows
SELECT
CONVERT([Char](10),RMC.CODCODNUM) AS Business_Group_Code,
RMC.CODKEYNUM AS Code_Key_Numeric,
RMC.CODSYSTYP AS System_Value_Type,
RMC.CODLNGDES AS Description,
…
RMC.LOAD_DTS AS LOAD_DTS,
CASE
WHEN RANK() OVER (PARTITION BY RMC.HASH_KEY
ORDER BY RMC.LOAD_DTS DESC) = 1
THEN 'Y'
ELSE 'N'
END CURR_FLG
FROM
DW_STAGE.RMCODP_STG RMC
WHERE
RMC.CODCODTYP = 'BG'
37. BUT… Can’t Use Function In Where
01. Have to nest the query with the function as a virtual table in the FROM
02. Then use CURR_FLG in outer WHERE
03. Works in Oracle, SQL Server, and SnowflakeDB
04. Drop the final query into View Builder
› Save
› Generate DDL
38. Example: Virtual ODS View
SELECT
SRC.Business_Group_Code,
SRC.Code_Key_Numeric,
SRC.System_Value_Type,
…
SRC.Change_Time,
SRC.LOAD_DTS
FROM
(
SELECT
CONVERT([Char](10),RMC.CODCODNUM) AS Business_Group_Code,
RMC.CODKEYNUM AS Code_Key_Numeric,
RMC.CODSYSTYP AS System_Value_Type,
…
RMC.CODCHGTIM AS Change_Time,
RMC.LOAD_DTS AS LOAD_DTS,
CASE
WHEN RANK() OVER (PARTITION BY RMC.HASH_KEY
ORDER BY RMC.LOAD_DTS DESC) = 1
THEN 'Y'
ELSE 'N'
END CURR_FLG -- calculated column
FROM
DW_STAGE.RMCODP_STG RMC
WHERE
RMC.CODCODTYP = 'BG'
) SRC -- nested virtual table
WHERE
SRC.CURR_FLG = 'Y' -- filter on calculated column
Nested virtual table w/ Rank column and other transforms
Get current rows using virtual column
Main select for view columns
#VirtualODS
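The nested query pattern can be tried out without a warehouse. The Python sketch below runs the same shape of query against an in-memory SQLite database, used here only as a convenient stand-in (window functions need SQLite 3.25 or later). The table contents, the shortened HASH_KEY values, and the reduced column list are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rmcodp_stg (hash_key TEXT, codcodtyp TEXT, codlngdes TEXT, load_dts TEXT);
    INSERT INTO rmcodp_stg VALUES
        ('K1', 'BG', 'Retail',         '2016-01-01 08:00:00'),
        ('K1', 'BG', 'Retail Banking', '2016-01-01 16:00:00'),
        ('K2', 'BG', 'Wholesale',      '2016-01-01 08:00:00'),
        ('K3', 'XX', 'Other type',     '2016-01-01 08:00:00');
""")

# The inner query calculates CURR_FLG; the outer query filters on it
CURRENT_ROWS_SQL = """
    SELECT src.hash_key, src.description, src.load_dts
    FROM (
        SELECT rmc.hash_key,
               rmc.codlngdes AS description,
               rmc.load_dts,
               CASE
                   WHEN RANK() OVER (PARTITION BY rmc.hash_key
                                     ORDER BY rmc.load_dts DESC) = 1
                   THEN 'Y'
                   ELSE 'N'
               END AS curr_flg
        FROM rmcodp_stg rmc
        WHERE rmc.codcodtyp = 'BG'
    ) src
    WHERE src.curr_flg = 'Y'
    ORDER BY src.hash_key
"""
current_rows = conn.execute(CURRENT_ROWS_SQL).fetchall()
```

Only the latest version of K1 and the single version of K2 come back. One thing to keep in mind: if two rows for the same HASH_KEY share the same LOAD_DTS, RANK() marks both as current.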
41. Generate DDL
Use DDL Preview to check
File > Export > DDL
Or click the DDL icon
Pick the target DB type
Can switch at generate time
Same design can generate Oracle and SQL Server
44. Conclusion
With planning and good architecture you can be agile
Data Vault provides a good framework
Oracle Data Modeler provides the tool
Think out of the box
› Start with virtual ODS or Data Marts
› Support for both Oracle & SQL Server
› And Snowflake too!
45. Want More In Depth Training?
SQL Developer Data Modeler Jumpstart
Online video training class with demos
Discount code GRAZIANO10S (20% off)
Go to https://kentgraziano.com/sddm1/
46. AVAILABLE NOW….
› On Amazon.com
› Covers a ton of stuff
› Reviewed by Kent & Jeff!
47. SUPER CHARGE YOUR DATA WAREHOUSE
› Available on Amazon.com
› Soft Cover or Kindle Format
› Now also available in PDF at
LearnDataVault.com
› Hint: Kent is the Technical Editor
48. New DV 2.0 Book
(includes more details on MD5)
› Available on Amazon: http://www.amazon.com/Building-Scalable-Data-Warehouse-Vault/dp/0128025107/