Agile Data Warehousing: Using SDDM to Build a Virtualized ODS


(This is the talk I gave at Houston DAMA and Agile Denver BI meetups)
At a past client, meeting the timeline for urgent, unmet reporting needs required building a virtualized Operational Data Store (ODS) as the first phase of a new Data Vault 2.0 project. This let me deliver new objects quickly and incrementally to the report developer, so we could show the business users their data early. To limit refactoring in later stages of the data warehouse development, I built this virtualization layer on top of a Type 2 persistent staging layer. All of this was done using Oracle SQL Developer Data Modeler (SDDM) against (gasp!) an MS SQL Server database. In this talk I will show you the architecture for this approach, the rationale, and the tricks I used in SDDM to build all the stage tables and views very quickly. In the end you will see actual SQL code for a virtual ODS that can easily be translated to an Oracle database.

Published in: Data & Analytics

Agile Data Warehousing: Using SDDM to Build a Virtualized ODS

  2. Agenda © Data Warrior LLC
     1. Bio
     2. Architecture and Approach
        › What is a Virtualized ODS?
     3. Using SDDM for pattern-based stage tables
     4. Using views to load the stage tables
        › Building the views in SDDM
        › Using MD5 columns for Change Data Capture
     5. Building ODS views in SDDM
        › Using Analytic Functions in views
     6. Generating the DDL
        › SQL Server
        › Oracle
  3. My Bio
     › Senior Technical Evangelist, Snowflake Computing
     › Oracle ACE Director (BI/DW)
     › Certified Data Vault Master and DV 2.0 Practitioner
     › Data Modeling, Data Architecture and Data Warehouse Specialist
     › 30+ years in IT
     › 25+ years of Oracle-related work
     › 20+ years of data warehousing experience
     › Former Member: Boulder BI Brain Trust
     › Author & Co-Author of a bunch of books
     › Blogger: The Data Warrior
     › Past President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group
  4. Shameless Plug
     Available on Data-Modeling-Enhancing-Developer-ebook/dp/B00UK75LYI/
  5. Shameless Plug #2: Also On
     Doing-Design-Reviews-ebook/dp/B008RG9L5E/
     NOW IN SPANISH TOO!
     I%C3%93N-REALIZAR-REVISIONES-DISE%C3%91OS-MODELOS-ebook/dp/B00NUS1GFM/
  6. Architecture & Approach
     Goals
     › New reporting environment
     › Agile (i.e., quick delivery)
     › Future Proof
     Determination
     › Use Data Vault 2.0
     › Implement in Phases
  7. Data Vault Definition
     The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.
     It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.
     Architected specifically to meet the needs of today's enterprise data warehouses
     DAN LINSTEDT: Defining the Data Vault Article
  8. Data Vault Components
     Copyright 2011 Dan Linstedt, used by permission
  9. Phase 1: Operational BI
     Goals:
     1. Support immediate business needs for operational reports
     2. Provide an architectural component (stage layer) that supports the long-term data warehouse (DW) framework
     3. Can be easily enhanced to accommodate the information needs of other departments
     4. Foundation for eliciting solid analytic BI requirements
     Diagram components:
     › XLink (data source)
     › eRMS (data source)
     › DW Stage Layer
     › Virtual Operational Data Store (ODS)
     › BOBJ Operational Universe(s)
     › BOBJ Operational Reports
  10. Phase 1: Operational BI
     Data Warehouse (DW) Stage Layer
     › Based on source system structures
     › May simply be replicated source tables
     › Refreshed several times a day
     › Perform change data capture in this layer to provide persistent, historical data for future reporting needs
     Virtual Operational Data Store (ODS)
     › Abstraction layer between source and report tool
     › Views on stage layer initially
     › Provides proper modeling for building the Operational Universe(s) for BI report tool
     › Includes Business Names and Joins
  11. Phase 2: Analytic BI
     Goals:
     1. Provide foundation for long term analytics platform (single source of information)
     2. Create purpose-built Universe for analytic needs
     3. Enable managed self-service BI by making it simpler for users to find the reports they need
     Diagram components:
     › XLink (data source)
     › eRMS (data source)
     › DW Stage Layer
     › Virtual ODS
     › BOBJ Operational Universe(s)
     › BOBJ Operational Reports
     › Data Vault (Enterprise DW)
     › Virtual Data Marts
     › BOBJ Analytics Universe(s)
     › BOBJ Analytical Reports & Dashboards
  12. Phase 2: Analytic BI
     Data Vault
     › Provides one consistent source of information for both operational and analytic information
     › Source system agnostic structures
     › Easier to adapt and extend in future than 3NF or star schema
     › Can be easily expanded as new data is added to the data warehouse foundation layer
     › Persistent, historical capture of transaction-level data
     › Allows meeting future unknown needs, as they arise
     Addition of Data Vault should be transparent to BOBJ operational report users
     › Modification to physical references in the universe hides the change from the users; Operational universe still looks like "modified" source system structures
     › Therefore, no rework of existing reports
  13. Phase 2: Analytic BI
     › Virtual data marts also sourced from data vault
     › Marts provide an abstraction layer between DW and Business Objects
        › Can be easily expanded as new data is added to the Data Vault
        › Easy to create new data marts for future business needs
     Analytics universe(s) sourced from new virtual data marts
     › Looks like proper star schema with facts and dimensions
        › Re-organizes the data to more effectively support business reporting
        › Enables long-term universe support by most common BOBJ development skill set
     › Can be converted to physical data mart (if needed)
        › For performance in a future release
        › For highly complex business rules
  14. Building Pattern-Based Stage Tables
     Create Table Template
     › Include reusable metadata columns
     Reverse Engineer Source Table(s)
     › Copy and rename
     Apply Template
     › Use built-in transformation script
     › Alternative
        › Copy template table
        › Merge with copy of source
     Re-order columns as needed
  15. Template Table
  16. Create Base Stage Table
     01. Copy source table
     02. Rename (add _stg)
     03. Remove source indexes
     04. Change schema assignment
     05. Add or change table comment
     06. Assign Stage classification (if you have one)
     07. NOTE: You could script all this!
  17. Base Stage From Source
  18. Apply Table Template Transform
     Use Table Template and Transformation Script
     Tools -> Design Rules -> Custom Transformations
     Look for "table template" delivered script
     › No change needed
     Create table called table_template (or change script)
     › With required columns and properties to be copied
     Select "Apply"
     › Changes all tables in design
     Note: can script all sorts of stuff
     › Check /datamodeler/xmlmetadata/doc
  19. #VirtualODS
  20. Alternate - Merging Tables: Use the Merge Tool
     Adding Standard Columns
     › 5th button on tool bar
     › Good for building denormalized reporting tables
     › Also for one-offs to add standard columns
     Combines Two Tables
     › Click merge button, then template, then target
     › Edit result as needed
     a. Copy template table
     b. Merge with table needing the columns
  21. MERGED TABLES
     Screenshot labels: Merge button, Merged tables
  22. Finalize Stage Table Design
     01. Re-order columns
        › PRIM_KEY column is 1st
     02. Add new PK constraint using PRIM_KEY column
     03. Drop source PK constraint
        › Replace with Unique constraint
  23. DEMO! BUILDING THE STAGE TABLE WITH MERGE
  24. Final DW Stage Table
     › Source table name + stg suffix
     › New calculated PK for each stage record
     › Indicator of original source system PK
     › Additional metadata columns to support change capture, load time and source
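
The stage-table pattern from slides 14 to 24 (source columns plus template metadata columns, a `_stg` suffix, and a new primary key on PRIM_KEY) can be sketched outside SDDM as a small DDL generator. This is a hypothetical Python illustration, not the SDDM transformation script: the function name, the data types, the placement of the metadata columns after the source columns, and the sample source column types are all assumptions. The unique constraint that replaces the source PK is left out because the slides do not show its column list.

```python
# Hypothetical sketch: build stage-table DDL from a source column list.
# Not the SDDM transformation script; names and data types are assumed.

TEMPLATE_COLUMNS = [
    ("PRIM_KEY", "CHAR(32) NOT NULL"),     # MD5 of all source fields + LOAD_DTS
    ("HASH_KEY", "CHAR(32) NOT NULL"),     # MD5 of the source key field(s)
    ("HASH_DIFF", "CHAR(32) NOT NULL"),    # MD5 of the remaining source fields
    ("LOAD_DTS", "DATETIME NOT NULL"),     # load date and time
    ("REC_SRC", "VARCHAR(20) NOT NULL"),   # source system name
]


def stage_table_ddl(schema, source_table, source_columns):
    """Return CREATE TABLE text for <source_table>_stg with template columns."""
    base = source_table.lower()
    # PRIM_KEY goes first (slide 22); the other metadata columns are
    # appended after the source columns (an assumption).
    columns = [TEMPLATE_COLUMNS[0]] + list(source_columns) + TEMPLATE_COLUMNS[1:]
    body = [f"    {name} {dtype}" for name, dtype in columns]
    body.append(f"    CONSTRAINT {base}_stg_pk PRIMARY KEY (PRIM_KEY)")
    return f"CREATE TABLE {schema}.{base}_stg (\n" + ",\n".join(body) + "\n);"


if __name__ == "__main__":
    # Source column types below are made up for the example.
    print(stage_table_ddl(
        "dw_stage", "RMCODP",
        [("CODCODTYP", "CHAR(2)"), ("CODCODNUM", "CHAR(10)")],
    ))
```

In SDDM itself the same result comes from the table template transformation or the merge tool; the sketch only makes the resulting column layout explicit.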
  25. Build Stage Load Views
     1. For db-to-db ELT-type loading
     2. Includes code for Type 2 SCD style CDC
     3. Use SDDM View Builder
        › Select from source table (all columns)
        › Drag and drop
        › Alternate: Table to View wizard
        › Add code from view template
     4. Show code in DDL Preview
     5. Test in SQL Developer
        › Fix
        › Repeat
  26. Table To View Wizard
     › Pick tables to use
     › Auto create new subview diagram
     › Auto add PK & FK to views based on base table
  27. View Builder
     › Pick syntax
     › Pick tables & columns
     › Add calcs & aliases & filters
     › Add complex subqueries if needed
  28. MD5 Keys & Columns
     Concatenate source data fields and hash to create MD5 keys & columns
     MD5 Key Types
     PRIM_KEY:
     › All source fields (in table order) + LOAD_DTS
     › Uniquely identifies all records within the DW
     › Can serve as an SCD-2 key in virtual Dims / Facts
     HASH_KEY:
     › Source field(s) (in table order) used by SOR to ID data rows uniquely for change data capture purposes
     HASH_DIFF:
     › All non-CDC_KEY source fields (in table order) to track deltas for change data capture purposes
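
To make the three definitions above concrete, here is a minimal Python sketch that derives PRIM_KEY, HASH_KEY and HASH_DIFF for one row. It is an illustration of the idea rather than the T-SQL used in the talk: the helper names, the use of `None` for nulls, the timestamp format, and the sample row are assumptions. It trims and upper-cases values and joins them with `^`, loosely following the view template on slide 32.

```python
import hashlib
from datetime import datetime

SEP = "^"  # separator between concatenated fields


def md5_hex(values):
    """Trim, upper-case and join the values, then return an uppercase MD5 hex string."""
    cleaned = ["" if v is None else str(v).rstrip().upper() for v in values]
    return hashlib.md5(SEP.join(cleaned).encode("utf-8")).hexdigest().upper()


def stage_keys(row, key_fields, load_dts):
    """Compute the three hash columns for one source row.

    row        -- dict of column name -> value, in source table order
    key_fields -- names of the column(s) the source system uses as its key
    load_dts   -- load date and time for this record
    """
    all_values = [row[c] for c in row]
    key_values = [row[c] for c in row if c in key_fields]
    diff_values = [row[c] for c in row if c not in key_fields]
    return {
        "PRIM_KEY": md5_hex(all_values + [load_dts.strftime("%Y-%m-%d %H:%M:%S")]),
        "HASH_KEY": md5_hex(key_values),
        "HASH_DIFF": md5_hex(diff_values),
    }


if __name__ == "__main__":
    row = {"CODCODTYP": "BG", "CODCODNUM": "100", "CODLNGDES": "North"}
    for name, value in stage_keys(row, ["CODCODTYP", "CODCODNUM"], datetime(2016, 1, 1)).items():
        print(name, value)
```

When a non-key field changes, HASH_KEY stays the same while HASH_DIFF and PRIM_KEY change, which is what lets the stage table hold several versions of one source row.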
  29. MD5-Based Change Detection
     Think Type 2 SCD (Slowly Changing Dimensions)
     Old Way:
     › Compare column by column
     › Source value != Current value in DW table
     › 20 columns, then 20 compares
     New Way:
     › Concatenate all columns to one string
     › Convert to one char(32) string with hash function
     › Compare to hashed value (HASH_DIFF) in target table
     › Does not matter how many columns
  30. What Does It Look Like?
     Encode using standard MD5 hash function (Oracle)
     › rawtohex(sys.utl_raw.cast_to_raw(dbms_obfuscation_toolkit.md5(input_string => ...)))
     Need to minimize chance of duplicates
     › 12||3||45 and 1||2||345 hash to same value
     › Need a separator between each
     › Also handles case of null values
     › Example: Col1||'^'||Col2||'^'||Col3
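
The separator point is easy to check. The following Python sketch (function names are illustrative, and Python's `hashlib` stands in for the Oracle call) shows that plain concatenation makes different field values hash identically, while a `^` separator keeps them distinct, including when a field is empty.

```python
import hashlib


def md5_hex(text):
    """Return the uppercase 32-character MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


def hash_without_separator(values):
    return md5_hex("".join(values))


def hash_with_separator(values, sep="^"):
    return md5_hex(sep.join(values))


if __name__ == "__main__":
    a = ["12", "3", "45"]
    b = ["1", "2", "345"]
    # Both lists concatenate to "12345", so the hashes collide.
    print(hash_without_separator(a) == hash_without_separator(b))  # True
    # "12^3^45" and "1^2^345" are different strings, so the hashes differ.
    print(hash_with_separator(a) == hash_with_separator(b))        # False

    # An empty (null) field in a different position is also kept distinct.
    c = ["A", "", "B"]
    d = ["A", "B", ""]
    print(hash_without_separator(c) == hash_without_separator(d))  # True
    print(hash_with_separator(c) == hash_with_separator(d))        # False
```

The separator should be a character that does not normally occur in the data; otherwise the same ambiguity can come back.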
  31. Other Considerations
     To generate most consistent string: standardize!
     Convert data types
     If 'NUMBER', 'NVARCHAR2', 'NVARCHAR', 'NCHAR'
     › THEN 'TO_CHAR(' || column_name || ')'
     If 'RAW'
     › THEN 'ENC_BASE64(' || column_name || ')'
     If 'DATE'
     › THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD'')'
     If LIKE 'TIME%'
     › THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD HH24:MI:SS'')'
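
The same standardization rules can be sketched in Python, applied to values rather than to generated SQL text. This is only an analogy to the slide: the mapping from Python types to the Oracle types named above, the function name, and the treatment of `None` as an empty string are assumptions.

```python
import base64
from datetime import date, datetime
from decimal import Decimal


def standardize(value):
    """Convert a value to a consistent string form before concatenating and hashing."""
    if value is None:
        return ""                                    # nulls become empty strings
    if isinstance(value, datetime):                  # checked first: datetime is a date subclass
        return value.strftime("%Y-%m-%d %H:%M:%S")   # like 'YYYY-MM-DD HH24:MI:SS'
    if isinstance(value, date):
        return value.strftime("%Y-%m-%d")            # like 'YYYY-MM-DD'
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")  # RAW -> base64 text
    if isinstance(value, (int, float, Decimal)):
        return str(value)                            # numbers -> plain text
    return str(value).rstrip()                       # text: drop trailing padding


if __name__ == "__main__":
    sample = [42, date(2016, 3, 1), datetime(2016, 3, 1, 13, 5, 9), b"\x00\x01", None, "BG  "]
    print("^".join(standardize(v) for v in sample))
```

Whatever rules are chosen, they have to be applied identically on every load, because any change in formatting changes the hash and looks like a data change.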
  32. Template View Code - SQL Server
     -- SQL Server load view template columns
     PRIM_KEY,               -- placeholder for PK column
     HASH_KEY,               -- placeholder for HASH key
     HASH_DIFF,              -- placeholder for CDC column
     GETDATE() AS LOAD_DTS,  -- current date and time
     'eRMS'    AS REC_SRC    -- source system name
     -- Template WHERE
     WHERE -- supports loading new keys and changes, no dups
       NOT EXISTS
         ( SELECT 1
           FROM dw_stage.rmcodp_stg stg
           WHERE stg.HASH_KEY = upper(CONVERT([Char](32), HASHBYTES('MD5',
                   UPPER(RTRIM(RMC.CODCODTYP) + '^' + RTRIM(RMC.CODCODNUM) + '^')), 2))
             AND stg.HASH_DIFF = upper(CONVERT([Char](32), HASHBYTES('MD5',
                   UPPER(RTRIM(CONVERT([Char](100), RMC.CODKEYNUM)) + '^' + ) …
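
The effect of the NOT EXISTS filter can be sketched in Python as a set lookup: a source row is loaded only if no stage row already has the same HASH_KEY and HASH_DIFF. This is a simplified illustration, not the view itself; the function name and the short placeholder hash values are assumptions.

```python
def rows_to_load(source_rows, stage_rows):
    """Return the source rows that are new keys or changed versions.

    A row is skipped when some stage row already has the same
    (HASH_KEY, HASH_DIFF) pair, which mirrors the NOT EXISTS test.
    """
    existing = {(r["HASH_KEY"], r["HASH_DIFF"]) for r in stage_rows}
    return [r for r in source_rows if (r["HASH_KEY"], r["HASH_DIFF"]) not in existing]


if __name__ == "__main__":
    # Placeholder hash values keep the example readable.
    stage = [
        {"HASH_KEY": "K1", "HASH_DIFF": "D1"},
        {"HASH_KEY": "K2", "HASH_DIFF": "D2"},
    ]
    source = [
        {"HASH_KEY": "K1", "HASH_DIFF": "D1"},   # unchanged: skipped
        {"HASH_KEY": "K2", "HASH_DIFF": "D2b"},  # changed: loaded as a new version
        {"HASH_KEY": "K3", "HASH_DIFF": "D3"},   # new key: loaded
    ]
    for row in rows_to_load(source, stage):
        print(row["HASH_KEY"], row["HASH_DIFF"])
```

Running the load twice against the same source therefore adds nothing the second time, which is the "no dups" behavior noted in the template comment.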
  33. Virtual ODS
     › Simple database views on stage tables
     › Tables and columns renamed with business terms
     › FK added to help BOBJ Developer define proper joins
  34. Defining The Virtual ODS Views
     Start with Table to View Wizard
     › On Stage Tables
     Rename view
     Used Excel & Metadata to create column alias
     › Extract metadata for stage tables (use SDDM Search)
     › Add calculated column to Excel
     › ="RMO."&E10350&" AS "&M10350&","
     › Cut and paste into View Builder
     Add nested table with analytic function
     › To only return current rows for ODS
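
The Excel formula above simply glues a table alias, a physical column name, and a business name into one select-list line. A Python equivalent, assuming the column metadata has been exported as (column, business name) pairs, might look like the sketch below; the function name and the sample pairs are illustrative.

```python
def alias_lines(table_alias, column_map):
    """Build 'ALIAS.COLUMN AS Business_Name,' lines for pasting into a view."""
    return [f"{table_alias}.{column} AS {business_name}," for column, business_name in column_map]


if __name__ == "__main__":
    columns = [
        ("CODKEYNUM", "Code_Key_Numeric"),
        ("CODSYSTYP", "System_Value_Type"),
        ("CODLNGDES", "Description"),
    ]
    for line in alias_lines("RMC", columns):
        print(line)
```

As with the spreadsheet version, the trailing comma on the last generated line has to be removed by hand before the list is used in a SELECT.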
  35. Generating the Column Aliases
  36. Analytic Function To Get Current Rows
     SELECT
       CONVERT([Char](10),RMC.CODCODNUM) AS Business_Group_Code,
       RMC.CODKEYNUM                     AS Code_Key_Numeric,
       RMC.CODSYSTYP                     AS System_Value_Type,
       RMC.CODLNGDES                     AS Description,
       …
       RMC.LOAD_DTS                      AS LOAD_DTS,
       CASE
         WHEN RANK() OVER (PARTITION BY RMC.HASH_KEY ORDER BY RMC.LOAD_DTS DESC) = 1
         THEN 'Y'
         ELSE 'N'
       END CURR_FLG
     FROM
       DW_STAGE.RMCODP_STG RMC
     WHERE
       RMC.CODCODTYP = 'BG'
  37. BUT… Can't Use Function In Where
     01. Have to nest the query with the function as a virtual table in the FROM
     02. Then use CURR_FLG in outer WHERE
     03. Works in Oracle, SQL Server, and SnowflakeDB
     04. Drop the final query into View Builder
        › Save
        › Generate DDL
  38. Example: Virtual ODS View
     SELECT                                   -- main select for view columns
       SRC.Business_Group_Code,
       SRC.Code_Key_Numeric,
       SRC.System_Value_Type,
       …
       SRC.Change_Time,
       SRC.LOAD_DTS
     FROM
       ( SELECT
           CONVERT([Char](10),RMC.CODCODNUM) AS Business_Group_Code,
           RMC.CODKEYNUM                     AS Code_Key_Numeric,
           RMC.CODSYSTYP                     AS System_Value_Type,
           …
           RMC.CODCHGTIM                     AS Change_Time,
           RMC.LOAD_DTS                      AS LOAD_DTS,
           CASE
             WHEN RANK() OVER (PARTITION BY RMC.HASH_KEY ORDER BY RMC.LOAD_DTS DESC) = 1
             THEN 'Y'
             ELSE 'N'
           END CURR_FLG                      -- calculated column
         FROM
           DW_STAGE.RMCODP_STG RMC
         WHERE
           RMC.CODCODTYP = 'BG'
       ) SRC                                 -- nested virtual table
     WHERE
       SRC.CURR_FLG = 'Y'                    -- filter on calculated column
     Callouts:
     › Nested virtual table w/Rank column and other transforms
     › Get current rows using virtual column
     › Main select for view columns
     #VirtualODS
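
The two-step shape of this view (flag the latest row per HASH_KEY in an inner step, then filter on the flag in an outer step) can be mirrored in a few lines of Python. This is a sketch of the logic only, with an assumed function name and made-up rows; like RANK() = 1, it keeps every row that ties for the latest LOAD_DTS within a key.

```python
from datetime import datetime


def current_rows(stage_rows):
    """Return only the current version of each HASH_KEY."""
    # Inner step: find the latest LOAD_DTS per HASH_KEY and flag matching rows,
    # which is what RANK() OVER (PARTITION BY ... ORDER BY ... DESC) = 1 selects.
    latest = {}
    for row in stage_rows:
        key = row["HASH_KEY"]
        if key not in latest or row["LOAD_DTS"] > latest[key]:
            latest[key] = row["LOAD_DTS"]
    flagged = [
        dict(row, CURR_FLG="Y" if row["LOAD_DTS"] == latest[row["HASH_KEY"]] else "N")
        for row in stage_rows
    ]
    # Outer step: filter on the calculated flag.
    return [row for row in flagged if row["CURR_FLG"] == "Y"]


if __name__ == "__main__":
    stage = [
        {"HASH_KEY": "K1", "LOAD_DTS": datetime(2016, 1, 1), "Description": "North"},
        {"HASH_KEY": "K1", "LOAD_DTS": datetime(2016, 2, 1), "Description": "North East"},
        {"HASH_KEY": "K2", "LOAD_DTS": datetime(2016, 1, 1), "Description": "South"},
    ]
    for row in current_rows(stage):
        print(row["HASH_KEY"], row["Description"])
```

The flag has to exist before it can be filtered on, which is the same reason the SQL version needs the nested query.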
  39. Virtual ODS View In Query Builder - Nested
     #VirtualODS
  40. Virtual ODS View In Query Builder - Outer
     #VirtualODS
  41. Generate DDL
     › Use DDL Preview to check
     › File > Export > DDL
     › Or click the DDL icon
     › Pick the target DB type
     › Can switch at generate time
     › Same design can generate Oracle and SQL Server
  42. Generate DDL
  43. Generate DDL
  44. Conclusion
     1. With planning and good architecture you can be agile
     2. Data Vault provides a good framework
     3. Oracle Data Modeler provides the tool
     4. Think out of the box
        › Start with virtual ODS or Data Marts
        › Support for both Oracle & SQL Server
        › And Snowflake too!
  45. Want More In Depth Training?
     SQL Developer Data Modeler Jumpstart
     Online video training class with demos
     Discount code GRAZIANO10S (20% off)
     Go to
  46. AVAILABLE NOW…
     › On
     › Covers a ton of stuff
     › Reviewed by Kent & Jeff!
  47. SUPER CHARGE YOUR DATA WAREHOUSE
     › Available on
     › Soft Cover or Kindle Format
     › Now also available in PDF at
     › Hint: Kent is the Technical Editor
  48. New DV 2.0 Book (includes more details on MD5)
     › Available on Amazon: Scalable-Data-Warehouse-Vault/dp/0128025107/
  49. QUESTIONS?
  50. CONTACT INFORMATION
     KENT GRAZIANO
     Snowflake Computing
     @KentGraziano