Data Warehousing 2016

These are the slides from my talk at Data Day Texas 2016 (#ddtx16).
The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.

Published in: Data & Analytics

  1. Data Warehousing 2016
     Kent Graziano, Senior Technical Evangelist
  2. Agenda
     • Bio
     • Data Warehousing: Historical Theory
     • Data Warehousing: The Reality
     • Data Warehousing: The Future
     • Closing Thoughts
  3. My Bio
     • Senior Technical Evangelist, Snowflake Computing
     • Oracle ACE Director (DW/BI)
     • Certified Data Vault Master and DV 2.0 Practitioner
     • Former Member: Boulder BI Brain Trust (#BBBT)
     • Member: DAMA International
     • Data Architecture and Data Warehouse Specialist
       • 30+ years in IT
       • 25+ years of Oracle-related work
       • 20+ years of data warehousing experience
     • Co-Author of
       • The Business of Data Vault Modeling
       • The Data Model Resource Book (1st Edition)
     • Blogger – The Data Warrior
     • Past-President of ODTUG and Rocky Mountain Oracle User Group
  4. What about you?
     • Survey says…
  5. Theoretical Architectures
  6. What Is a Data Warehouse?
     • “A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision making process.” (W.H. Inmon)
     • “The data warehouse is where we publish used data.” (Ralph Kimball)
  7. Data Warehouse
     • What is it
       • Centralized location for data
       • “Single source of truth” or “Single source of Facts”
       • Source of data for reporting, analytics, and offline operational processes
     • Who is it
       • Capital ‘EDW’:
         • Primary: Teradata, Oracle Exadata, IBM Pure Systems, …
         • Secondary: HP Vertica, Pivotal Greenplum
       • “Data warehouse”: SQL Server, MySQL, Oracle, …
     Proprietary and Confidential
  8. Datamarts
     • What are they
       • Databases used to provide fast, independent access to a subset of data
       • Often created for departments, projects, users, …
     • Comparison to data warehouse
       • Similar technology
       • Subset of data
       • Relieves pressure on EDW
       • Provides “sandbox” for analysis / analysts
  9. Data sources
     Traditional
     • OLTP databases
       • Oracle, Sybase, DB2, SQL Server, MySQL, Postgres, …
     • Enterprise applications
       • ERP, CRM, HR, …
     • Traditional third-party data
       • Consumer databases, stock trade data, …
     Non-traditional
     • Web applications
       • Website applications, mobile applications, …
     • New third-party data
       • API data, Twitter, Facebook, Segment, weather, …
     • Other
       • Sensors, devices, …
  10. Transformation (ETL)
      • What is it
        • Getting data from source form into a standard, clean, normalized form
      • How it gets done
        • Third-party tools
        • Custom home-grown scripts
        • Hadoop
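The "custom home-grown scripts" route on the ETL slide can be illustrated with a minimal Python sketch. It takes records from two differently shaped sources and brings them into one standard, clean form. The source names (`crm`, `web`), the column names, and the date formats are all hypothetical; they are not from the talk.

```python
from datetime import datetime

# Hypothetical mapping from each source's column names to one standard name.
COLUMN_MAP = {
    "crm": {"CustName": "customer_name", "SignupDt": "signup_date"},
    "web": {"name": "customer_name", "created": "signup_date"},
}

# Hypothetical date format used by each source system.
DATE_FORMATS = {"crm": "%m/%d/%Y", "web": "%Y-%m-%d"}


def transform(record: dict, source: str) -> dict:
    """Rename columns, tidy names, and standardize dates to ISO 8601."""
    mapping = COLUMN_MAP[source]
    # Keep only the columns we know how to map, under their standard names.
    clean = {mapping[k]: v for k, v in record.items() if k in mapping}
    # Collapse repeated whitespace and normalize capitalization.
    clean["customer_name"] = " ".join(clean["customer_name"].split()).title()
    # Parse the source-specific date format and emit YYYY-MM-DD.
    parsed = datetime.strptime(clean["signup_date"].strip(), DATE_FORMATS[source])
    clean["signup_date"] = parsed.date().isoformat()
    # Record lineage so downstream users can trace the row to its source.
    clean["source_system"] = source
    return clean
```

Calling `transform({"CustName": "  john   SMITH ", "SignupDt": "01/23/2016"}, "crm")` yields a row with `customer_name` "John Smith" and `signup_date` "2016-01-23", the same shape a `web` record would produce.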
  11. Direct Data Mart
      [Diagram: Source 1, Source 2, and Source 3 feed Transformation Routines (ETL), which load the Sales, Financial, and Customer Service data marts.]
  12. Basic “Inmon” Architected Data Warehouse
      [Diagram: Source 1, Source 2, and Source 3 feed ETL Routines that load the Enterprise Data Warehouse; a second set of ETL Routines loads the Sales, Financial, and Customer Service data marts from it.]
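The second hop in this architecture, from the enterprise data warehouse to departmental data marts, can be sketched with Python's built-in `sqlite3` module. The table and column names below (`edw_orders`, `sales_mart`, `service_mart`) and the sample rows are invented for illustration; a real EDW would use its own platform and a far richer model.

```python
import sqlite3


def build_marts(conn: sqlite3.Connection) -> None:
    """Load a tiny stand-in EDW table, then derive two data marts from it."""
    conn.executescript(
        """
        -- Stand-in for the integrated enterprise data warehouse.
        CREATE TABLE edw_orders (
            order_id        INTEGER PRIMARY KEY,
            region          TEXT NOT NULL,
            amount          REAL NOT NULL,
            support_tickets INTEGER NOT NULL
        );
        INSERT INTO edw_orders VALUES
            (1, 'East', 100.0, 0),
            (2, 'West', 250.0, 2),
            (3, 'East',  50.0, 1);

        -- Sales mart: a pre-aggregated subset for fast departmental access.
        CREATE TABLE sales_mart AS
            SELECT region, SUM(amount) AS revenue
            FROM edw_orders
            GROUP BY region;

        -- Customer service mart: only the rows and columns that team needs.
        CREATE TABLE service_mart AS
            SELECT order_id, support_tickets
            FROM edw_orders
            WHERE support_tickets > 0;
        """
    )
```

Each mart holds a subset of the EDW data, so queries against `sales_mart` or `service_mart` never touch `edw_orders`, which is the "relieves pressure on EDW" point from the Datamarts slide.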
  13. Corporate Information Factory
      [Diagram labels: Information Workshop; Library & Toolbox; Workbench; Meta Data Management; Operation & Administration; Change Management; Service Management; Data Acquisition Management; Systems Management; Data Acquisition; CIF Data Management; Data Delivery; Information Feedback; API; DSI; TrI; Operational Systems; Operational Data Store; Data Warehouse; Exploration Warehouse; Data Mining Warehouse; OLAP Data Mart; Oper Mart; External; ERP; Internet; Legacy; Other]
      © 2002, Intelligent Solutions, Inc. Courtesy of Intelligent Solutions, Inc.
  14. DW 2.0™
      • Next Generation data warehouse architecture from Bill Inmon
      • Superseded CIF (for some)
      • Includes more accommodation and integration of meta data
      • Includes integration of “unstructured” data
  15. DW 2.0™
  16. Data Vault
      • Invented and developed by Daniel Linstedt
      • New, hybrid modeling for enterprise data warehousing
      • Introduced with TDAN articles in 2002
      • Truly introduces an approach for agile, incremental DW model development
      • Called “hyper normalized” by some
      • Methodology adapted from Scott Ambler’s Disciplined Agile Delivery (DAD)
  17. Data Vault Definition
      The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.
      It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.
      Dan Linstedt: Defining the Data Vault (TDAN.com article)
      Architected specifically to meet the needs of today’s enterprise data warehouses
  18. Where does a Data Vault Fit?
      © LearnDataVault.com
  19. Data Vault: 3 Simple Structures
      © LearnDataVault.com
  20. Standard Data Vault Model
      • Hub: List of UNIQUE business keys.
      • Link: List of UNIQUE relationships.
      • Satellite: Historical descriptive data.
      [Diagram labels: hubs Email ID, Bank ID, and Passenger ID, each with three satellites (Sat); Email Information; Bank Transactions; Airline Reservations; Link; a Link with its own Sat; F(x). Notes: “Records a history of the interaction”; “Dashed line is a possible new relationship”.]
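To make the three structures concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The hub names echo the slide's Passenger ID and Bank ID examples, but every table name, column name, and the integer surrogate keys are assumptions made for illustration; they are not the schema from the talk or a reference Data Vault implementation.

```python
import sqlite3

SCHEMA = """
-- Hub: one row per UNIQUE business key.
CREATE TABLE hub_passenger (
    passenger_key INTEGER PRIMARY KEY,
    passenger_id  TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_bank_account (
    account_key   INTEGER PRIMARY KEY,
    bank_id       TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: one row per UNIQUE relationship between hubs.
CREATE TABLE link_passenger_account (
    link_key      INTEGER PRIMARY KEY,
    passenger_key INTEGER NOT NULL REFERENCES hub_passenger (passenger_key),
    account_key   INTEGER NOT NULL REFERENCES hub_bank_account (account_key),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    UNIQUE (passenger_key, account_key)
);
-- Satellite: historical descriptive data, one row per change.
CREATE TABLE sat_passenger_details (
    passenger_key INTEGER NOT NULL REFERENCES hub_passenger (passenger_key),
    load_date     TEXT NOT NULL,
    name          TEXT,
    seat_pref     TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (passenger_key, load_date)
);
"""


def load_passenger(conn, passenger_id, name, seat_pref, load_date, source):
    """Register the business key once, then append a satellite row."""
    # Hub load: the UNIQUE constraint makes repeat inserts a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO hub_passenger (passenger_id, load_date, record_source) "
        "VALUES (?, ?, ?)",
        (passenger_id, load_date, source),
    )
    key = conn.execute(
        "SELECT passenger_key FROM hub_passenger WHERE passenger_id = ?",
        (passenger_id,),
    ).fetchone()[0]
    # Satellite load: insert-only, so earlier descriptions are preserved.
    conn.execute(
        "INSERT INTO sat_passenger_details VALUES (?, ?, ?, ?, ?)",
        (key, load_date, name, seat_pref, source),
    )
    return key
```

Loading the same passenger twice with a changed seat preference leaves one hub row and two satellite rows, which is how the model keeps business keys unique while tracking history.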
  21. Data Vault Extensibility
      Adding new components to the EDW has NEAR ZERO impact to:
      • Existing Loading Processes
      • Existing Data Model
      • Existing Reporting & BI Functions
      • Existing Source Systems
      • Existing Star Schemas and Data Marts
      © LearnDataVault.com
  22. Back in the Real World
  23. What a Data Warehouse Isn’t?
      • A panacea
      • An IT department endeavor alone
      • Time to avoid user and IT communications
      • The sure-fire way to reduce overhead and increase company / department profits
      • The answer to all decision support and reporting needs
      • “Just a reporting data base”
  24. Typical DW/BI environment
      [Diagram labels: Data sources (OLTP databases, Enterprise applications, Web applications, Third-party, Other); ETL; Hadoop; EDW; Datamarts; BI / Analytics]
  25. Lots of Hybrids
      • Most organizations mix Inmon & Kimball
      • ODS feeding Data Marts
      • Data Marts backed into an EDW
      • Off the Shelf models – customized to work!
      • Canned BI apps
        • Oracle BI Apps
      • Data Vaults inside a CIF
      • Some using Hadoop for Staging
      • etc.
  26. Example: Hybrid - Original Schema Architecture
      [Diagram labels: Source(s) of Record; MSH EDW; Reporting; G2; MU; HI; KDW; CI SAS Routines; EDW V1; FDW / PMS; KDW Lite; Lynx; SFDC; BOBJ; Web; TBLU; Star Schema(s); Data Marts; Δ CDC; Insert 1X only; Σ (repeated)]
      • COMN Stage: <Full copies of source data structures with additional plumbing fields to facilitate capturing subsequent data changes over time>
      • COMN Integration: <Enterprise business key model with key mapping pointers to COMN_STG data>
      • JIT Transformation: <Virtual v. Physical>
      • COMN Presentation
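The staging idea on this slide, full copies of source rows plus extra "plumbing" fields so that later changes can be captured, can be sketched in Python as an insert-only load. The slide does not list the actual plumbing fields, so `row_hash` and `load_ts` are assumptions, as is the use of a hash comparison to detect changes.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Fingerprint a source row so changed rows can be detected cheaply."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stage_changes(stage: list, source_rows: list, key: str, load_ts: str) -> int:
    """Append new or changed rows to the stage; never update or delete.

    Each staged entry wraps the full source row with two assumed plumbing
    fields: `row_hash` and `load_ts`. Returns the number of rows inserted.
    """
    # The stage is kept in load order, so later entries overwrite earlier
    # ones here and `latest` ends up holding the newest hash per key.
    latest = {}
    for staged in stage:
        latest[staged["data"][key]] = staged["row_hash"]

    inserted = 0
    for row in source_rows:
        fingerprint = row_hash(row)
        if latest.get(row[key]) != fingerprint:
            stage.append({"data": dict(row), "row_hash": fingerprint, "load_ts": load_ts})
            latest[row[key]] = fingerprint
            inserted += 1
    return inserted
```

Running the load twice, with one row changed between runs, adds only that changed row the second time, so the stage accumulates a history of every version seen.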
  27. Hoped for Schema Architecture (Parallel Loads)
      [Diagram labels: Source(s) of Record; MSH EDW; BOBJ / BI / Reporting; HI Stage; COMN Stage; FIN Stage; COMN Validation; COMN Integration; FIN Presentation; HI Presentation; COMN Presentation; FIN; HI; CLIN; G2; MU; HI; KDW; CI SAS Routines; EDW V1; FDW / PMS; KDW Lite; Lynx; SFDC; MKTG]
  28. Actual Schema Architecture
      [Diagram labels: Source(s) of Record; MSH EDW; BOBJ / BI / Reporting; HI Stage; COMN Stage; FIN Stage; COMN Validation (DQ); COMN Integration; FIN Presentation; HI Presentation; COMN Presentation; FIN; HI; CLIN; G2; MU; HI; KDW; CI SAS Routines; EDW V1; FDW / PMS; KDW Lite; Lynx; SFDC; MKTG]
  29. The Future
  30. Today’s realities
      • Data diversity: External data, machine-generated data, streaming data
      • Complexity: Complex systems, data pipelines, data silos
      • Barriers to analytics: Incomplete data, slow time to access, performance and concurrency barriers
      [Diagram labels: EDW; Datamarts; Hadoop]
  31. Current architectures can’t keep up
      Data Warehousing
      • Complex: manage hardware, data distribution, indexes, …
      • Limited elasticity: forklift upgrades, data redistribution, downtime
      • Costly: overprovisioning, significant care & feeding
      Hadoop
      • Complex: specialized skills, new tools
      • Limited elasticity: data redistribution, resource contention
      • Not a data warehouse: batch-oriented, limited optimization, incomplete security
  32. Next Generation – Extended Data Warehouse Architecture (XDW)
      [Diagram labels: Traditional EDW environment; Investigative computing platform; Data refinery; Data integration platform; Analytic tools & applications; Operational real-time environment; RT analysis platform; Other internal & external structured & multi-structured data; Real-time streaming data; Operational systems; RT BI services]
      Slide created by Colin White – BI Research, Inc. Copyright Intelligent Solutions, Inc. 2015. All Rights Reserved. Used by Permission.
  33. What we need to solve for
      • Cost Containment!
        • More data all the time & more complexity
        • Hard to keep up infrastructure & skills
      • Quicker time to delivery
        • See the data sooner!
      • Elasticity
        • On demand resources
        • True “grid” utility computing
      • Security
  34. New possibilities with the cloud
      • More & more data “born in the cloud”
      • Natural integration point for data
      • Low-cost, scalable storage
      • Capacity on demand
  35. What is Snowflake?
      • All-new SQL data warehouse: No legacy code or constraints
      • Delivered as a service: Infrastructure, resiliency, optimization built in
      • Designed for the cloud: Running in Amazon Web Services
  36. Our vision: Reinvent the Data Warehouse
      Data Warehousing…
      • SQL relational database
      • Optimized storage & processing
      • Standard connectivity – BI, ETL, …
      …for Everyone
      • Existing SQL skills and tools
      • “Load and go” ease of use
      • Cloud-based elasticity to fit any scale
      [Diagram labels: Data scientists; SQL users & tools]
  37. Brings together diverse data
      Structured data (e.g. CSV):
        Apple   101.12  250  FIH-2316
        Pear     56.22  202  IHO-6912
        Orange   98.21  600  WHQ-6090
      Semi-structured data (e.g. JSON, Avro, XML):
        {
          "firstName": "John",
          "lastName": "Smith",
          "height_cm": 167.64,
          "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021-3100"
          },
          "phoneNumbers": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "office", "number": "646 555-4567" }
          ]
        }
      • Optimized storage
      • Flexible schema
      • Relational processing
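What "relational processing" over semi-structured data means can be shown with a small Python sketch that flattens the slide's JSON document into rows, one per phone number. This is plain Python for illustration only and says nothing about how Snowflake implements it; the function name `flatten_phones` and the output column names are invented.

```python
import json

# The JSON document shown on the slide.
DOC = """
{
  "firstName": "John",
  "lastName": "Smith",
  "height_cm": 167.64,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "office", "number": "646 555-4567" }
  ]
}
"""


def flatten_phones(doc: dict) -> list:
    """Turn one nested document into flat rows, one per phone number."""
    return [
        {
            "first_name": doc["firstName"],
            "last_name": doc["lastName"],
            # Nested object: reach into it by path.
            "city": doc["address"]["city"],
            # Nested array: each element becomes its own row.
            "phone_type": phone["type"],
            "phone_number": phone["number"],
        }
        for phone in doc.get("phoneNumbers", [])
    ]


rows = flatten_phones(json.loads(DOC))
```

The result is two flat rows that share the person's name and city, which can then be filtered, joined, or aggregated like any structured table.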
  38. Designed for the cloud
      • Low-cost, scalable cloud storage: Never worry about sizing for storage again
      • Elastic compute, on demand: Exact amount of compute needed, exactly when needed
      • Optimized for diverse data: Load and optimize semi-structured + structured data without transformation
      • Software as a service: No knobs, tuning, or infrastructure management
  39. A new architecture: multi-cluster, shared data
      • Standard interfaces
      • Cloud services layer coordinates across service
      • Independent compute clusters access data
      • Data centralized in enterprise-class cloud storage
  40. Enabling multi-dimensional scaling
      • Elastic scaling for storage: Low-cost cloud storage, fully replicated and resilient
      • Elastic scaling for compute: Virtual warehouses scale up & down on the fly to support workload needs
      • Elastic scaling for concurrency: Scale concurrency using independent virtual warehouses
      [Diagram labels: Finance; Marketing; Operations; Loading / ETL; Sales; Test / Dev]
  41. Delivered as a service: no infrastructure, knobs, or tuning
      • Infrastructure management: Virtual hardware and software managed by Snowflake
      • Metadata management: Automatic statistics collection, scaling, and redundancy
      • Manual query optimization: Dynamic optimization, parallelization, and concurrency management
      • Data storage management: Adaptive data distribution, automatic compression, automatic optimization
  42. Fits with existing tools & processes
      • Complex Data Infrastructure: Complex systems, data pipelines, data silos
      • Data Diversity Challenges: External data, machine-generated data, streaming data
      • Barriers to Analysis: Analysis limited by incomplete data, delays in access, performance limitations
      [Diagram labels: EDW; Datamarts; Hadoop]
  43. Conclusions?
  44. What Have We Learned Over The Years?
      • Need results soon
      • Multi-year projects not acceptable any more
      • Executive buy in ($$$)
      • Build incrementally, test, refactor
      • Get user feedback RIGHT AWAY!
      • Avoid over analysis
      • You will learn as you go
  45. Critical Success Factors
      • A data warehouse will be considered a success if it:
        • Can be loaded in a timely manner
          • Regardless of the data type or source
        • Can be accessed in an easy fashion
          • By both data scientists and business users
        • Can be understood by the business community
        • Is recognized as bringing value to the decision making process
          • For an acceptable TCO
  46. An Option to Consider…
      Snowflake is:
      • …a team of accomplished data experts
        • Funded by top-tier VCs including Altimeter Capital, Redpoint Ventures, Sutter Hill Ventures, Wing VC
      …who have developed a completely new data warehouse designed for the cloud
      • Data warehouse as a service
      • Multidimensional elasticity
      • Support for all business data – including semi-structured
      • Compelling price:performance
  47. SHAMELESS PLUG: Available on Amazon.com
      http://www.amazon.com/Better-Data-Modeling-Introduction-Engineering-ebook/dp/B018BREV1C/
  48. Contact Information
      Kent Graziano, Snowflake Computing
      Kent.graziano@snowflake.net
      On Twitter @KentGraziano
      More info at http://snowflake.net
      Visit my blog at http://kentgraziano.com
