SlideShare a Scribd company logo
Building an Effective Data
Warehouse Architecture
James Serra, Big Data Evangelist
Microsoft
May 7-9, 2014 | San Jose, CA
Other Presentations
 Building an Effective Data Warehouse Architecture
Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)
 Building a Big Data Solution (Building an Effective Data Warehouse
Architecture with Hadoop, the cloud and MPP)
Explains what Big Data is, it’s benefits including use cases, and how Hadoop, the cloud, and MPP fit in
 Finding business value in Big Data (What exactly is Big Data and why
should I care?)
Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects
 How does Microsoft solve Big Data?
Covers the Microsoft products that can be used to create a Big Data solution
 Modern Data Warehousing with the Microsoft Analytics Platform System
The next step in data warehouse performance is APS, a MPP appliance
 Power BI, Azure ML, Azure HDInsights, Azure Data Factory, etc
Deep dives into the various Microsoft Big Data related products
About Me
 Business Intelligence Consultant, in IT for 28 years
 Microsoft, Big Data Evangelist
 Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW developer
 Been perm, contractor, consultant, business owner
 Presenter at PASS Business Analytics Conference and PASS Summit
 MCSE for SQL Server 2012: Data Platform and BI
 Blog at JamesSerra.com
 SQL Server MVP
 Author of book “Reporting with Microsoft SQL Server 2012”
I tried to build a data warehouse on my own…
And ended up passed-out drunk in a Denny’s
parking lot
Let’s prevent that from happening…
Agenda
 What a Data Warehouse is not
 What is a Data Warehouse and why use one?
 Fast Track Data Warehouse (FTDW)
 Appliances
 Data Warehouse vs Data Mart
 Kimball and Inmon Methodologies
 Populating a Data Warehouse
 ETL vs ELT
 Surrogate Keys
 SSAS Cubes
 Modern Data Warehouse
What a Data Warehouse is not
• A data warehouse is not a copy of a source database with the name prefixed with “DW”
• It is not a copy of multiple tables (i.e. customer) from various sources systems unioned
together in a view
• It is not a dumping ground for tables from various sources with not much design put into it
Data Warehouse Maturity Model
Courtesy of Wayne Eckerson
What is a Data Warehouse and why use one?
A data warehouse is where you store data from multiple data sources to be used for historical and
trend analysis reporting. It acts as a central repository for many subject areas and contains the
"single version of truth". It is NOT to be used for OLTP applications.
Reasons for a data warehouse:
 Reduce stress on production system
 Optimized for read access, sequential disk scans
 Integrate many sources of data
 Keep historical records (no need to save hardcopy reports)
 Restructure/rename tables and fields, model data
 Protect against source system upgrades
 Use Master Data Management, including hierarchies
 No IT involvement needed for users to create reports
 Improve data quality and plugs holes in source systems
 One version of the truth
 Easy to create BI solutions on top of it (i.e. SSAS Cubes)
Why use a Data Warehouse?
Legacy applications + databases = chaos
Production
Control
MRP
Inventory
Control
Parts
Management
Logistics
Shipping
Raw Goods
Order Control
Purchasing
Marketing
Finance
Sales
Accounting
Management
Reporting
Engineering
Actuarial
Human
Resources
Continuity
Consolidation
Control
Compliance
Collaboration
Enterprise data warehouse = order
Single version
of the truth
Enterprise Data
Warehouse
Every question = decision
Two purposes of data warehouse: 1) save time building reports; 2) slice in dice in ways you could not do before
Hardware Solutions
 Fast Track Data Warehouse - A reference configuration optimized for data warehousing. This
saves an organization from having to commit resources to configure and build the server
hardware. Fast Track Data Warehouse hardware is tested for data warehousing which
eliminates guesswork and is designed to save you months of configuration, setup, testing
and tuning. You just need to install the OS and SQL Server
 Appliances - Microsoft has made available SQL Server appliances (SMP and MPP) that allow
customers to deploy data warehouse (DW), business intelligence (BI) and database
consolidation solutions in a very short time, with all the components pre-configured and
pre-optimized. These appliances include all the hardware, software and services for a
complete, ready-to-run, out-of-the-box, high performance, energy-efficient solutions
Data Warehouse Fast Track for SQL
Server 2014
Hardware system design
• Tight specifications for servers, storage,
and networking
• Resource balanced and validated
• Latest-generation servers and storage,
including solid-state disks (SSDs)
Database configuration
• Workload-specific
• Database architecture
• SQL Server settings
• Windows Server settings
• Performance guidance
Software
• SQL Server 2014 Enterprise
• Windows Server 2012 R2
Processors
Networking
Servers
Storage
Windows Server
2012 R2
SQL Server 2014
Options for data warehouse solutions
Balancing flexibility
and choice
By yourself With a reference
architecture
With an appliance
Tuning and optimization
Installation
Configuration
Tuning and optimization
Installation
Configuration
Installation
Tuning and optimization
HIGH
LOW
Time to
solution
Optional, if you have hardware already
Existing or procured
hardware and support
Procured software and
support
Offerings
• SQL Server 2014
• Windows Server 2012 R2
• System Center 2012 SP1
Offerings
• Private Cloud Fast Track
• Data Warehouse Fast Track
Offerings
• Data Warehouse Fast Track
• Analytics Platform System
Existing or procured
hardware and support
Procured software and
support
Procured appliance and
support
HIGH
Price
Data Warehouse Fast Track advantages
Flexibility and ChoiceReduced riskFaster Deployment
Vendors with 2014 Fast Track Appliances
 Dell
 EMC
 HP/ScanDisk
 Lenovo
 NEC
 Tegile
Data Warehouse vs Data Mart
 Data Warehouse: A single organizational repository of enterprise wide data
across many or all subject areas
 Holds multiple subject areas
 Holds very detailed information
 Works to integrate all data sources
 Feeds data mart
 Data Mart: Subset of the data warehouse that is usually oriented to specific
subject (finance, marketing, sales)
• The logical combination of all the data marts is a data warehouse
In short, a data warehouse as contains many subject areas, and a data mart
contains just one of those subject areas
Kimball and Inmon Methodologies
Two approaches for building data warehouses
Kimball and Inmon Myths
 Myth: Kimball is a bottom-up approach without enterprise focus
 Really top-down: BUS matrix (choose business processes/data sources), conformed
dimensions, MDM
 Myth: Inmon requires a ton of up-front design that takes a long time
 Inmon says to build DW iteratively, not big bang approach (p. 91 BDW, p. 21 Imhoff)
 Myth: Star schema data marts are not allowed in Inmon’s model
 Inmon says they are good for direct end-user access of data (p. 365 BDW), good for data
marts (p. 12 TTA)
 Myth: Very few companies use the Inmon method
 Survey had 39% Inmon vs 26% Kimball. Many have a EDW
 Myth: The Kimball and Inmon architectures are incompatible
 They can work together to provide a better solution
Kimball and Inmon Methodologies
 Relational (Inmon) vs Dimensional (Kimball)
 Relational Modeling:
 Entity-Relationship (ER) model
 Normalization rules
 Many tables using joins
 History tables, natural keys
 Good for indirect end-user access of data
 Dimensional Modeling:
 Facts and dimensions, star schema
 Less tables but have duplicate data (de-normalized)
 Easier for user to understand (but strange for IT people used to relational)
 Slowly changing dimensions, surrogate keys
 Good for direct end-user access of data
Relational Model vs Dimensional Model
Relational Model Dimensional Model
If you are a business user, which model is easier to use?
Kimball and Inmon Methodologies
• Kimball:
• Logical data warehouse (BUS), made up of subject areas (data marts)
• Business driven, users have active participation
• Decentralized data marts (not required to be a separate physical data store)
• Independent dimensional data marts optimized for reporting/analytics
• Integrated via Conformed Dimensions (provides consistency across data sources)
• 2-tier (data mart, cube), less ETL, no data duplication
• Inmon:
• Enterprise data model (CIF) that is a enterprise data warehouse (EDW)
• IT Driven, users have passive participation
• Centralized atomic normalized tables (off limit to end users)
• Later create dependent data marts that are separate physical subsets of data and can
be used for multiple purposes
• Integration via enterprise data model
• 3-tier (data warehouse, data mart, cube), duplication of data
Kimball Model
Why staging: Limit source contention (ELT), Recoverability, Backup, Auditing
Inmon Model
Reasons to add a Enterprise Data Warehouse
 Single version of the truth
 May make building dimensions easier using lightly denormalized tables in EDW instead of
going directly from the OLTP source
 Normalized EDW results in enterprise-wide consistency which makes it easier to spawn-off
the data marts at the expense of duplicated data
 Less daily ETL refresh and reconciliation if have many sources and many data marts in
multiple databases
 One place to control data (no duplication of effort and data)
 Reason not to: If have a few sources that need reporting quickly
Which model to use?
 Models are not that different, having become similar over the years, and can compliment
each other
 Boils down to Inmon creates a normalized DW before creating a dimensional data mart and
Kimball skips the normalized DW
 With tweaks to each model, they look very similar (adding a normalized EDW to Kimball,
dimensionally structured data marts to Inmon)
 Bottom line: Understand both approaches and pick parts from both for your situation – no
need to just choose just one approach
 BUT, no solution will be effective unless you possess soft skills (leadership, communication,
planning, and interpersonal relationships)
Hybrid Model
Advice: Use SQL Server Views to interface between each level in the model
In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject areas), all in one
database. Another option is to have each data mart in its own database with all databases on one server or spread among
multiple servers. Also, the staging areas, CIF, and DW Bus can all be on the same powerful server (MPP)
Kimball Methodology
From: Kimball’s The Microsoft Data Warehouse Toolkit
Kimball defines a development lifecycle, where Inmon is just about the data warehouse (not “how” used)
Populating a Data Warehouse
 Determine frequency of data pull (daily, weekly, etc)
 Full Extraction – All data (usually dimension tables)
 Incremental Extraction – Only data changed from last run (fact tables)
 How to determine data that has changed
 Timestamp - Last Updated
 Change Data Capture (CDC)
 Partitioning by date
 Triggers on tables
 MERGE SQL Statement
 Column DEFAULT value populated with date
 Online Extraction – Data from source. First create copy of source:
 Replication
 Database Snapshot
 Availability Groups
 Offline Extraction – Data from flat file
ETL vs ELT
• Extract, Transform, and Load (ETL)
• Transform while hitting source system
• No staging tables
• Processing done by ETL tools (SSIS)
• Extract, Load, Transform (ELT)
• Uses staging tables
• Processing done by target database engine (SSIS: Execute T-SQL Statement task instead
of Data Flow Transform tasks)
• Use for big volumes of data
• Use when source and target databases are the same
• Use with the Analytics Platform System (APS)
ELT is better since database engine is more efficient than SSIS
• Best use of database engine: Transformations
• Best use of SSIS: Data pipeline and workflow management
Surrogate Keys
Surrogate Keys – Unique identifier not derived from source system
• Embedded in fact tables as foreign keys to dimension tables
• Allows integrating data from multiple source systems
• Protect from source key changes in the source system
• Allows for slowly changing dimensions
• Allows you to create rows in the dimension that don’t exist in the source (-1 in fact table for
unassigned)
• Improves performance (joins) and database size by using integer type instead of text
• Implemented via identity column on dimension tables
SSAS Cubes
Reasons to report off cubes instead of the data warehouse:
 Aggregating (Summarizing) the data for performance
 Multidimensional analysis – slice, dice, drilldown, show details
 Can store Hierarchies
 Built-in support for KPI’s
 Security: You can use the security setting to give end-users access to only those parts (slices)
of the cube relevant to them
 Built-in advanced time-calculations – i.e. 12-month rolling average
 Easily use Excel to view data via Pivot Tables
 Automatically handles Slowly Changing Dimensions (SCD)
 Required for PerformancePoint, Power View, and SSAS data mining
Data Warehouse Architecture
Modern Data Warehouse
Think about future needs:
• Increasing data volumes
• Real-time performance
• New data sources and types (Hadoop)
• Cloud-born data
Solution – Microsoft Analytics Platform System:
• Scalable
• MPP architecture
• HDInsight
• Polybase
Follow-on presentation: “Building a Big Data Solution (Building an Effective Data Warehouse
Architecture with Hadoop, the cloud, and MPP)”
Resources
 Data Warehouse Architecture – Kimball and Inmon methodologies: http://bit.ly/SrzNHy
 SQL Server 2012: Multidimensional vs tabular: http://bit.ly/SrzX1x
 Data Warehouse vs Data Mart: http://bit.ly/SrAi4p
 Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6
 Complex reporting off a SSAS cube: http://bit.ly/SrAEYw
 Surrogate Keys: http://bit.ly/SrAIrp
 Normalizing Your Database: http://bit.ly/SrAHnc
 Difference between ETL and ELT: http://bit.ly/SrAKQa
 Microsoft’s Data Warehouse offerings: http://bit.ly/xAZy9h
 Microsoft SQL Server Reference Architecture and Appliances: http://bit.ly/y7bXY5
 Methods for populating a data warehouse: http://bit.ly/SrARuZ
 Great white paper: Microsoft EDW Architecture, Guidance and Deployment Best Practices: http://bit.ly/SrAZug
 End-User Microsoft BI Tools – Clearing up the confusion: http://bit.ly/SrBMLT
 Microsoft Appliances: http://bit.ly/YQIXzM
 Why You Need a Data Warehouse: http://bit.ly/1fwEq0j
 Data Warehouse Maturity Model: http://bit.ly/xl4mGM
 Operational Data Store (ODS) Defined: http://bit.ly/1H6wnE7
 The Modern Data Warehouse: http://bit.ly/1xuX4Py
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck will be posted under the “Presentations” tab)

More Related Content

What's hot

Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDM
DATAVERSITY
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
James Serra
 
Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012
Stéphane Fréchette
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Basic Introduction of Data Warehousing from Adiva Consulting
Basic Introduction of  Data Warehousing from Adiva ConsultingBasic Introduction of  Data Warehousing from Adiva Consulting
Basic Introduction of Data Warehousing from Adiva Consulting
adivasoft
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
Kent Graziano
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
Sudheer Kondla
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
Christopher Bradley
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
DATAVERSITY
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Data Vault and DW2.0
Data Vault and DW2.0Data Vault and DW2.0
Data Vault and DW2.0
Empowered Holdings, LLC
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
Eric Kavanagh
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
James Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
SwarnaLatha177
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
Rajesh Kumar
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
DATAVERSITY
 

What's hot (20)

Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDM
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Basic Introduction of Data Warehousing from Adiva Consulting
Basic Introduction of  Data Warehousing from Adiva ConsultingBasic Introduction of  Data Warehousing from Adiva Consulting
Basic Introduction of Data Warehousing from Adiva Consulting
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Data Vault and DW2.0
Data Vault and DW2.0Data Vault and DW2.0
Data Vault and DW2.0
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 

Similar to Building an Effective Data Warehouse Architecture

Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
James Serra
 
Nw2008 tips tricks_edw_v10
Nw2008 tips tricks_edw_v10Nw2008 tips tricks_edw_v10
Nw2008 tips tricks_edw_v10
Harsha Gowda B R
 
Data wirehouse
Data wirehouseData wirehouse
Data wirehouse
Niyitegekabilly
 
DWBASIC.ppt
DWBASIC.pptDWBASIC.ppt
DWBASIC.ppt
ssuserc65885
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
Nathan Bijnens
 
BI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business businessBI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business business
JawaherAlbaddawi
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
Serhiy (Serge) Haziyev
 
Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28
Martin Bém
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
Capgemini
 
Dwh basics datastage online training
Dwh basics datastage online trainingDwh basics datastage online training
Dwh basics datastage online training
Datawarehouse Trainings
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
Datawarehouse Trainings
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional Database
Russel Chowdhury
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
Capgemini
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
James Serra
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
James Serra
 
DW 101
DW 101DW 101
DW 101
jeffd00
 
Dbms and it infrastructure
Dbms and  it infrastructureDbms and  it infrastructure
Dbms and it infrastructure
projectandppt
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
Rishikese MR
 
DWH_Session_1.pptx
DWH_Session_1.pptxDWH_Session_1.pptx
DWH_Session_1.pptx
umashanker manthena
 

Similar to Building an Effective Data Warehouse Architecture (20)

Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Nw2008 tips tricks_edw_v10
Nw2008 tips tricks_edw_v10Nw2008 tips tricks_edw_v10
Nw2008 tips tricks_edw_v10
 
Data wirehouse
Data wirehouseData wirehouse
Data wirehouse
 
DWBASIC.ppt
DWBASIC.pptDWBASIC.ppt
DWBASIC.ppt
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
BI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business businessBI Chapter 03.pdf business business business business business business
BI Chapter 03.pdf business business business business business business
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28Prague data management meetup 2017-02-28
Prague data management meetup 2017-02-28
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Dwh basics datastage online training
Dwh basics datastage online trainingDwh basics datastage online training
Dwh basics datastage online training
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional Database
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
 
DW 101
DW 101DW 101
DW 101
 
Dbms and it infrastructure
Dbms and  it infrastructureDbms and  it infrastructure
Dbms and it infrastructure
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
DWH_Session_1.pptx
DWH_Session_1.pptxDWH_Session_1.pptx
DWH_Session_1.pptx
 

More from James Serra

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
James Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
James Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
James Serra
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
James Serra
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
James Serra
 
How to build your career
How to build your careerHow to build your career
How to build your career
James Serra
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
James Serra
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
James Serra
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
James Serra
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
James Serra
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
James Serra
 

More from James Serra (20)

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
How to build your career
How to build your careerHow to build your career
How to build your career
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 

Recently uploaded

“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
Claudio Di Ciccio
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
FODUU
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 

Recently uploaded (20)

“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 

Building an Effective Data Warehouse Architecture

  • 1. Building an Effective Data Warehouse Architecture James Serra, Big Data Evangelist Microsoft May 7-9, 2014 | San Jose, CA
  • 2. Other Presentations  Building an Effective Data Warehouse Architecture Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)  Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP) Explains what Big Data is, it’s benefits including use cases, and how Hadoop, the cloud, and MPP fit in  Finding business value in Big Data (What exactly is Big Data and why should I care?) Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects  How does Microsoft solve Big Data? Covers the Microsoft products that can be used to create a Big Data solution  Modern Data Warehousing with the Microsoft Analytics Platform System The next step in data warehouse performance is APS, a MPP appliance  Power BI, Azure ML, Azure HDInsights, Azure Data Factory, etc Deep dives into the various Microsoft Big Data related products
  • 3. About Me  Business Intelligence Consultant, in IT for 28 years  Microsoft, Big Data Evangelist  Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer  Been perm, contractor, consultant, business owner  Presenter at PASS Business Analytics Conference and PASS Summit  MCSE for SQL Server 2012: Data Platform and BI  Blog at JamesSerra.com  SQL Server MVP  Author of book “Reporting with Microsoft SQL Server 2012”
  • 4. I tried to build a data warehouse on my own… And ended up passed-out drunk in a Denny’s parking lot Let’s prevent that from happening…
  • 5. Agenda  What a Data Warehouse is not  What is a Data Warehouse and why use one?  Fast Track Data Warehouse (FTDW)  Appliances  Data Warehouse vs Data Mart  Kimball and Inmon Methodologies  Populating a Data Warehouse  ETL vs ELT  Surrogate Keys  SSAS Cubes  Modern Data Warehouse
  • 6. What a Data Warehouse is not • A data warehouse is not a copy of a source database with the name prefixed with “DW” • It is not a copy of multiple tables (i.e. customer) from various sources systems unioned together in a view • It is not a dumping ground for tables from various sources with not much design put into it
  • 7. Data Warehouse Maturity Model Courtesy of Wayne Eckerson
  • 8. What is a Data Warehouse and why use one? A data warehouse is where you store data from multiple data sources to be used for historical and trend analysis reporting. It acts as a central repository for many subject areas and contains the "single version of truth". It is NOT to be used for OLTP applications. Reasons for a data warehouse:  Reduce stress on production system  Optimized for read access, sequential disk scans  Integrate many sources of data  Keep historical records (no need to save hardcopy reports)  Restructure/rename tables and fields, model data  Protect against source system upgrades  Use Master Data Management, including hierarchies  No IT involvement needed for users to create reports  Improve data quality and plugs holes in source systems  One version of the truth  Easy to create BI solutions on top of it (i.e. SSAS Cubes)
  • 9. Why use a Data Warehouse? Legacy applications + databases = chaos Production Control MRP Inventory Control Parts Management Logistics Shipping Raw Goods Order Control Purchasing Marketing Finance Sales Accounting Management Reporting Engineering Actuarial Human Resources Continuity Consolidation Control Compliance Collaboration Enterprise data warehouse = order Single version of the truth Enterprise Data Warehouse Every question = decision Two purposes of data warehouse: 1) save time building reports; 2) slice in dice in ways you could not do before
  • 10. Hardware Solutions  Fast Track Data Warehouse - A reference configuration optimized for data warehousing. This saves an organization from having to commit resources to configure and build the server hardware. Fast Track Data Warehouse hardware is tested for data warehousing which eliminates guesswork and is designed to save you months of configuration, setup, testing and tuning. You just need to install the OS and SQL Server  Appliances - Microsoft has made available SQL Server appliances (SMP and MPP) that allow customers to deploy data warehouse (DW), business intelligence (BI) and database consolidation solutions in a very short time, with all the components pre-configured and pre-optimized. These appliances include all the hardware, software and services for a complete, ready-to-run, out-of-the-box, high performance, energy-efficient solutions
  • 11. Data Warehouse Fast Track for SQL Server 2014 Hardware system design • Tight specifications for servers, storage, and networking • Resource balanced and validated • Latest-generation servers and storage, including solid-state disks (SSDs) Database configuration • Workload-specific • Database architecture • SQL Server settings • Windows Server settings • Performance guidance Software • SQL Server 2014 Enterprise • Windows Server 2012 R2 Processors Networking Servers Storage Windows Server 2012 R2 SQL Server 2014
  • 12. Options for data warehouse solutions Balancing flexibility and choice By yourself With a reference architecture With an appliance Tuning and optimization Installation Configuration Tuning and optimization Installation Configuration Installation Tuning and optimization HIGH LOW Time to solution Optional, if you have hardware already Existing or procured hardware and support Procured software and support Offerings • SQL Server 2014 • Windows Server 2012 R2 • System Center 2012 SP1 Offerings • Private Cloud Fast Track • Data Warehouse Fast Track Offerings • Data Warehouse Fast Track • Analytics Platform System Existing or procured hardware and support Procured software and support Procured appliance and support HIGH Price
  • 13. Data Warehouse Fast Track advantages Flexibility and ChoiceReduced riskFaster Deployment
  • 14. Vendors with 2014 Fast Track Appliances  Dell  EMC  HP/ScanDisk  Lenovo  NEC  Tegile
  • 15. Data Warehouse vs Data Mart  Data Warehouse: A single organizational repository of enterprise wide data across many or all subject areas  Holds multiple subject areas  Holds very detailed information  Works to integrate all data sources  Feeds data mart  Data Mart: Subset of the data warehouse that is usually oriented to specific subject (finance, marketing, sales) • The logical combination of all the data marts is a data warehouse In short, a data warehouse as contains many subject areas, and a data mart contains just one of those subject areas
  • 16. Kimball and Inmon Methodologies Two approaches for building data warehouses
  • 17. Kimball and Inmon Myths  Myth: Kimball is a bottom-up approach without enterprise focus  Really top-down: BUS matrix (choose business processes/data sources), conformed dimensions, MDM  Myth: Inmon requires a ton of up-front design that takes a long time  Inmon says to build DW iteratively, not big bang approach (p. 91 BDW, p. 21 Imhoff)  Myth: Star schema data marts are not allowed in Inmon’s model  Inmon says they are good for direct end-user access of data (p. 365 BDW), good for data marts (p. 12 TTA)  Myth: Very few companies use the Inmon method  Survey had 39% Inmon vs 26% Kimball. Many have a EDW  Myth: The Kimball and Inmon architectures are incompatible  They can work together to provide a better solution
  • 18. Kimball and Inmon Methodologies  Relational (Inmon) vs Dimensional (Kimball)  Relational Modeling:  Entity-Relationship (ER) model  Normalization rules  Many tables using joins  History tables, natural keys  Good for indirect end-user access of data  Dimensional Modeling:  Facts and dimensions, star schema  Less tables but have duplicate data (de-normalized)  Easier for user to understand (but strange for IT people used to relational)  Slowly changing dimensions, surrogate keys  Good for direct end-user access of data
  • 19. Relational Model vs Dimensional Model Relational Model Dimensional Model If you are a business user, which model is easier to use?
  • 20. Kimball and Inmon Methodologies • Kimball: • Logical data warehouse (BUS), made up of subject areas (data marts) • Business driven, users have active participation • Decentralized data marts (not required to be a separate physical data store) • Independent dimensional data marts optimized for reporting/analytics • Integrated via Conformed Dimensions (provides consistency across data sources) • 2-tier (data mart, cube), less ETL, no data duplication • Inmon: • Enterprise data model (CIF) that is a enterprise data warehouse (EDW) • IT Driven, users have passive participation • Centralized atomic normalized tables (off limit to end users) • Later create dependent data marts that are separate physical subsets of data and can be used for multiple purposes • Integration via enterprise data model • 3-tier (data warehouse, data mart, cube), duplication of data
  • 21. Kimball Model Why staging: Limit source contention (ELT), Recoverability, Backup, Auditing
  • 23. Reasons to add a Enterprise Data Warehouse  Single version of the truth  May make building dimensions easier using lightly denormalized tables in EDW instead of going directly from the OLTP source  Normalized EDW results in enterprise-wide consistency which makes it easier to spawn-off the data marts at the expense of duplicated data  Less daily ETL refresh and reconciliation if have many sources and many data marts in multiple databases  One place to control data (no duplication of effort and data)  Reason not to: If have a few sources that need reporting quickly
  • 24. Which model to use?  Models are not that different, having become similar over the years, and can compliment each other  Boils down to Inmon creates a normalized DW before creating a dimensional data mart and Kimball skips the normalized DW  With tweaks to each model, they look very similar (adding a normalized EDW to Kimball, dimensionally structured data marts to Inmon)  Bottom line: Understand both approaches and pick parts from both for your situation – no need to just choose just one approach  BUT, no solution will be effective unless you possess soft skills (leadership, communication, planning, and interpersonal relationships)
  • 25. Hybrid Model Advice: Use SQL Server Views to interface between each level in the model In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject areas), all in one database. Another option is to have each data mart in its own database with all databases on one server or spread among multiple servers. Also, the staging areas, CIF, and DW Bus can all be on the same powerful server (MPP)
  • 26. Kimball Methodology From: Kimball’s The Microsoft Data Warehouse Toolkit Kimball defines a development lifecycle, where Inmon is just about the data warehouse (not “how” used)
  • 27. Populating a Data Warehouse  Determine frequency of data pull (daily, weekly, etc)  Full Extraction – All data (usually dimension tables)  Incremental Extraction – Only data changed from last run (fact tables)  How to determine data that has changed  Timestamp - Last Updated  Change Data Capture (CDC)  Partitioning by date  Triggers on tables  MERGE SQL Statement  Column DEFAULT value populated with date  Online Extraction – Data from source. First create copy of source:  Replication  Database Snapshot  Availability Groups  Offline Extraction – Data from flat file
  • 28. ETL vs ELT • Extract, Transform, and Load (ETL) • Transform while hitting source system • No staging tables • Processing done by ETL tools (SSIS) • Extract, Load, Transform (ELT) • Uses staging tables • Processing done by target database engine (SSIS: Execute T-SQL Statement task instead of Data Flow Transform tasks) • Use for big volumes of data • Use when source and target databases are the same • Use with the Analytics Platform System (APS) ELT is better since database engine is more efficient than SSIS • Best use of database engine: Transformations • Best use of SSIS: Data pipeline and workflow management
  • 29. Surrogate Keys Surrogate Keys – Unique identifier not derived from source system • Embedded in fact tables as foreign keys to dimension tables • Allows integrating data from multiple source systems • Protect from source key changes in the source system • Allows for slowly changing dimensions • Allows you to create rows in the dimension that don’t exist in the source (-1 in fact table for unassigned) • Improves performance (joins) and database size by using integer type instead of text • Implemented via identity column on dimension tables
  • 30. SSAS Cubes Reasons to report off cubes instead of the data warehouse:  Aggregating (Summarizing) the data for performance  Multidimensional analysis – slice, dice, drilldown, show details  Can store Hierarchies  Built-in support for KPI’s  Security: You can use the security setting to give end-users access to only those parts (slices) of the cube relevant to them  Built-in advanced time-calculations – i.e. 12-month rolling average  Easily use Excel to view data via Pivot Tables  Automatically handles Slowly Changing Dimensions (SCD)  Required for PerformancePoint, Power View, and SSAS data mining
  • 32. Modern Data Warehouse Think about future needs: • Increasing data volumes • Real-time performance • New data sources and types (Hadoop) • Cloud-born data Solution – Microsoft Analytics Platform System: • Scalable • MPP architecture • HDInsight • Polybase Follow-on presentation: “Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud, and MPP)”
  • 33. Resources  Data Warehouse Architecture – Kimball and Inmon methodologies: http://bit.ly/SrzNHy  SQL Server 2012: Multidimensional vs tabular: http://bit.ly/SrzX1x  Data Warehouse vs Data Mart: http://bit.ly/SrAi4p  Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6  Complex reporting off a SSAS cube: http://bit.ly/SrAEYw  Surrogate Keys: http://bit.ly/SrAIrp  Normalizing Your Database: http://bit.ly/SrAHnc  Difference between ETL and ELT: http://bit.ly/SrAKQa  Microsoft’s Data Warehouse offerings: http://bit.ly/xAZy9h  Microsoft SQL Server Reference Architecture and Appliances: http://bit.ly/y7bXY5  Methods for populating a data warehouse: http://bit.ly/SrARuZ  Great white paper: Microsoft EDW Architecture, Guidance and Deployment Best Practices: http://bit.ly/SrAZug  End-User Microsoft BI Tools – Clearing up the confusion: http://bit.ly/SrBMLT  Microsoft Appliances: http://bit.ly/YQIXzM  Why You Need a Data Warehouse: http://bit.ly/1fwEq0j  Data Warehouse Maturity Model: http://bit.ly/xl4mGM  Operational Data Store (ODS) Defined: http://bit.ly/1H6wnE7  The Modern Data Warehouse: http://bit.ly/1xuX4Py
  • 34. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck will be posted under the “Presentations” tab)

Editor's Notes

  1. You’re a DBA, and your boss asks you to determine if a data warehouse would help the company. So many questions pop into your head: Why use a data warehouse? What’s the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What’s the difference between the Kimball and Inmon methodologies? Does the new Tabular Model in SQL Server 2012 change things? What’s the difference between a data warehouse and a data mart? Is there any hardware I can purchase that is optimized for a data warehouse? What if I have a ton of data? Join this session for the answers to all these questions. You’ll leave with information that will amaze your boss and lead to a big raise… – or at least lead you down the correct path to adding business value to your organization! Company does not have DW and need to understand the benefits and best approach to build one   Company has what they call a DW, but really just a dumping ground for tables from various sources...not much design put into it. Started because a business user wanted to create a report using data from multiple systems and a quick an dirty ETL was created. Grew into a jumbled mess of SP's and SSIS. Needs to be replaced
  2. Fluff, but point is I bring real work experience to the session
  3. http://www.ispot.tv/ad/7f64/directv-hang-gliding
  4. Questions to ask audience: How many use a data warehouse? How many use/know about an appliance or fast track DW? How many use cubes? Who are technical/developers/dba’s, or managers, or BA’s? Removed: End-User Microsoft BI Tools SQL Server 2012 Tabular Model
  5. Started because a business user wanted to create a report using data from multiple systems and a quick an dirty ETL was created. Grew into a jumbled mess of SP's and SSIS. Needs to be replaced
  6. Wayne Eckerson Business value increases as you move along the stages.  Most clients I have seen are between the child and teenager years, with a surprising number still stuck in the infant stage. Stage 1 – Prenatal; Structure: Management Reports; Scope: System; Executive Perception: Cost Center.  This is simply canned static reports that come from the source systems (i.e ERP, CRM, etc).  These reports are usually printed or emailed to a bunch of employees on a regular basis.  If the end-user requests custom reports, it is up to IT to create these reports within the source system, which usually has very limited capabilities for doing so.  And IT usually is backed up in filling these requests, frustrating the users. Stage 2 – Infant; Structure: Spreadmarts; Scope: Individual; Executive Perception: Inform Executives.  Those frustrated users from the previous stage (who are usually business analysts or power users) take matters into their own hands and circumvent IT by extracting data from the source systems and loading it into spreadsheets or desktop databases.  These are like personal, isolated data marts that don’t align with any other data marts.  Because spreadsheets are easy and quick to create, they spread like weeds.  I have seem some companies with thousands of these spreadsheets.  They prevent companies from having accurate and consistent reports, but they are very difficult to eliminate as you can imagine. Stage 3 – Child; Structure: Data Marts; Scope: Department; Executive Perception: Empower Knowledge Workers.  This stage is when departments see the need to give all business users timely data, not just the business analysts and executives.  A data mart is a shared database that usually includes one subject area (finance, sales, etc) that is tailored to meet the needs of one department (see Data Warehouse vs Data Mart).  From the data mart an SSAS cube may be built, and the business user will be given the use of a reporting tool to go against the cube or data mart.  Many times the data mart is in a star schema format.  But data marts fall prey to the same problems that afflict spreadmarts: each data mart has its own definitions and rules and extracts data directly from source systems.  These independent data marts are great at supporting local needs, but their data can’t be aggregated to support cross-departmental analysis. Stage 4 – Teenager; Structure: Data Warehouses; Scope: Division; Executive Perception: Monitor Business Processes.  Data warehouses are a way to integrate data marts without jeopardizing local autonomy.  After a company builds a bunch of data marts, they recognize the need to standardize definitions, rules, and dimensions to prevent integration problems later on.  The most common way to standardize data marts is to create a centralized data warehouse with dependent data marts built from the data warehouse.  This is usually called a hub-and-spoke data warehouse.  A data warehouse encourages deeper levels of analysis because users can now submit queries across multiple subject areas, such as finance and operations, and gain new insights not possible with single-subject data marts. Stage 5 – Adult; Structure: Enterprise DW; Scope: Enterprise; Executive Perception: Drive the Business. Just like the problem of having overlapping and inconsistent data caused by spreadmarts and multiple independent data marts, a large company can have a similar problem with multiple data warehouses.  Many organizations today have multiple data warehouses acquired through internal development, mergers or acquisitions, creating barriers to the free flow of information within and between business groups and the processes they manage.  The solution is to create an enterprise data warehouse (EDW) that will be the single version of the truth.  This EDW serves as an integration machine that continuously consolidates all other analytic structures into itself. Stage 6 – Sage; Structure: BI Services; Scope: Inter-Enterprise; Executive Perception: Drive the Market. At this stage you are extending and integrating the value of a EDW by opening the EDW to customers and suppliers to drive new market opportunities.  Customers and suppliers are provided simple yet powerful interactive reporting tools to compare and benchmark their activity and performance to other groups across a multitude of dimensions.  At the same time, EDW development teams are turning analytical data and BI functionality into Web services that developers — both internal and external to the organization — can leverage with proper authorization. The advent of BI services turns the EDW and its applications into a market-wide utility that can be readily embedded into any application. I would probably add a stage 7 dealing with SQL Server Analysis Services (SSAS) data mining.  SSAS features nine different data mining algorithms that looks for specific types of patterns in trends in order to make predictions about your data. Whether you already exhibit the characteristics of a sage or you’re still trying to hurdle the gulf between the infant and child stages, this maturity model can provide guidance and perspective as you continue your journey.  The model can show you where you are, how far you’ve come and where you need to go.  It provides guideposts to help keep you sane and calm amidst the chaos and strife we contend with each day.
  7. One version of truth story: different departments using different financial formulas to help bonus This leads to reasons to use BI. This is used to convince your boss of need for DW Note that you still want to do some reporting off of source system (i.e. current inventory counts). It’s important to know upfront if data warehouse needs to be updated in real-time or very frequently as that is a major architectural decision JD Edwards has tables names like T117
  8. many sources and many data marts (spaghetti code), different update of frequency, different variation of dimensions
  9. Reference configuration can be built on own or Dell can put it together for you. Once you convinced your boss you need a DW, what is the hardware to place the DW? Instead of just going to the Dell site and configuring your own: use FTDW or Appliances FTDW created via lessons learned with PDW
  10. RAs Some assembly required OR No assembly required (through Partners) FT Training Understand File System layouts How you physically implement DW has to be like the guidelines Appliance MPP Training Queries are different Modeling is different Data structures are different Partitioning is different Key decision points: Data Volumes: If you get above 95 Terabytes Procurement and Operation Logistics: What is your HW SKU process? Can you put a brand new HW system onto the data center? Workload Characteristics: Highly concurrent data? DW Organizational Maturity: Do you already know MPP?
  11. Benefits: Pre-Balanced Architectures Choice of HW platforms Lower TCO High Scale Reduced Risk Taking the guess work out of building a server
  12. Question: How many know what PDW is? HP: A one rack system has 17 servers, 22 processors/132 cores, and 125TB and can be scaled out to a four rack system with 47 servers, 82 processors/492 cores, and 500TB HP Business Decision Appliance (BDA): Made specifically for BI.  HP and Microsoft have delivered the first ever self-service business intelligence appliance, optimized for SQL Server 2008 R2 and SharePoint Server 2010.  Ideal for managed self-service BI with PowerPivot.  Developed for mid-market, enterprise department and remote offices.  The server has 2 CPU’s (12 cores) and 96GB memory.  Configuration of the appliance is integrated into SharePoint.  The Windows Server OS, plus all of the required server components, such as SQL Server and SharePoint, are already loaded on the appliance.  There’s no need to perform any software installations Removed: HP Business Data Warehouse Appliance (FT 3.0, 5TB) HP Business Decision Appliance (BI, SharePoint 2010, SQL Server 2008R2, PP) HP Database Consolidation Appliance (virtual environment, Windows2008R2)
  13. Who knows about these methodologies?
  14. Direct from Kimball: We don't know why people call our approach bottom-up. We spend much time at the beginning choosing the appropriate data sources to answer key business questions, and then after building the BUS matrix showing all the possible business processes (data sources), we then implement those processes that address the most important needs of the business. This is more top-down than anything the CIF does, where they barely mention the need to interview the end users and stand back from the whole project. People like to put Kimball (and Inmon) under convenient labels, but many times these labels are nonsensical. To describe our approach as top-down, or supporting pure analytics just isn't correct. Direct from Inmon: “We have stated - from the very beginning of data warehousing - that the way to build data warehouses is to build them iteratively. In fact we have gone so far to say that the first and foremost critical success factor in the building of a data warehouse is to NOT build the data warehouse using the Big Bang approach. The primary untruth they have told is that it takes a long time and lots of resources to build an Inmon style architecture. ” BDW: Building the Data Warehouse, fourth edition, Inmon, 2005 TTA: A Tale of Two Architectures, Inmon, 2010 Imhoff: Mastering Data Warehouse Design, Imhoff, 2003 Survey: Which Data Warehouse Architecture Is Most Successful? (2006), Ariyachandra, https://cours.etsmtl.ca/mti820/public_docs/lectures/WhichDWArchitectureIsMostSuccessful.pdf
  15. Normalize to eliminate redundant data and setup table relationships Fact tables contain metrics, while dimension tables contain attributes of the metrics in the fact tables
  16. The main difference between the two approaches is that the normalized version is easier to build if the source system is already normalized; but the dimensional version is easier to use for the business users and will generally perform better for analytic queries.
  17. Direct from Inmon: “We have stated - from the very beginning of data warehousing - that the way to build data warehouses is to build them iteratively. In fact we have gone so far to say that the first and foremost critical success factor in the building of a data warehouse is to NOT build the data warehouse using the Big Bang approach. The primary untruth they have told is that it takes a long time and lots of resources to build an Inmon style architecture. ” A common misperception of Inmon’s architecture is that a data warehouse must be built in its entirety first.  He said this is not so.  An Inmon data warehouse can be built over time.  He likened it to the growth of a city – you start out with certain districts and services and as the city grows the architecture of the city grows with it.  You certainly don’t go out to build a complete city overnight; likewise with an enterprise data warehouse. http://www.biprofessional.com/tag/inmon-2/ Page 21 of “Mastering Data Warehouse Design” by Claudia Imhoff: “Nowhere do we recommend that you build an entire data warehouse containing all the strategic enterprise data you will ever need before building the first analytical capability (data mart). Each successive business program solved by another data mart implementation will add the growing set of data serving as the foundation in your data warehouse. Eventually, the amount of data that must be added to the data warehouse to support a new data mart will be negligible because most of it will already be present in the data warehouse.” Hybrid: Data Vault (hub, link, satellite; always uses views; tracks history)
  18. Columnstore removes need for cube This non-persistent staging area caches operational data to be later populated to the warehouse bus. From The Data Warehouse Toolkit, page 9: A normalized database for data staging storage is acceptable. The one negative is now you have the same data in two places: in the staging area and in the data mart If don’t count cubes as “storing” data, then only storing the data once (or zero if using a dimensionalized view). Kimball becomes very much like Inmon’s CIF if you add MDM From Ralph: Generally we don't emphasize the word "data mart" because that invokes Inmon's concept of a data mart that is a custom-built incompatible, aggregated subset of data requested by a specific department, and built from an otherwise inaccessible back room data warehouse based on highly normalized data.   Instead we talk about "business process subject areas" (a mouthful) or just subject areas. Each of these subject areas is ultimately drawn from a separate legacy source and deployed through one or more similar dimensional models (star schemas).   There is nothing in all of this that mandates the data be stored in a single database. No matter where the subject area schemas are stored, in order to combine data from separate subject areas, you must "drill across" using conformed dimensions. The best architecture for implementing drill across is to perform completely separate queries and then combine the results in the BI layer. If you do this, it doesn't matter where the separate subject area databases are located.   We have written extensively about this method of achieving integration across separate data sets, and it is a signature achievement of the dimension approach. One recent Kimball Design Tip that implements these issues with an actual spreadsheet interface is   http://www.kimballgroup.com/2013/06/02/design-tip-156-an-excel-macro-for-drilling-across/
  19. The one negative with this approach is you could have the same data copied in three places: staging area, CIF, data mart. If don’t count cubes as “storing” data, then storing data twice (data warehouse and data marts). Hub-and-spoke architecture DW 2.0 (unstructured data, ODS, CDC)
  20. Hire a BI professional to help you! If you lack expertise and bandwidth Kimball model used because developers knew of nothing else My opinion, not set in stone, based on source size, use cubes Don't lock into one way of doing everything - depends on sources
  21. Only difference between Hybrid model and Inmon is data marts are star schema, not NF For an extremely large number of sources, you can even add another layer to this: a EDW containing all star schemas in between the CIF and DW Bus Architecture. Can have an ODS to act as the staging area for the data warehouse, at least for the data that it maintains. The data in the ODS is more real time than the data required by the data warehouse. Reconcile and cleanse that data, populating the ODS near real time providing value to the operational and tactical decision makers who need it and also use it to populate the data warehouse, honoring the less demanding load cycles typical of the DW. You will still need the staging area for DW required data not hosted in the ODS, but in following this recommendation you economize on your ETL flows and staging area volumes without sacrificing value or function. Benefit of views Can be modified by anyone, even outside of BIDS/SSDT Can provide default values when needed Simple computation can be carried out by views Renaming fields leads to better understanding of the flow Can present a star schema, even if the underlying structure is much more complex Can be analyzed by third party tools to get dependency tracking Can be optimized without ever opening BIDS/SSDT Benefit of views in SSIS: Simpler code inside SSIS packages No need to open the package to understand what it is reading Easily query the database for debugging purposes Query optimizations can be carried out separately Benefit of views in SSAS: Renaming database columns to SSAS attributes Clearly exposing all the transformations to DBA Simplifying handling of fast variations Full control on JOINs sent to SQL Server Exposing a start schema, even if the underlying structure is not a simple star schema Hybrid: 3NF data warehouse feeds dimensional presentation layer the creation of an OLTP Mirror has always been a successful strategy for several reasons:  We use the “real” OLTP for a small amount of time during mirroring. Then we free it, so it can do any kind of database maintenance needed.  The ability to modify the keys, relations and views of the mirror lead to simpler ETL processes that are cheaper both to build and to maintain over time  The ability to build particular indexes in OLTP Mirror could improve performance of other steps of ETL processes, without worrying about index maintenance on the original OLTP  The mirror is normally much smaller than the complete OLTP system as we do not have to load all the tables of the OLTP and, for each table, we can decide to mirror only a subset of the columns
  22. When people say the use the Kimball model, most times they really mean they are using the Kimball Methodology and/or are using dimensional modeling. The word “Kimball” is synonymous with dimensional modeling. Ralph didn’t invent the original basic concepts of facts and dimensions, however, he established an extensive portfolio of dimensional techniques and vocabulary, including conformed dimensions, slowly changing dimensions, junk dimensions, mini-dimensions, bridge tables, periodic and accumulating snapshot fact tables, and the list goes on. Over the nearly 20 years,  Ralph and his Kimball Group colleagues have written hundreds of articles and Design Tips on dimensional modeling, as well as the seminal text, The Data Warehouse Toolkit, Second Edition (John Wiley, 2002). While Ralph led the charge, dimensional modeling is appropriate for organizations who embrace the Kimball architecture, as well as those who follow the Corporate Information Factory (CIF) hub-and-spoke architecture espoused by Bill Inmon and others.
  23. Question: How many people know what surrogate keys are?
  24. Question: How many people know what SSAS cubes are?
  25. Dell Microsoft Analytics Platform System (v2, SQL 2012, 15TB-6PB) HP AppSystem for SQL 2012 Parallel Data Warehouse (v2, SQL 2012, 15TB-6PB) Quanta Microsoft Analytics Platform System (v2, SQL 2012, 15TB-6PB)
  26. Other non-Microsoft: Qlikview, Tableau