Agile Data Warehousing 
From Start to Finish 
Presenter: Davide Mauri, Architect & Mentor, SolidQ 
Moderator: Alex Whittles
Technical Assistance 
If you require assistance 
during the session, type 
your inquiry into the 
question pane on the 
right side. 
Maximize your screen 
with the zoom button 
on the top of the 
presentation window 
Type your questions in 
the question pane on 
the right side
Thank You Sponsors 
Welcome to the Azure family! 
Try DocumentDB today! 
http://documentdb.com 
Solutions from Dell help you 
monitor, manage, protect and 
improve your SQL Server 
environment. 
http://software.dell.com/sql-pass-vc-dell-sql-server-solutions
www.PASSSummit.com 
Planning on attending PASS Summit 2014? Start saving 
today! 
• The world’s largest gathering of SQL Server & BI professionals 
• Take your SQL Server skills to the next level by learning from the world’s 
SQL Server experts, in 190+ technical sessions 
• Over 5000 attendees, representing 2000 companies, from 52 countries, 
ready to network & learn 
Use discount code 24HOP14 
to save $200! 
$1,895 
UNTIL SEPTEMBER 26, 
2014
Davide Mauri
 • SolidQ Mentor
 • Board of Directors, SolidQ Italy
 • Microsoft SQL Server MVP
 • Works with managers to build effective,
tailor-made BI solutions for customers
@mauridb
Agenda 
What is a DWH, really? 
Agile: the only way to succeed 
Engineering the DWH 
ETL Design Patterns 
ETL Automation 
Testing
What is a DWH, really?
The Data-Driven Age
Isn’t the DWH an “old” thing?
Can’t Big Data, In-Memory and all the new stuff just replace
the Data Warehouse?
The answer would be “yes” if a DWH were just a simple
“container” of data.
But it’s much more than this.
What is a DWH, really? 
In this new era, data is like water. 
Who would ever drink from
untested, untrusted, 
uncertified data?
What is a DWH, really? 
Would a manager or decision maker make a decision
based on data whose source, integrity and correctness
are unknown?
What is a DWH, really? 
The Data Warehouse is the place where managers and 
decision makers will look for 
• Correct 
• Trusted 
• Updated 
data in order to make an
informed decision
What is a DWH, really? 
The answer is now easy:
What is a DWH, really?
A place to store consolidated data coming from the whole
company
A place where data is cleansed, verified and certified
A place where historic data is stored
A place that holds the single version of truth (if there is one!)
The core of a BI solution
User-friendly data models, designed to make data analysis
easier
Modern Data Environment
[Diagram. Labels: Structured Data, Unstructured Data, Master Data, EDW, Data Mart, Big Data, BI Environment, Analytics Environment, Decision Maker, Data Scientist]
Agility: the only way to succeed
EDW: Reality Check 
EDW is the trusted container of all company data 
It cannot be created in “one day” 
It has to grow and evolve with business needs. 
It will never be 100% complete
The story so far
Adapt to Survive 
“50% of requirements change in the first year of a BI 
project” 
Andreas Bitterer, Research VP, Gartner
Agile Principles 
Small design up front. Prototype.
Deliver quickly, deliver frequently.
Users are part of the development team!
Feedback is a key part of success
They’ll grow with the solution and the solution will grow with them
Embrace change!
http://agilemanifesto.org/principles.html
Agile Challenges 
Deliver Quickly and Frequently
 • Challenge: keep quality high, no matter who’s doing the work
Embrace Change
 • Challenge: don’t introduce bugs. Change the smallest part
possible. Use automated testing to preserve and assure data
quality.
Engineering the DWH
Engineering the solution 
To be Agile, some engineering practices need to be included in
our work model
Agility != Anarchy 
Engineering:
 • Apply well-known models
 • Define, Apply & Enforce rules
 • Automate and/or Check rules application
 • Measure
 • Test
Engineering the solution 
Favor Kimball Approach (for user-facing models)
 • Dimensional Modeling
 • Fact & Measures
 • Dimensions
Use views to introduce abstraction layers
 • Reduce the “friction” between layers (source / stage / dwh / dm)
 • Apply the “Information Hiding Principle”
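A minimal sketch of the view-as-abstraction-layer idea, using Python's built-in sqlite3 as a stand-in for SQL Server. The table and view names (stg_customer, dwh_source_customer) and the column names are hypothetical, not taken from the session. The next layer reads only the view, so staging columns can be renamed or cleaned without touching downstream code.

```python
import sqlite3


def build_layers():
    """Create a hypothetical staging table and the view that hides it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_customer (cust_id INTEGER, cust_nm TEXT, load_ts TEXT);
        INSERT INTO stg_customer VALUES
            (1, ' Alice ', '2014-09-01'),
            (2, 'Bob', '2014-09-01');

        -- The view is the only contract the next layer sees:
        -- renames and trimming stay hidden behind it.
        CREATE VIEW dwh_source_customer AS
        SELECT cust_id AS customer_id, TRIM(cust_nm) AS customer_name
        FROM stg_customer;
    """)
    return conn


def read_customers(conn):
    """Read through the view, never from the staging table directly."""
    return conn.execute(
        "SELECT customer_id, customer_name "
        "FROM dwh_source_customer ORDER BY customer_id"
    ).fetchall()
```

If the staging table changes shape, only the view definition needs to change; read_customers keeps working as long as the view still exposes customer_id and customer_name.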
Engineering the solution 
Define & enforce the application of well-known ETL patterns
 • SCD1 / SCD2
 • Incremental / Partition Load
Divide et impera (divide and conquer)
 • At least two SSIS solutions
 • Many small SSIS packages
 • 5 Databases (STG, CFG, LOG, MD, DWH)
Design Pattern 
“A general reusable solution 
to a commonly occurring 
problem within a given 
context”
Design Pattern 
Generic ETL Pattern
 • Partition Load
 • Incremental/Differential Load
Generic DWH/BI Design Pattern
 • Slowly Changing Dimension
    • SCD1, SCD2, etc.
 • Fact Table
    • Transactional, Snapshot, Temporal Snapshot
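To make the SCD2 pattern concrete, here is a minimal in-memory sketch in Python. It is an illustration under assumptions, not the implementation used in the session: the row layout (business_key, attributes, valid_from, valid_to, is_current) and the function name apply_scd2 are hypothetical. A changed member closes its current row and gets a new one, so history is preserved.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Apply an SCD2 change set to `dimension` (a list of dict rows), in place.

    `incoming` maps business_key -> attribute dict for the current load.
    """
    current = {r["business_key"]: r for r in dimension if r["is_current"]}
    for key, attrs in incoming.items():
        row = current.get(key)
        if row is not None and row["attributes"] == attrs:
            continue  # unchanged member: nothing to do
        if row is not None:
            # Close the old version instead of overwriting it.
            row["valid_to"] = today
            row["is_current"] = False
        dimension.append({
            "business_key": key,
            "attributes": dict(attrs),
            "valid_from": today,
            "valid_to": None,
            "is_current": True,
        })
    return dimension
```

An SCD1 variant would instead overwrite the attributes of the existing row, keeping one row per business key and no history.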
Design Pattern 
Specific SQL Server Patterns
 • Change Data Capture
 • Change Tracking
 • Partition Load
 • SSIS Parallelism
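The incremental load idea can be sketched with a simple watermark. This is a generic stand-in written against Python's sqlite3, not SQL Server Change Data Capture or Change Tracking, whose APIs are not shown here. The tables src_orders and stg_orders and the row_version column are assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Build a hypothetical source and staging table in one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount REAL, row_version INTEGER);
        CREATE TABLE stg_orders (id INTEGER PRIMARY KEY, amount REAL, row_version INTEGER);
        INSERT INTO src_orders VALUES (1, 10.0, 1), (2, 20.0, 2), (3, 30.0, 3);
    """)
    return conn


def incremental_load(conn, last_version):
    """Copy only rows changed after `last_version`; return the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, row_version FROM src_orders "
        "WHERE row_version > ? ORDER BY row_version",
        (last_version,),
    ).fetchall()
    # Upsert by primary key so re-sent rows replace their older copies.
    conn.executemany(
        "INSERT OR REPLACE INTO stg_orders (id, amount, row_version) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return max((r[2] for r in rows), default=last_version)
```

The caller stores the returned watermark (for example in a configuration database) and passes it to the next run, so each run moves only the rows that changed since the previous one.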
ETL Automation
No Monkey Work! 
Let the people think and let 
the machines do the 
«monkey» work.
Invest in Automation?
Faster development
 • Reduce Costs
 • Embrace Changes
Fewer bugs
Increase solution quality and make it consistent throughout
the whole product
High-Level Vision
[Diagram. Labels: OLTP, STG, DWH, ETL (three times), Technical Process (twice), Business Process]
ETL Phases 
«E» and «L» must be
 • Simple, Easy and Straightforward
 • Completely Automated
 • Completely Reusable
«E» and «L» have ZERO value in a DWH Solution
 • Should be done in the most economical way
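One way to read "simple, automated, reusable" is a single parameter-driven routine that handles every extract and load. The sketch below is an assumption-laden illustration in Python with sqlite3: the function name extract_and_load and its parameters are hypothetical, it does a full reload, and it trusts the table and column names it is given.

```python
import sqlite3


def extract_and_load(source, target, source_table, target_table, columns):
    """Full reload of `columns` from source_table into target_table.

    Table and column names are interpolated into SQL, so they must come
    from trusted configuration, never from user input.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {col_list} FROM {source_table}").fetchall()
    target.execute(f"DELETE FROM {target_table}")  # truncate-and-load
    target.executemany(
        f"INSERT INTO {target_table} ({col_list}) VALUES ({placeholders})",
        rows,
    )
    target.commit()
    return len(rows)
```

Because nothing in the routine is specific to one table, adding a new source table means adding one more set of parameters rather than writing new code.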
Automation Tools 
PowerShell / .NET
 • Supported by SMO & SSIS API
 • Microsoft creates platforms, not only products!
BIML – BI Markup Language
 • From Varigence
 • Free with BIDS Helper
 • Full support with MIST
Metadata 
Metadata is needed in order to make automation a
repeatable process
 • Source to Staging info
 • Staging to DWH info
 • Dimension Keys
 • Dimension & Fact Table relationships
Extended Properties + SQL Server DMVs help keep
metadata consistent
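As a rough illustration of metadata-driven automation, the sketch below renders a staging load statement from a source-to-staging mapping. It is plain Python string generation, not BIML, and the mapping layout (source_table, target_table, columns) is a hypothetical format invented for this example.

```python
def generate_staging_load(mapping):
    """Render one INSERT ... SELECT statement from a source-to-staging mapping.

    The mapping is trusted configuration; identifiers are not quoted or escaped.
    """
    target_cols = ", ".join(c["target"] for c in mapping["columns"])
    source_cols = ", ".join(c["source"] for c in mapping["columns"])
    return (
        f"INSERT INTO {mapping['target_table']} ({target_cols})\n"
        f"SELECT {source_cols}\n"
        f"FROM {mapping['source_table']};"
    )


# Hypothetical metadata for one table; in practice this would be read
# from a metadata store rather than written inline.
customer_mapping = {
    "source_table": "oltp.Customer",
    "target_table": "stg.Customer",
    "columns": [
        {"source": "CustID", "target": "customer_id"},
        {"source": "Name", "target": "customer_name"},
    ],
}
```

Calling generate_staging_load(customer_mapping) yields the statement for that table. Regenerating every load from the same metadata is what keeps the process repeatable when a source changes.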
Unit Testing
Unit Testing 
Data MUST be tested. 
It’s like water, remember? 
If trust is lost, DWH is an 
#epicfail
Unit Testing 
Before releasing anything, data in the DWH must be tested.
The user has to validate a sample of data
 • (e.g., total invoice amount for January 2012)
That validated value will become the reference value
Before release, the same query will be executed again.
 • If the data matches the expected reference value, the test is green
 • Otherwise, the test fails
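The reference-value test described above can be sketched with Python's unittest and sqlite3. Everything specific here is an assumption for illustration: the fact_invoice table, the sample rows, and the reference total of 1500.0 stand in for a value that a user would have validated.

```python
import sqlite3
import unittest

# Hypothetical reference value, validated once by the user.
REFERENCE_TOTAL_JAN_2012 = 1500.0


def build_dwh():
    """Create a tiny stand-in fact table with sample invoices."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fact_invoice (invoice_date TEXT, amount REAL);
        INSERT INTO fact_invoice VALUES
            ('2012-01-10', 1000.0),
            ('2012-01-25', 500.0),
            ('2012-02-03', 250.0);
    """)
    return conn


def total_invoice_amount(conn, year_month):
    """Total invoice amount for a 'YYYY-MM' month."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM fact_invoice "
        "WHERE substr(invoice_date, 1, 7) = ?",
        (year_month,),
    ).fetchone()
    return row[0]


class InvoiceTotalTest(unittest.TestCase):
    def test_january_2012_matches_reference(self):
        # Green when the warehouse still returns the validated value.
        conn = build_dwh()
        self.assertAlmostEqual(
            total_invoice_amount(conn, "2012-01"), REFERENCE_TOTAL_JAN_2012
        )
```

Running the same test before every release turns the user's one-time validation into a regression check.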
Unit Testing 
Of course, tests MUST be automated when possible
 • Visual Studio
 • NUnit extensions
 • NBI
 • BI.Quality
What to test?
 • Aggregated results
 • Specific values for «special» rules
 • Fixed bugs/tickets
The perfect BI process & architecture 
Iterative!
Questions?
Like What You Heard? 
Davide will be presenting at PASS Summit 2014!
 • PreConference:
    • Agile Data Warehousing: Start to Finish
 • General Session:
    • Agile BI: Unit Testing and Continuous Integration
Use discount code 24HOP14 
to save $200! 
@mauridb
Coming up next … 
DAX Formulas in Action 
Alberto Ferrari
Thank You for Attending


Editor's Notes

  • #2 Welcome to 24 Hours of PASS: Summit Preview! We’re excited you could join us today for Davide Mauri’s session, Agile Data Warehousing: Start to Finish. This 24 Hours of PASS event consists of 24 consecutive live webcasts. Sessions will be recorded and posted online soon after the event. My name is Alex Whittles [add brief intro about yourself] and I have a few quick introduction slides before I hand over the reins to Davide. He will speak for 40-45 minutes and then we’ll move on to the Q&A where you can ask any questions you may have. [move to next slide]
  • #3 If you’re having any issues, type your issue into the question pane and someone will assist you. To maximize your screen, use the zoom button located on the top of the presentation window. Feel free to enter your questions in the Q&A field at any time. The questions pane is located on the right side of your screen. Once we get to the Q&A portion of the session, I’ll read off your questions to the speaker. Note that there will be a short evaluation at the end of the session, your feedback is important to us so please take a moment to complete it. It will show up on your screen. [Note to moderators: You need to determine which questions are the most relevant and ask them out loud to the presenter].
  • #4 I’d like to take a moment to thank our event partners. The staging of 24 Hours of PASS would not be possible without their support and dedication, they are the reason this event is available free of charge. Thank you to our Presenting Sponsors: Microsoft and Dell Software. Move to next slide
  • #5 Next, as you all may know, this 24 Hours of PASS is a preview of PASS Summit 2014, the largest conference for SQL Server and BI professionals. With over 5000 attendees representing 2000 companies, from 52 countries, Summit is a time to share, connect and learn with your peers and industry partners. PASS Summit is not only a week of intensive learning and knowledge sharing that’ll offer strategic insights, it’s a time to network and rub shoulders with industry experts. Taking place in Seattle, WA from November 4-7, PASS Summit will feature over 190 world-class sessions across 5 topic tracks. These 24 Hours of PASS sessions provide a mere glimpse of what you can expect from PASS Summit. Find out more at www.passSummit.com and if you register by September 26 using discount code 24HOP14, you’ll get $200 off the registration fee. [move to next slide]
  • #6 And now, please allow me to present the speaker of the hour: Davide Mauri [move to next slide, speaker’s presentation]
  • #28 http://en.wikipedia.org/wiki/Software_design_pattern
  • #29 http://en.wikipedia.org/wiki/Software_design_pattern
  • #30 http://en.wikipedia.org/wiki/Software_design_pattern
  • #33 http://chartporn.org/2012/05/10/repetitive-tasks/
  • #44 Like what you heard here? Davide will be presenting at PASS Summit 2014: catch Davide in his general session, Agile BI: Unit Testing and Continuous Integration and the full presentation of this PreConference, Agile Data Warehousing: Start to Finish at PASS Summit 2014. And don’t forget to use the discount code 24HOP14 to save $200 on PASS Summit registration.
  • #45 Stay tuned for our next session, DAX Formulas in Action with Alberto Ferrari, happening in a couple of minutes.