Using PowerShell to Simplify your ETL

  1. Using PowerShell to simplify your ETL
      Scott Weinstein, Principal, Lab49
      weblogs.asp.net/sweinstein / @ScottWeinstein
  2. GOLD SPONSORS
  3. SILVER SPONSOR
  4. BRONZE SPONSORS
  5. OTHER SPONSORS
  6. SSIS Considered Harmful
      • Visual programming is a terrible interface for ETL work
      • History of UI visual designers is instructive here
      • No way to copy/share “code samples”
      • Low information density
      • Even worse, it’s not even a good designer
      • Poor defaults, poor layout
      • Nested menus of property grids are hard to use
      • No real VCS ability
      • Sure it’s XML, but…
      • No diff
      • No merge
      • Demo?
  7. But what about all the features?
      • Some are trivial: Foreach Loop Container, Execute SQL Task, Send Mail Task (OMG!)
      • Many are useful, but not hard to recreate (demos soon)
      • Others may be worth consideration: fuzzy matching
  8. But what about performance?
      • The advantages for data flow are real, but not that large
      • A 5M row test on my home machine gave a 12% performance advantage to SSIS over PowerShell + SqlBulkCopy
  9. The Answer: PowerShell
      • A general purpose scripting and automation language
      • Wide adoption
      • MS: Windows Server, Windows 7, SQL Server 2008, VM Manager, IIS, Deployment Toolkit
      • Others: VMware, Quest, IBM WebSphere
  10. PowerShell Features
      • Takes the best of Unix, VMS, Perl
      • Adds in integration with .NET, COM, WMI
      • Adds in:
      • REPL environment
      • Full scripting/programming
      • Flexible typing model
      • Pipes
      • Regular expressions
      • XML
      • Navigation Providers
      • Built-in command parser
      • Standardized syntax for commands
  11. Introducing PSIS
      • http://psis.codeplex.com/
      • Yes, the name is derivative
      • Right now, just a playground for some demos
      • Bulk Data Transfer
      • Star-Schema Populator
      • Execute SQL
  12. PSIS demo
      • Just the ETL
      • No logging, error reporting, etc.
      • Some typical ETL tasks:
      • Clear staging tables
      • Bulk transfer data from one DB to another
      • Populate a star schema (fact table and dimension tables)
      • Extract data to a .csv file
  13. Clear staging tables
      • Use Invoke-MSSqlCommand
      • Use TRUNCATE TABLE <tablename>
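
A minimal sketch of the truncate step. The slides do not show the parameters of PSIS's Invoke-MSSqlCommand, so this version calls System.Data.SqlClient directly instead. The function names, the default `dbo` schema, and the connection string are assumptions made for illustration.

```powershell
# Hypothetical helper: builds one TRUNCATE TABLE statement per staging table.
function Get-TruncateStatement {
    param(
        [Parameter(Mandatory = $true)] [string[]] $TableName,
        [string] $Schema = 'dbo'   # assumed default schema
    )
    foreach ($name in $TableName) {
        # Bracket-quote identifiers and escape any ']' inside them.
        $quotedSchema = '[' + $Schema.Replace(']', ']]') + ']'
        $quotedTable  = '[' + $name.Replace(']', ']]') + ']'
        'TRUNCATE TABLE {0}.{1};' -f $quotedSchema, $quotedTable
    }
}

# Hypothetical wrapper: runs the statements over a single connection.
function Clear-StagingTable {
    param(
        [Parameter(Mandatory = $true)] [string] $ConnectionString,
        [Parameter(Mandatory = $true)] [string[]] $TableName
    )
    $connection = New-Object System.Data.SqlClient.SqlConnection $ConnectionString
    try {
        $connection.Open()
        foreach ($sql in (Get-TruncateStatement -TableName $TableName)) {
            $command = $connection.CreateCommand()
            $command.CommandText = $sql
            [void] $command.ExecuteNonQuery()
        }
    }
    finally {
        $connection.Dispose()
    }
}
```

Keeping the statement builder separate from the connection code lets the generated SQL be inspected without a database.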
  14. Bulk transfer
      • At core, we’ll use .NET’s SqlBulkCopy
      • Under the covers it uses TDS’s BCP protocol
      • For concurrency we’ll use the Task Parallel Library
      • We’ll simplify things by making the destination tables map directly to the source tables
      • Select statements could provide needed mapping
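
A rough sketch of a single-table transfer with SqlBulkCopy, assuming the destination table has the same name and columns as the source. It is sequential and leaves out the Task Parallel Library part. The function name, parameter names, and batch size are illustrative and are not taken from PSIS.

```powershell
# Hypothetical function: streams one table from source to destination.
function Copy-TableBulk {
    param(
        [Parameter(Mandatory = $true)] [string] $SourceConnectionString,
        [Parameter(Mandatory = $true)] [string] $DestinationConnectionString,
        [Parameter(Mandatory = $true)] [string] $TableName,   # trusted input only
        [int] $BatchSize = 10000                               # assumed batch size
    )
    $reader = $null
    $source = New-Object System.Data.SqlClient.SqlConnection $SourceConnectionString
    $bulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy $DestinationConnectionString
    try {
        $source.Open()
        $command = $source.CreateCommand()
        # A SELECT with explicit columns could provide any mapping that is needed.
        $command.CommandText = "SELECT * FROM $TableName"
        $reader = $command.ExecuteReader()

        $bulkCopy.DestinationTableName = $TableName
        $bulkCopy.BatchSize = $BatchSize
        $bulkCopy.BulkCopyTimeout = 0   # 0 means no time limit
        # Rows are streamed from the reader rather than buffered in memory.
        $bulkCopy.WriteToServer($reader)
    }
    finally {
        if ($reader) { $reader.Dispose() }
        $bulkCopy.Close()
        $source.Dispose()
    }
}
```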
  15. Populate a Star-Schema
      • Destination
  16. Populate a Star-Schema
      • Source: a view on the staging tables, creating a flat de-normalized row of all dimension and fact values
      • Use a naming convention to indicate which columns belong to fact or dimension tables
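
The slides do not say which convention PSIS uses, so the following is only an illustration of the idea. It assumes a made-up convention in which view columns are named `Dim_<Dimension>_<Attribute>` or `Fact_<Measure>`, and the function name is hypothetical.

```powershell
# Hypothetical: groups view columns into dimensions and facts by name.
function Get-StarSchemaLayout {
    param([Parameter(Mandatory = $true)] [string[]] $ColumnName)

    $layout = @{ Dimensions = @{}; Facts = @() }
    foreach ($column in $ColumnName) {
        if ($column -match '^Dim_([^_]+)_(.+)$') {
            # $Matches[1] is the dimension name, $Matches[2] the attribute.
            $dimension = $Matches[1]
            if (-not $layout.Dimensions.ContainsKey($dimension)) {
                $layout.Dimensions[$dimension] = @()
            }
            $layout.Dimensions[$dimension] += $Matches[2]
        }
        elseif ($column -match '^Fact_(.+)$') {
            $layout.Facts += $Matches[1]
        }
        else {
            Write-Warning "Column '$column' does not follow the naming convention; skipped."
        }
    }
    $layout
}

$layout = Get-StarSchemaLayout -ColumnName 'Dim_Customer_Name', 'Dim_Customer_Region', 'Fact_Amount'
$layout.Dimensions['Customer']   # attributes of the Customer dimension
$layout.Facts                    # measures for the fact table
```

A layout like this is the kind of input a code generator could use to emit one lookup function per dimension.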
  17. Populate a Star-Schema
      • Code-generate a populator
      • Create functions that, given a source row, return or create the corresponding dimension key
      • Run it with the TPL for performance
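
The "return or create" lookup can be sketched with an in-memory cache. This is an assumption-heavy stand-in: a real populator would insert the new dimension row into the database and use the key the database assigns, while here a counter plays that role. The function names are hypothetical.

```powershell
# Hypothetical: per-dimension cache of natural key -> surrogate key.
function New-DimensionCache {
    @{ Map = @{}; NextKey = 1 }
}

# Returns the existing surrogate key, or creates one on first sight.
function Get-DimensionKey {
    param(
        [Parameter(Mandatory = $true)] [hashtable] $Cache,
        [Parameter(Mandatory = $true)] [string] $NaturalKey
    )
    if (-not $Cache.Map.ContainsKey($NaturalKey)) {
        # Stand-in for an INSERT into the dimension table.
        $Cache.Map[$NaturalKey] = $Cache.NextKey
        $Cache.NextKey = $Cache.NextKey + 1
    }
    $Cache.Map[$NaturalKey]
}

$customers = New-DimensionCache
Get-DimensionKey -Cache $customers -NaturalKey 'ACME'     # returns 1 (created)
Get-DimensionKey -Cache $customers -NaturalKey 'Globex'   # returns 2 (created)
Get-DimensionKey -Cache $customers -NaturalKey 'ACME'     # returns 1 (cached)
```

A plain hashtable like this is not safe for concurrent writers, so running the lookups in parallel would need a thread-safe collection.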
  18. Populate a Star-Schema
      • Areas of improvement
      • Add support for slowly changing dimensions
      • Add support for lower memory requirements: query the DB for dimension values on demand
      • Measure and tune performance at the micro-benchmark level
      • Concurrent dictionaries
      • Partial loads
  19. Extract to file
      • Invoke-MSSqlCommand
      • Export-Csv
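
A sketch of the extract step. Export-Csv is a standard cmdlet, but because the slides do not show how Invoke-MSSqlCommand is called, the query here is run with a SqlDataAdapter instead. Both function names and all parameter names are assumptions.

```powershell
# Hypothetical: writes the rows of a DataTable to a CSV file.
function Export-DataTableToCsv {
    param(
        [Parameter(Mandatory = $true)] [System.Data.DataTable] $Table,
        [Parameter(Mandatory = $true)] [string] $Path
    )
    # Select only the data columns; DataRow also exposes bookkeeping
    # properties such as RowState that should not end up in the file.
    $columns = @($Table.Columns | ForEach-Object { $_.ColumnName })
    $Table.Rows |
        Select-Object -Property $columns |
        Export-Csv -Path $Path -NoTypeInformation
}

# Hypothetical: runs a query and exports its result set.
function Export-QueryToCsv {
    param(
        [Parameter(Mandatory = $true)] [string] $ConnectionString,
        [Parameter(Mandatory = $true)] [string] $Query,
        [Parameter(Mandatory = $true)] [string] $Path
    )
    $table = New-Object System.Data.DataTable
    $adapter = New-Object System.Data.SqlClient.SqlDataAdapter $Query, $ConnectionString
    try {
        # Fill opens and closes the connection itself.
        [void] $adapter.Fill($table)
    }
    finally {
        $adapter.Dispose()
    }
    Export-DataTableToCsv -Table $table -Path $Path
}
```

This loads the whole result set into memory before writing, which is simple but may not suit very large extracts.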
  20. Others?